skip to main content
article
Open access

Optimal register reassignment for register stack overflow minimization

Published: 01 March 2006 Publication History

Abstract

Architectures with a register stack can implement efficient calling conventions. Using the overlapping of callers' and callees' registers, callers are able to pass parameters to callees without a memory stack. The most recent instance of a register stack can be found in the Intel Itanium architecture. A hardware component called the register stack engine (RSE) provides an illusion of an infinite-length register stack using a memory-backed process to handle overflow and underflow for a physically limited number of registers. Despite such hardware support, some applications suffer from the overhead required to handle register stack overflow and underflow. The memory latency associated with the overflow and underflow of a register stack can be reduced by generating multiple register allocation instructions within a procedure [Settle et al. 2003]. Live analysis is utilized to find a set of registers that are not required to keep their values across procedure boundaries. However, among those dead registers, only the registers that are consecutively located in a certain part of the register stack frame can be removed. We propose a compiler-supported register reassignment technique that reduces RSE overflow/underflow further. By reassigning registers based on live analysis, our technique forces as many dead registers to be removed as possible. We define the problem of optimal register reassignment, which minimizes interprocedural register stack heights considering multiple call sites within a procedure. We present how this problem is related to a path-finding problem in a graph called a sequence graph. We also propose an efficient heuristic algorithm for the problem. Finally, we present the measurement of effects of the proposed techniques on SPEC CINT2000 benchmark suite and the analysis of the results. The result shows that our approach reduces the RSE cycles by 6.4% and total cpu cycles by 1.7% on average.

References

[1]
Briggs, P., Cooper, K., and Torczon, L. 1994. Improvements to graph coloring register allocation. ACM Transactions on Programming Languages and Systems 16, 3, 428--425.
[2]
Chaitin, G. 1982. Register allocation and spilling via graph coloring. In Proceedings of the SIGPLAN'82 Symposium on Compiler Construction. 201--207.
[3]
Chow, F. 1988. Minimizing register usage penalty at procedure calls. In Proceedings of the SIGPLAN'88 Conference on Programming Language Design and Implementation. 85--94.
[4]
Chow, F. and Hennessy, J. 1984. Register allocation by priority-based coloring. In Proceedings of the SIGPLAN'84 Symposium on Compiler Construction. 222--232.
[5]
Douillet, A., Amaral, J., and Gao, G. 2002. Fine-grained stacked register allocation for the itanium architecture. In 15th Workshop on Languages and Compilers for Parallel Computing.
[6]
Hoflehner, G., Kirkegaard, K., Skinner, R., Lavery, D., and Lee, Y. 2004. Compiler optimizations for transaction processing workloads on itanium linux systems. In Proceedings of the 37th International Symposium on Microarchitecture. 294--303.
[7]
Intel Corporation 2002. Intel Itanium Architecture Software Developer's Manual. Intel Corporation, Santa Clara, CA.
[8]
Intel Corporation and Chinese Academy of Sciences 2002. Open Research Compiler for Itanium Processor Family (ORC version 2.1). Intel Corporation and Chinese Academy of Sciences, http://ipf-orc.sourceforge.net.
[9]
Kurlander, S. and Fisher, C. 1996. Minimum cost interprocedural register allocation. In Proceedings of the 23rd SIGPLAN-SIGACT Symposium on Principles of Programming Language. 230--241.
[10]
Lueh, G. and Gross, T. 1997. Call-cost directed register allocation. In Proceedings of the SIGPLAN'97 Conference on Programming Language Design and Implementation. 296--307.
[11]
Settle, A., Connors, D., Hoflehner, G., and Lavery, D. 2003. Optimization for the intel itanium architecture register stack. In Proceedings of the International Symposium on Code Generation and Optimization. 115--124.
[12]
Steenkiste, P. and Henessy, J. 1989. A simple interprocedural register allocation algorithm and its effectiveness for list. ACM Transactions on Programming Languages and Systems 11, 1, 1--30.
[13]
Wall, D. 1986. Global register allocation at link time. In Proceedings of the SIGPLAN'86 Symposium on Compiler Construction. 264--275.
[14]
Weaver, D. and Germond, T. 1994. The SPARC Architecture Manual. SPARC International Inc., Menlo Park, CA.
[15]
Weldon, R., Chang, S., Wang, H., Hoflehner, G., Wang, P., Lavery, D., and Shen, J. 2002. Quantitative evaluation of the register stack engine and optimization for future itanium processors. In Proceedings of the Sixth Workshop on Interaction between Compilers and Computer Architectures. Boston.
[16]
Yang, L., Chan, S., Gao, G., Ju, R., Lueh, G., and Zhang, Z. 2003. Inter-procedural stacked register allocation for itanium like architecture. In Proceedings of the 17th International Conference on Supercomputing. 215--225.

Cited By

View all
  • (2016)OrionProceedings of the 17th International Middleware Conference10.1145/2988336.2988355(1-13)Online publication date: 28-Nov-2016
  • (2014)Unified on-chip memory allocation for SIMT architectureProceedings of the 28th ACM international conference on Supercomputing10.1145/2597652.2597685(293-302)Online publication date: 10-Jun-2014
  • (2012)Rethinking Java call stack design for tiny embedded devicesACM SIGPLAN Notices10.1145/2345141.224842047:5(1-10)Online publication date: 12-Jun-2012
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization
ACM Transactions on Architecture and Code Optimization  Volume 3, Issue 1
March 2006
114 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/1132462
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 2006
Published in TACO Volume 3, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Register assignment
  2. register allocation
  3. register stack
  4. sequence graph

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)113
  • Downloads (Last 6 weeks)30
Reflects downloads up to 08 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2016)OrionProceedings of the 17th International Middleware Conference10.1145/2988336.2988355(1-13)Online publication date: 28-Nov-2016
  • (2014)Unified on-chip memory allocation for SIMT architectureProceedings of the 28th ACM international conference on Supercomputing10.1145/2597652.2597685(293-302)Online publication date: 10-Jun-2014
  • (2012)Rethinking Java call stack design for tiny embedded devicesACM SIGPLAN Notices10.1145/2345141.224842047:5(1-10)Online publication date: 12-Jun-2012
  • (2012)Rethinking Java call stack design for tiny embedded devicesProceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems10.1145/2248418.2248420(1-10)Online publication date: 12-Jun-2012

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media