article

Open access

Optimal register reassignment for register stack overflow minimization

Authors:

Hwansoo HanAuthors Info & Claims

ACM Transactions on Architecture and Code Optimization (TACO), Volume 3, Issue 1

Pages 90 - 114

https://doi.org/10.1145/1132462.1132467

Published: 01 March 2006 Publication History

Abstract

Architectures with a register stack can implement efficient calling conventions. Using the overlapping of callers' and callees' registers, callers are able to pass parameters to callees without a memory stack. The most recent instance of a register stack can be found in the Intel Itanium architecture. A hardware component called the register stack engine (RSE) provides an illusion of an infinite-length register stack using a memory-backed process to handle overflow and underflow for a physically limited number of registers. Despite such hardware support, some applications suffer from the overhead required to handle register stack overflow and underflow. The memory latency associated with the overflow and underflow of a register stack can be reduced by generating multiple register allocation instructions within a procedure [Settle et al. 2003]. Live analysis is utilized to find a set of registers that are not required to keep their values across procedure boundaries. However, among those dead registers, only the registers that are consecutively located in a certain part of the register stack frame can be removed. We propose a compiler-supported register reassignment technique that reduces RSE overflow/underflow further. By reassigning registers based on live analysis, our technique forces as many dead registers to be removed as possible. We define the problem of optimal register reassignment, which minimizes interprocedural register stack heights considering multiple call sites within a procedure. We present how this problem is related to a path-finding problem in a graph called a sequence graph. We also propose an efficient heuristic algorithm for the problem. Finally, we present the measurement of effects of the proposed techniques on SPEC CINT2000 benchmark suite and the analysis of the results. The result shows that our approach reduces the RSE cycles by 6.4% and total cpu cycles by 1.7% on average.

References

[1]

Briggs, P., Cooper, K., and Torczon, L. 1994. Improvements to graph coloring register allocation. ACM Transactions on Programming Languages and Systems 16, 3, 428--425.

Digital Library

[2]

Chaitin, G. 1982. Register allocation and spilling via graph coloring. In Proceedings of the SIGPLAN'82 Symposium on Compiler Construction. 201--207.

[3]

Chow, F. 1988. Minimizing register usage penalty at procedure calls. In Proceedings of the SIGPLAN'88 Conference on Programming Language Design and Implementation. 85--94.

[4]

Chow, F. and Hennessy, J. 1984. Register allocation by priority-based coloring. In Proceedings of the SIGPLAN'84 Symposium on Compiler Construction. 222--232.

[5]

Douillet, A., Amaral, J., and Gao, G. 2002. Fine-grained stacked register allocation for the itanium architecture. In 15th Workshop on Languages and Compilers for Parallel Computing.

[6]

Hoflehner, G., Kirkegaard, K., Skinner, R., Lavery, D., and Lee, Y. 2004. Compiler optimizations for transaction processing workloads on itanium linux systems. In Proceedings of the 37th International Symposium on Microarchitecture. 294--303.

[7]

Intel Corporation 2002. Intel Itanium Architecture Software Developer's Manual. Intel Corporation, Santa Clara, CA.

[8]

Intel Corporation and Chinese Academy of Sciences 2002. Open Research Compiler for Itanium Processor Family (ORC version 2.1). Intel Corporation and Chinese Academy of Sciences, http://ipf-orc.sourceforge.net.

[9]

Kurlander, S. and Fisher, C. 1996. Minimum cost interprocedural register allocation. In Proceedings of the 23rd SIGPLAN-SIGACT Symposium on Principles of Programming Language. 230--241.

[10]

Lueh, G. and Gross, T. 1997. Call-cost directed register allocation. In Proceedings of the SIGPLAN'97 Conference on Programming Language Design and Implementation. 296--307.

[11]

Settle, A., Connors, D., Hoflehner, G., and Lavery, D. 2003. Optimization for the intel itanium architecture register stack. In Proceedings of the International Symposium on Code Generation and Optimization. 115--124.

[12]

Steenkiste, P. and Henessy, J. 1989. A simple interprocedural register allocation algorithm and its effectiveness for list. ACM Transactions on Programming Languages and Systems 11, 1, 1--30.

Digital Library

[13]

Wall, D. 1986. Global register allocation at link time. In Proceedings of the SIGPLAN'86 Symposium on Compiler Construction. 264--275.

[14]

Weaver, D. and Germond, T. 1994. The SPARC Architecture Manual. SPARC International Inc., Menlo Park, CA.

[15]

Weldon, R., Chang, S., Wang, H., Hoflehner, G., Wang, P., Lavery, D., and Shen, J. 2002. Quantitative evaluation of the register stack engine and optimization for future itanium processors. In Proceedings of the Sixth Workshop on Interaction between Compilers and Computer Architectures. Boston.

[16]

Yang, L., Chan, S., Gao, G., Ju, R., Lueh, G., and Zhang, Z. 2003. Inter-procedural stacked register allocation for itanium like architecture. In Proceedings of the 17th International Conference on Supercomputing. 215--225.

Cited By

Hayes ALi LChavarría-Miranda DSong SZhang E(2016)OrionProceedings of the 17th International Middleware Conference10.1145/2988336.2988355(1-13)Online publication date: 28-Nov-2016
https://dl.acm.org/doi/10.1145/2988336.2988355
Hayes AZhang EBode AGerndt MStenström PRauchwerger LMiller BSchulz M(2014)Unified on-chip memory allocation for SIMT architectureProceedings of the 28th ACM international conference on Supercomputing10.1145/2597652.2597685(293-302)Online publication date: 10-Jun-2014
https://dl.acm.org/doi/10.1145/2597652.2597685
Aslam FBaig GQureshi MUzmi ZFennell LThiemann PSchindelhauer CHaussmann E(2012)Rethinking Java call stack design for tiny embedded devicesACM SIGPLAN Notices10.1145/2345141.224842047:5(1-10)Online publication date: 12-Jun-2012
https://dl.acm.org/doi/10.1145/2345141.2248420
Show More Cited By

Index Terms

Optimal register reassignment for register stack overflow minimization

Recommendations

Differential register allocation
PLDI '05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation

Micro-architecture designers are very cautious about expanding the number of architected registers (also the register field), because increasing the register field adds to the code size, raises I-cache and memory pressure, complicates processor ...
Differential register allocation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation

Micro-architecture designers are very cautious about expanding the number of architected registers (also the register field), because increasing the register field adds to the code size, raises I-cache and memory pressure, complicates processor ...
Achieving spilling-friendly register file assignment for highly distributed register files

Distributed register file architectures divide registers into multiple sets, and it follows that the register files could be small. This can increase the frequency of spilling if register allocation encounters high register pressure, which will reduce ...

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization

ACM Transactions on Architecture and Code Optimization Volume 3, Issue 1

March 2006

114 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/1132462

Issue’s Table of Contents

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 2006

Published in TACO Volume 3, Issue 1

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

4
Total Citations
View Citations
905
Total Downloads

Downloads (Last 12 months)113
Downloads (Last 6 weeks)30

Reflects downloads up to 08 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Hayes ALi LChavarría-Miranda DSong SZhang E(2016)OrionProceedings of the 17th International Middleware Conference10.1145/2988336.2988355(1-13)Online publication date: 28-Nov-2016
https://dl.acm.org/doi/10.1145/2988336.2988355
Hayes AZhang EBode AGerndt MStenström PRauchwerger LMiller BSchulz M(2014)Unified on-chip memory allocation for SIMT architectureProceedings of the 28th ACM international conference on Supercomputing10.1145/2597652.2597685(293-302)Online publication date: 10-Jun-2014
https://dl.acm.org/doi/10.1145/2597652.2597685
Aslam FBaig GQureshi MUzmi ZFennell LThiemann PSchindelhauer CHaussmann E(2012)Rethinking Java call stack design for tiny embedded devicesACM SIGPLAN Notices10.1145/2345141.224842047:5(1-10)Online publication date: 12-Jun-2012
https://dl.acm.org/doi/10.1145/2345141.2248420
Aslam FBaig GQureshi MUzmi ZFennell LThiemann PSchindelhauer CHaussmann EWilhelm RFalk HYi W(2012)Rethinking Java call stack design for tiny embedded devicesProceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems10.1145/2248418.2248420(1-10)Online publication date: 12-Jun-2012
https://dl.acm.org/doi/10.1145/2248418.2248420

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Figures

Tables

Media

View Issue’s Table of Contents