Register allocation for software pipelined multidimensional loops

Authors: Alban Douillet, Guang R. Gao

ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 30, Issue 4

Article No.: 23, Pages 1 - 68

https://doi.org/10.1145/1377492.1377498

Published: 01 August 2008

Abstract

This article investigates register allocation for software pipelined multidimensional loops where the execution of successive iterations from an n-dimensional loop is overlapped. For single loop software pipelining, the lifetimes of a loop variable in successive iterations of the loop form a repetitive pattern. An effective register allocation method is to represent the pattern as a vector of lifetimes (or a vector lifetime using Rau's terminology [Rau 1992]) and map it to rotating registers. Unfortunately, the software pipelined schedule of a multidimensional loop is considerably more complex and so are the vector lifetimes in it.

In this article, we develop a way to normalize and represent the vector lifetimes that captures their complexity while exposing the regularity that enables a simple solution. The problem is formulated as bin-packing of the multidimensional vector lifetimes on the surface of a space-time cylinder. A metric, called distance, is calculated either conservatively or aggressively to guide the bin-packing process so that no two vector lifetimes overlap and the register requirement is minimized. This approach subsumes the classical register allocation for software pipelined single loops as a special case. The method has been implemented in the ORC compiler and produced code for the IA-64 architecture. Experimental results show the effectiveness of the method, and several register allocation strategies are compared and analyzed.
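To make the bin-packing picture concrete, the following is a minimal sketch of the classical single-loop special case only, not the multidimensional method of the article. It assumes a modulo schedule with initiation interval `ii`, a rotating register file in which the instance of a variable from iteration `i` occupies register `(offset + i) mod num_regs`, and half-open lifetimes `[start, end)` measured in cycles of iteration 0. The names `Lifetime`, `conflicts`, and `first_fit` are hypothetical, and a brute-force overlap test stands in for the distance metric, which the abstract does not define.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Lifetime:
    name: str
    start: int  # cycle at which the value is defined in iteration 0
    end: int    # first cycle after the last use in iteration 0 (exclusive)


def conflicts(a, ra, b, rb, ii, num_regs):
    """True if a (offset ra) and b (offset rb) ever share a register at the same time.

    With d = i - j, the instance of a from iteration i and the instance of b
    from iteration j use the same register when d = rb - ra (mod num_regs),
    and they overlap in time when b.start - a.end < d * ii < b.end - a.start.
    """
    lo = (b.start - a.end) // ii        # floor of the lower time bound
    hi = -((a.start - b.end) // ii)     # ceiling of the upper time bound
    for d in range(lo, hi + 1):
        if (d - (rb - ra)) % num_regs != 0:
            continue                    # different registers on the cylinder
        if a is b and d == 0:
            continue                    # an instance never conflicts with itself
        if b.start - a.end < d * ii < b.end - a.start:
            return True
    return False


def first_fit(lifetimes, ii):
    """Try 1, 2, 3, ... registers; place each lifetime at the lowest free offset."""
    order = sorted(lifetimes, key=lambda v: v.start)
    for num_regs in count(1):
        placed = []  # (lifetime, offset) pairs already packed
        for v in order:
            offset = next(
                (r for r in range(num_regs)
                 if not conflicts(v, r, v, r, ii, num_regs)
                 and not any(conflicts(v, r, w, rw, ii, num_regs)
                             for w, rw in placed)),
                None,
            )
            if offset is None:
                break                   # this register count is too small
            placed.append((v, offset))
        else:
            return num_regs, {v.name: r for v, r in placed}


print(first_fit([Lifetime("x", 0, 5), Lifetime("y", 1, 3)], ii=2))
# (4, {'x': 0, 'y': 2})
```

In this example the two lifetimes span 5 + 2 = 7 cycles per iteration, so with `ii = 2` at least ceil(7 / 2) = 4 registers are needed, and first-fit reaches that bound. First-fit is only one possible packing order; it does not guarantee a minimum in general.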

References

[1] Aiken, A., Nicolau, A., and Novack, S. 1995. Resource-constrained software pipelining. IEEE Trans. Parall. Distrib. Syst. 6, 12, 1248-1270.

[2] Allan, V. H., Jones, R. B., Lee, R. M., and Allan, S. J. 1995. Software pipelining. ACM Comput. Surv. 27, 3, 367-432.

[3] Allen, J. R., Kennedy, K., Porterfield, C., and Warren, J. 1983. Conversion of control dependence to data dependence. In Proceedings of the 10th Annual ACM Symposium on Principles of Programming Languages. 177-189.

[4] Auslander, M. and Hopkins, M. 2004. An overview of the PL.8 compiler. SIGPLAN Notices 39, 4, 38-48.

[5] Callahan, D. and Koblenz, B. 1991. Register allocation via hierarchical graph coloring. In Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation (PLDI'91). ACM Press, 192-203.

[6] Carr, S., Ding, C., and Sweany, P. 1996. Improving software pipelining with unroll-and-jam. In Proceedings of the 29th Hawaii International Conference on System Sciences (HICSS'96), Software Technology and Architecture, vol. 1. IEEE Computer Society, 183.

[7] Chaitin, G. 2004. Register allocation and spilling via graph coloring. SIGPLAN Notices 39, 4, 66-74.

[8] Cheng, W.-K. and Lin, Y.-L. 1999. Code generation of nested loops for DSP processors with heterogeneous registers and structural pipelining. ACM Trans. Des. Autom. Electron. Syst. 4, 3, 231-256.

[9] Cytron, R., Ferrante, J., Rosen, B. K., Wegman, M. N., and Zadeck, F. K. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst. 13, 4, 451-490.

[10] Darte, A. and Robert, Y. 1994. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parall. Distrib. Syst. 5, 8, 814-822.

[11] Dehnert, J. C. and Towle, R. A. 1993. Compiling for the Cydra 5. J. Supercomput. 7, 1-2, 181-227.

[12] Douillet, A. and Gao, G. R. 2005. Register pressure in software-pipelined loop nests: Fast computation and impact on architecture design. In The 18th International Workshop on Languages and Compilers for Parallel Computing (LCPC'05). Hawthorne, NY, 17-31.

[13] Douillet, A. and Gao, G. R. 2007. Software-pipelining on multi-core architectures. In Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007). IEEE Computer Society, 39-48.

[14] Ebcioglu, K. and Nakatani, T. 1990. A new compilation technique for parallelizing loops with unpredictable branches on a VLIW architecture. In Selected Papers of the Second Workshop on Languages and Compilers for Parallel Computing. Pitman Publishing, London, UK, 213-229.

[15] Gao, G. R., Ning, Q., and Dongen, V. V. 1993. Software pipelining for nested loops. ACAPS Tech Memo 53, School of Computer Science, McGill Univ., Montréal, Québec.

[16] Hendren, L. J., Gao, G. R., Altman, E. R., and Mukerji, C. 1992. A register allocation framework based on hierarchical cyclic interval graphs. In Proceedings of the 4th International Conference on Compiler Construction (CC '92). Springer-Verlag, 176-191.

[17] Huff, R. A. 1993. Lifetime-sensitive modulo scheduling. In Proceedings of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation (PLDI'93). Albuquerque, 258-267.

[18] Intel. 2001. Intel IA-64 Architecture Software Developer's Manual. Vol. 1: IA-64 Application Architecture. Intel Corporation, Santa Clara, CA.

[19] Lam, M. 1988. Software pipelining: An effective scheduling technique for VLIW machines. In Proceedings of the ACM SIGPLAN 1988 Conference on Programming Language Design and Implementation (PLDI'88). 318-328.

[20] Lamport, L. 1974. The parallel execution of DO loops. Comm. ACM 17, 2, 83-93.

[21] Lawler, E. L., Lenstra, J. K., Rinnooy Kan, A. H. G., and Shmoys, D. B. 1985. The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. John Wiley & Sons.

[22] Moon, S.-M. and Ebcioğlu, K. 1997. Parallelizing nonnumerical code with selective scheduling and software pipelining. ACM Trans. Program. Lang. Syst. 19, 6, 853-898.

[23] Muthukumar, K. and Doshi, G. 2001. Software pipelining of nested loops. Lecture Notes in Computer Science, Vol. 2027, 165-181.

[24] Ramanujam, J. 1994. Optimal software pipelining of nested loops. In Proceedings of the 8th International Parallel Processing Symposium. IEEE, 335-342.

[25] Rau, B. R. 1994. Iterative modulo scheduling: An algorithm for software pipelining loops. In Proceedings of the 27th Annual International Symposium on Microarchitecture. San Jose, CA, 63-74.

[26] Rau, B. R. and Fisher, J. A. 1993. Instruction-level parallel processing: History, overview and perspective. J. Supercomput. 7, 9-50.

[27] Rau, B. R., Lee, M., Tirumalai, P. P., and Schlansker, M. S. 1992. Register allocation for modulo scheduled loops: Strategies, algorithms and heuristics. HP Labs Tech. rep. HPL-92-48, Hewlett-Packard Laboratories, Palo Alto, CA.

[28] Rong, H., Douillet, A., Govindarajan, R., and Gao, G. R. 2004. Code generation for single-dimension software pipelining of multi-dimensional loops. In Proceedings of the International Symposium on Code Generation and Optimization (CGO'04). IEEE Computer Society, 175-186.

[29] Rong, H. and Govindarajan, R. 2007. Advances in software pipelining. In The Compiler Design Handbook: Optimization and Machine Code Generation, 2nd Ed. Y. N. Srikant and P. Shankar, Eds. CRC, Chapter 20.

[30] Rong, H., Tang, Z., Govindarajan, R., Douillet, A., and Gao, G. R. 2007a. Single-dimension software pipelining for multi-dimensional loops. CAPSL Technical Memo, Department of Electrical and Computer Engineering, University of Delaware, Newark, DE. In ftp://ftp.capsl.udel.edu/pub/doc/memos/memo049.ps.gz.

[31] Rong, H., Tang, Z., Govindarajan, R., Douillet, A., and Gao, G. R. 2007b. Single-dimension software pipelining for multidimensional loops. ACM Trans. Architec. Code Optim. 4, 1, 7.

[32] Turkington, K., Masselos, K., Constantinides, G. A., and Leong, P. 2006. FPGA based acceleration of the linpack benchmark: A high level code transformation approach. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL). Madrid, Spain. IEEE, 1-6.

[33] Wang, J. and Gao, G. R. 1996. Pipelining-dovetailing: A transformation to enhance software pipelining for nested loops. In Proceedings of the 6th International Conference on Compiler Construction (CC '96). Springer-Verlag, London, UK, 1-17.

Cited By

Touati, S. and Dinechin, B. 2014. Bibliography. Advanced Backend Code Optimization, 327-343. Online publication date: 3-Jun-2014.
https://doi.org/10.1002/9781118625446.biblio

Bachir, M., Touati, S., Brault, F., Gregg, D., and Cohen, A. 2012. Minimal Unroll Factor for Code Generation of Software Pipelining. International Journal of Parallel Programming 41, 1, 1-58. Online publication date: 17-Jul-2012.
https://doi.org/10.1007/s10766-012-0203-z

Index Terms

Software and its engineering > Software notations and tools > Compilers

Published In

ACM Transactions on Programming Languages and Systems Volume 30, Issue 4

July 2008

358 pages

ISSN:0164-0925

EISSN:1558-4593

DOI:10.1145/1377492

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 August 2008

Accepted: 01 August 2007

Revised: 01 May 2007

Received: 01 September 2006

Published in TOPLAS Volume 30, Issue 4

Qualifiers

Research-article
Research
Refereed

Bibliometrics

Total Citations: 2
Total Downloads: 577
Downloads (Last 12 months): 73
Downloads (Last 6 weeks): 10

Reflects downloads up to 02 Mar 2025
