skip to main content
article

Register allocation for software pipelined multi-dimensional loops

Published: 12 June 2005 Publication History

Abstract

Software pipelining of a multi-dimensional loop is an important optimization that overlaps the execution of successive outermost loop iterations to explore instruction-level parallelism from the entire n-dimensional iteration space. This paper investigates register allocation for software pipelined multi-dimensional loops.For single loop software pipelining, the lifetime instances of a loop variant in successive iterations of the loop form a repetitive pattern. An effective register allocation method is to represent the pattern as a vector of lifetimes (or a vector lifetime using Rau's terminology) and map it to rotating registers. Unfortunately, the software pipelined schedule of a multi-dimensional loop is considerably more complex, and so are the vector lifetimes in it.In this paper, we develop a way to normalize and represent vector lifetimes in multi-dimensional loop software pipelining, which capture their complexity, while exposing their regularity that enables us to develop a simple, yet powerful solution. Our algorithm is based on the development of a metric, called distance, that quantitatively determines the degree of potential overlapping (conflicts) between two vector lifetimes. We show how to calculate and use the distance, conservatively or aggressively, to guide the register allocation of the vector lifetimes under a bin-packing algorithm framework. The classical register allocation for software pipelined single loops is subsumed by our method as a special case.The method has been implemented in the ORC compiler and produced code for the Itanium architecture. We report the effectiveness of our method on 134 loop nests with 348 loop levels. Several strategies for register allocation are compared and analyzed.

References

[1]
V. H. Allan, R. B. Jones, R. M. Lee, and S. J. Allan. Software pipelining. ACM Computing Surveys, 27(3):367--432, 1995.]]
[2]
D. Callahan and B. Koblenz. Register allocation via hierarchical graph coloring. In Proc. of PLDI'91, pages 192--203, 1991.]]
[3]
S. Carr, C. Ding, and P. Sweany. Improving software pipelining with unroll-and-jam. In Proc. of HICSS'96, pages 183--192, 1996.]]
[4]
G. J. Chaitin. Register allocation & spilling via graph coloring. In Proc. of CC'82, pages 98--101. ACM Press, 1982.]]
[5]
J. C. Dehnert and R. A. Towle. Compiling for the Cydra 5. J. Supercomput., 7(1-2):181--227, 1993.]]
[6]
L. J. Hendren, G. R. Gao, E. R. Altman, and C. Mukerji. A register allocation framework based on hierarchical cyclic interval graphs. In Computational Complexity, pages 176--191, 1992.]]
[7]
R. A. Huff. Lifetime-sensitive modulo scheduling. In PLDI'93, pages 258--267, 1993.]]
[8]
Intel. Intel IA-64 Architecture Software Developer's Manual, Vol. 1. Intel, 2001.]]
[9]
M. Lam. Software pipelining: An effective scheduling technique for VLIW machines. In Proc. of PLDI'88, pages 318--328, 1988.]]
[10]
J. Llosa, M. Valero, and E. Ayguad. Heuristics for Register-Constrained Software Pipelining. In Proc. of MICRO'96, pages 250--261, 1996.]]
[11]
K. Muthukumar and G. Doshi. Software pipelining of nested loops. LNCS, 2027:165--181, 2001.]]
[12]
B. R. Rau, M. Lee, P. P. Tirumalai, and M. S. Schlansker. Register allocation for software pipelined loops. In PLDI'92, pages 283--299, 1992.]]
[13]
H. Rong, A. Douillet, R. Govindarajan, and G. R. Gao. Code generation for single-dimension software pipelining of multi-dimensional loops. In Proc. of CGO'04, pages 175--186, 2004.]]
[14]
H. Rong, Z. Tang, R. Govindarajan, A. Douillet, and G. R. Gao. Single-dimension software pipelining for multi-dimensional loops. In Proc. of CGO'04, pages 163--174, 2004.]]

Cited By

View all
  • (2019)Compiler-assisted leakage-aware loop scheduling for embedded VLIW DSP processorsJournal of Systems and Software10.1016/j.jss.2009.11.72783:5(772-785)Online publication date: 3-Jan-2019
  • (2018)A methodology pruning the search space of six compiler transformations by addressing them together as one problem and by exploiting the hardware architecture detailsComputing10.1007/s00607-016-0535-499:9(865-888)Online publication date: 31-Dec-2018
  • (2015)A methodology for speeding up loop kernels by exploiting the software information and the memory architectureComputer Languages, Systems and Structures10.1016/j.cl.2015.01.00341:C(21-41)Online publication date: 1-Apr-2015
  • Show More Cited By

Index Terms

  1. Register allocation for software pipelined multi-dimensional loops

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 40, Issue 6
    Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
    June 2005
    325 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1064978
    Issue’s Table of Contents
    • cover image ACM Conferences
      PLDI '05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
      June 2005
      338 pages
      ISBN:1595930566
      DOI:10.1145/1065010
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 June 2005
    Published in SIGPLAN Volume 40, Issue 6

    Check for updates

    Author Tags

    1. register allocation
    2. software pipelining

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)3
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 02 Mar 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2019)Compiler-assisted leakage-aware loop scheduling for embedded VLIW DSP processorsJournal of Systems and Software10.1016/j.jss.2009.11.72783:5(772-785)Online publication date: 3-Jan-2019
    • (2018)A methodology pruning the search space of six compiler transformations by addressing them together as one problem and by exploiting the hardware architecture detailsComputing10.1007/s00607-016-0535-499:9(865-888)Online publication date: 31-Dec-2018
    • (2015)A methodology for speeding up loop kernels by exploiting the software information and the memory architectureComputer Languages, Systems and Structures10.1016/j.cl.2015.01.00341:C(21-41)Online publication date: 1-Apr-2015
    • (2009)Periodic register saturation in innermost loopsParallel Computing10.1016/j.parco.2008.12.00135:4(239-254)Online publication date: 1-Apr-2009
    • (2014)Loop scheduling with memory access reduction subject to register constraints for DSP applicationsSoftware—Practice & Experience10.1002/spe.218644:8(999-1026)Online publication date: 1-Aug-2014
    • (2013)Allocating rotating registers by schedulingProceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/2540708.2540738(346-358)Online publication date: 7-Dec-2013
    • (2011)Leakage-Aware Modulo Scheduling for Embedded VLIW ProcessorsJournal of Computer Science and Technology10.1007/s11390-011-1143-626:3(405-417)Online publication date: 12-May-2011
    • (2009)Advances in Software PipeliningThe Compiler Design Handbook10.1201/9781420043839.ch20(20-1-20-73)Online publication date: 7-Dec-2009
    • (2007)Single-dimension software pipelining for multidimensional loopsACM Transactions on Architecture and Code Optimization10.1145/1216544.12165504:1(7-es)Online publication date: 1-Mar-2007
    • (2007)Rotating Register Allocation for Enhanced Pipeline Scheduling16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007)10.1109/PACT.2007.4336200(60-72)Online publication date: Sep-2007
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media