research-article

Register allocation for Intel processor graphics

Authors:
Wei-Yu Chen

Intel, USA

Intel, USA
View Profile

,
Guei-Yuan Lueh

Intel, USA

Intel, USA
View Profile

,
Pratik Ashar

Intel, India

Intel, India
View Profile

,
Kaiyu Chen

Intel, USA

Intel, USA
View Profile

,
Buqi Cheng

Intel, USA

Intel, USA
View Profile

CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and OptimizationFebruary 2018Pages 352–364https://doi.org/10.1145/3168806

Published:24 February 2018Publication History

CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and Optimization

Pages 352–364

ABSTRACT

Register allocation is a well-studied problem, but surprisingly little work has been published on assigning registers for GPU architectures. In this paper we present the register allocator in the production compiler for Intel HD and Iris Graphics. Intel GPUs feature a large byte-addressable register file organized into banks, an expressive instruction set that supports variable SIMD-sizes and divergent control flow, and high spill overhead due to relatively long memory latencies. These distinctive characteristics impose challenges for register allocation, as input programs may have arbitrarily-sized variables, partial updates, and complex control flow. Not only should the allocator make a program spill-free, but it must also reduce the number of register bank conflicts and anti-dependencies. Since compilation occurs in a JIT environment, the allocator also needs to incur little overhead.

To manage compilation overhead, our register allocation framework adopts a hybrid approach that separates the assignment of local and global variables. Several extensions are introduced to the traditional graph-coloring algorithm to support variables with different sizes and to accurately model liveness under divergent branches. Different assignment polices are applied to exploit the trade-offs between minimizing register usage and avoiding bank conflicts and anti-dependencies. Experimental results show our framework produces very few spilling kernels and can improve RA JIT time by up to 4x over pure graph-coloring. Our round-robin and bank-conflict-reduction assignment policies can also achieve up to 20% runtime improvements.

References

2K Games. 2013. Bioshock Infinite. (2013). https://bioshockinfinite.ghoststorygames.com/.Google Scholar
Advanced Micro Devices. 2016. AMD GCN3 ISA Architecture Manual. (2016). http://gpuopen.com/compute-product/amd-gcn3-isa-architecture-manual/.Google Scholar
Apple. 2017. Apple Metal 2. (2017). https://developer.apple.com/metal/.Google Scholar
Blizzard Entertainment. 2016. World of Warcraft: Legion. (2016). https://worldofwarcraft.com/.Google Scholar
David Blythe. 2006. The direct3d 10 system. In ACM Transactions on Graphics (TOG), Vol. 25. ACM, 724-734. Google ScholarDigital Library
Matthias Braun and Sebastian Hack. 2009. Register Spilling and Live-Range Splitting for SSA-Form Programs. In Proceedings of the 18th International Conference on Compiler Construction: Held As Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009 (CC '09). Springer-Verlag, Berlin, Heidelberg, 174-189. Google ScholarDigital Library
Preston Briggs, Keith D Cooper, and Linda Torczon. 1994. Improvements to graph coloring register allocation. ACM Transactions on Programming Languages and Systems (TOPLAS) 16, 3 (1994), 428-455. Google ScholarDigital Library
John Cavazos, J. Eliot B. Moss, and Michael F. P. O'Boyle. 2006. Hybrid Optimizations: Which Optimization Algorithm to Use?. In Proceedings of the 15th International Conference on Compiler Construction (CC'06). Springer-Verlag, Berlin, Heidelberg, 124-138. Google ScholarDigital Library
Gregory Chaitin. 2004. Register allocation and spilling via graph coloring. Acm Sigplan Notices 39, 4 (2004), 66-74. Google ScholarDigital Library
Fred C Chow and John L Hennessy. 1990. The priority-based coloring approach to register allocation. ACM Transactions on Programming Languages and Systems (TOPLAS) 12, 4 (1990), 501-536. Google ScholarDigital Library
Quentin Colombet, Benoit Boissinot, Philip Brisk, Sebastian Hack, and Fabrice Rastello. 2011. Graph-coloring and Treescan Register Allocation Using Repairing. In Proceedings of the 14th International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES '11). ACM, New York, NY, USA, 45-54. Google ScholarDigital Library
Gregory Diamos, Benjamin Ashbaugh, Subramaniam Maiyuran, Andrew Kerr, Haicheng Wu, and Sudhakar Yalamanchili. 2011. SIMD re-convergence at thread frontiers. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 477-488. Google ScholarDigital Library
Ahmed ElTantawy, Jessica Wenjie Ma, Mike O'Connor, and Tor M Aamodt. 2014. A scalable multi-path microarchitecture for efficient GPU control flow. In High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on. IEEE, 248-259.Google ScholarCross Ref
Wilson WL Fung, Ivan Sham, George Yuan, and Tor M Aamodt. 2007. Dynamic warp formation and scheduling for efficient GPU control flow. In Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 407-420. Google ScholarDigital Library
Futuremark. 2017. 3DMark. (2017). https://www.futuremark.com/benchmarks/3dmark.Google Scholar
Futuremark. 2017. VRMark - the virtual reality benchmark. (2017). https://www.futuremark.com/benchmarks/vrmark.Google Scholar
James R Goodman and W-C Hsu. 1988. Code scheduling and register allocation in large basic blocks. In Proceedings of the 2nd international conference on Supercomputing. ACM, 442-452. Google ScholarDigital Library
Sebastian Hack, Daniel Grund, and Gerhard Goos. 2006. Register allocation for programs in SSA-form. CC 6 (2006), 247-262. Google ScholarDigital Library
Tianyi David Han and Tarek S Abdelrahman. 2011. Reducing branch divergence in GPU programs. In Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units. ACM, 3. Google ScholarDigital Library
Ari B Hayes and Eddy Z Zhang. 2014. Unified on-chip memory allocation for SIMT architecture. In Proceedings of the 28th ACM international conference on Supercomputing. ACM, 293-302. Google ScholarDigital Library
Intel 2017. Intel Graphics for Linux documentation. (2017). https://01.org/linuxgraphics/documentation.Google Scholar
Intel 2017. Intel Processor Graphics. (2017). https://software.intel.com/en-us/articles/intel-graphics-developers-guides.Google Scholar
Chi keung Luk. 2015. MICRO48-Tutorial on Intel Processor Graphics: Architecture and Programming. (2015). https://software.intel.com/en-us/blogs/2015/08/27/micro48-tutorial-on-intel-processor-graphics-architecture-and-programming.Google Scholar
Kishonti Ltd. 2017. CompuBench - performance benchmark for various compute APIs. (2017). https://compubench.com/.Google Scholar
Kishonti Ltd. 2017. GFXBench. (2017). https://gfxbench.com/benchmark.jsp.Google Scholar
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097-1105. Google ScholarDigital Library
Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization (CGO '04). IEEE Computer Society, Washington, DC, USA, 75-. http://dl.acm.org/citation.cfm?id=977395.977673 Google ScholarDigital Library
Guei-Yuan Lueh, Thomas Gross, and Ali-Reza Adl-Tabatabai. 2000. Fusion-based Register Allocation. ACM Trans. Program. Lang. Syst. 22, 3 (May 2000), 431-470. Google ScholarDigital Library
Sparsh Mittal. 2017. A survey of techniques for architecting and managing GPU register file. IEEE Transactions on Parallel and Distributed Systems 28, 1 (2017), 16-28. Google ScholarDigital Library
Rajeev Motwani, Krishna V Palem, Vivek Sarkar, and Salem Reyen. 1995. Combining register allocation and instruction scheduling. Courant Institute, New York University (1995).Google Scholar
Brian R Nickerson. 1990. Graph coloring register allocation for processors with multi-register operands. In ACM SIGPLAN Notices, Vol. 25. ACM, 40-52. Google ScholarDigital Library
Nvidia 2016. Pascal Architecture Whitepaper. (2016). https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf.Google Scholar
Fernando Magno Quintao Pereira and Jens Palsberg. 2005. Register allocation via coloring of chordal graphs. In APLAS, Vol. 5. Springer, 315-329. Google ScholarDigital Library
Shlomit S Pinter. 1993. Register allocation with instruction scheduling. In ACM SIGPLAN Notices, Vol. 28. ACM, 248-257. Google ScholarDigital Library
Massimiliano Poletto and Vivek Sarkar. 1999. Linear scan register allocation. ACM Transactions on Programming Languages and Systems (TOPLAS) 21, 5 (1999), 895-913. Google ScholarDigital Library
Fernando Magno Quintão Pereira and Jens Palsberg. 2008. Register Allocation by Puzzle Solving. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '08). ACM, New York, NY, USA, 216-226. Google ScholarDigital Library
Minsoo Rhu and Mattan Erez. 2013. The dual-path execution model for efficient GPU control flow. In High Performance Computer Architecture (HPCA2013), 2013 IEEE 19th International Symposium on. IEEE, 591-602. Google ScholarDigital Library
Johan Runeson and Sven-Olof Nyström. 2003. Retargetable graph-coloring register allocation for irregular architectures. In SCOPES. Springer, 240-254.Google Scholar
Diogo Sampaio, Rafael Martins de Souza, Sylvain Collange, and Fernando Magno Quintão Pereira. 2014. Divergence Analysis. ACM Trans. Program. Lang. Syst. 35, 4, Article 13 (Jan. 2014), 36 pages.Google Scholar
Dave Shreiner, Graham Sellers, John Kessenich, and Bill Licea-Kane. 2013. OpenGL programming guide: The Official guide to learning OpenGL, version 4.3. Addison-Wesley. Google ScholarDigital Library
SiSoftware. 2016. Sandra Platinum. (2016). http://www.sisoftware.net.Google Scholar
Michael D. Smith, Norman Ramsey, and Glenn Holloway. 2004. A Generalized Algorithm for Graph-coloring Register Allocation. In Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation (PLDI '04). ACM, New York, NY, USA, 277-288. Google ScholarDigital Library
John E Stone, David Gohara, and Guochun Shi. 2010. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in science & engineering 12, 3 (2010), 66-73. Google ScholarDigital Library
The Khronos Group. 2016. Vulkan Overview. (2016). https://www.khronos.org/vulkan/.Google Scholar
Xiaolong Xie, Yun Liang, Xiuhong Li, Yudong Wu, Guangyu Sun, Tao Wang, and Dongrui Fan. 2015. Enabling coordinated register allocation and thread-level parallelism optimization for GPUs. In Proceedings of the 48th International Symposium on Microarchitecture. ACM, 395-406. Google ScholarDigital Library

Index Terms

Register allocation for Intel processor graphics
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Just-in-time compilers

Recommendations

Differential register allocation
PLDI '05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation

Micro-architecture designers are very cautious about expanding the number of architected registers (also the register field), because increasing the register field adds to the code size, raises I-cache and memory pressure, complicates processor ...
Read More
Differential register allocation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation

Micro-architecture designers are very cautious about expanding the number of architected registers (also the register field), because increasing the register field adds to the code size, raises I-cache and memory pressure, complicates processor ...
Read More
Bytewise Register Allocation
SCOPES '15: Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems

Traditionally, variables have been considered as atoms by register allocation: Each variable was to be placed in one register, or spilt (placed in main memory) or rematerialized (recalculated as needed). Some flexibility arose from what would be ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and Optimization
February 2018
377 pages
ISBN:9781450356176
DOI:10.1145/3179541
General Chairs:
Jens Knoop
Vienna University of Technology, Austria
,
Markus Schordan
Lawrence Livermore National Laboratory, USA
,
Program Chairs:
Teresa Johnson
Google, USA
,
Michael O'Boyle
University of Edinburgh, UK
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 February 2018
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
GPU
JIT compilers
Register Allocation
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate312of1,061submissions,29%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 15
  Total Citations
  View Citations
- 737
  Total Downloads
- Downloads (Last 12 months)61
- Downloads (Last 6 weeks)7
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Register allocation for Intel processor graphics

CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and Optimization

ABSTRACT

References

Cited By

Index Terms

Recommendations

Differential register allocation

Differential register allocation

Bytewise Register Allocation

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Register allocation for Intel processor graphics

CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and Optimization

ABSTRACT

References

Cited By

Index Terms

Recommendations

Differential register allocation

Differential register allocation

Bytewise Register Allocation

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media