skip to main content
10.1145/3168806acmconferencesArticle/Chapter ViewAbstractPublication PagescgoConference Proceedingsconference-collections
research-article

Register allocation for Intel processor graphics

Published:24 February 2018Publication History

ABSTRACT

Register allocation is a well-studied problem, but surprisingly little work has been published on assigning registers for GPU architectures. In this paper we present the register allocator in the production compiler for Intel HD and Iris Graphics. Intel GPUs feature a large byte-addressable register file organized into banks, an expressive instruction set that supports variable SIMD-sizes and divergent control flow, and high spill overhead due to relatively long memory latencies. These distinctive characteristics impose challenges for register allocation, as input programs may have arbitrarily-sized variables, partial updates, and complex control flow. Not only should the allocator make a program spill-free, but it must also reduce the number of register bank conflicts and anti-dependencies. Since compilation occurs in a JIT environment, the allocator also needs to incur little overhead.

To manage compilation overhead, our register allocation framework adopts a hybrid approach that separates the assignment of local and global variables. Several extensions are introduced to the traditional graph-coloring algorithm to support variables with different sizes and to accurately model liveness under divergent branches. Different assignment polices are applied to exploit the trade-offs between minimizing register usage and avoiding bank conflicts and anti-dependencies. Experimental results show our framework produces very few spilling kernels and can improve RA JIT time by up to 4x over pure graph-coloring. Our round-robin and bank-conflict-reduction assignment policies can also achieve up to 20% runtime improvements.

References

  1. 2K Games. 2013. Bioshock Infinite. (2013). https://bioshockinfinite.ghoststorygames.com/.Google ScholarGoogle Scholar
  2. Advanced Micro Devices. 2016. AMD GCN3 ISA Architecture Manual. (2016). http://gpuopen.com/compute-product/amd-gcn3-isa-architecture-manual/.Google ScholarGoogle Scholar
  3. Apple. 2017. Apple Metal 2. (2017). https://developer.apple.com/metal/.Google ScholarGoogle Scholar
  4. Blizzard Entertainment. 2016. World of Warcraft: Legion. (2016). https://worldofwarcraft.com/.Google ScholarGoogle Scholar
  5. David Blythe. 2006. The direct3d 10 system. In ACM Transactions on Graphics (TOG), Vol. 25. ACM, 724-734. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Matthias Braun and Sebastian Hack. 2009. Register Spilling and Live-Range Splitting for SSA-Form Programs. In Proceedings of the 18th International Conference on Compiler Construction: Held As Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009 (CC '09). Springer-Verlag, Berlin, Heidelberg, 174-189. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Preston Briggs, Keith D Cooper, and Linda Torczon. 1994. Improvements to graph coloring register allocation. ACM Transactions on Programming Languages and Systems (TOPLAS) 16, 3 (1994), 428-455. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. John Cavazos, J. Eliot B. Moss, and Michael F. P. O'Boyle. 2006. Hybrid Optimizations: Which Optimization Algorithm to Use?. In Proceedings of the 15th International Conference on Compiler Construction (CC'06). Springer-Verlag, Berlin, Heidelberg, 124-138. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Gregory Chaitin. 2004. Register allocation and spilling via graph coloring. Acm Sigplan Notices 39, 4 (2004), 66-74. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Fred C Chow and John L Hennessy. 1990. The priority-based coloring approach to register allocation. ACM Transactions on Programming Languages and Systems (TOPLAS) 12, 4 (1990), 501-536. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Quentin Colombet, Benoit Boissinot, Philip Brisk, Sebastian Hack, and Fabrice Rastello. 2011. Graph-coloring and Treescan Register Allocation Using Repairing. In Proceedings of the 14th International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES '11). ACM, New York, NY, USA, 45-54. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Gregory Diamos, Benjamin Ashbaugh, Subramaniam Maiyuran, Andrew Kerr, Haicheng Wu, and Sudhakar Yalamanchili. 2011. SIMD re-convergence at thread frontiers. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 477-488. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Ahmed ElTantawy, Jessica Wenjie Ma, Mike O'Connor, and Tor M Aamodt. 2014. A scalable multi-path microarchitecture for efficient GPU control flow. In High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on. IEEE, 248-259.Google ScholarGoogle ScholarCross RefCross Ref
  14. Wilson WL Fung, Ivan Sham, George Yuan, and Tor M Aamodt. 2007. Dynamic warp formation and scheduling for efficient GPU control flow. In Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 407-420. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Futuremark. 2017. 3DMark. (2017). https://www.futuremark.com/benchmarks/3dmark.Google ScholarGoogle Scholar
  16. Futuremark. 2017. VRMark - the virtual reality benchmark. (2017). https://www.futuremark.com/benchmarks/vrmark.Google ScholarGoogle Scholar
  17. James R Goodman and W-C Hsu. 1988. Code scheduling and register allocation in large basic blocks. In Proceedings of the 2nd international conference on Supercomputing. ACM, 442-452. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Sebastian Hack, Daniel Grund, and Gerhard Goos. 2006. Register allocation for programs in SSA-form. CC 6 (2006), 247-262. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Tianyi David Han and Tarek S Abdelrahman. 2011. Reducing branch divergence in GPU programs. In Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units. ACM, 3. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Ari B Hayes and Eddy Z Zhang. 2014. Unified on-chip memory allocation for SIMT architecture. In Proceedings of the 28th ACM international conference on Supercomputing. ACM, 293-302. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Intel 2017. Intel Graphics for Linux documentation. (2017). https://01.org/linuxgraphics/documentation.Google ScholarGoogle Scholar
  22. Intel 2017. Intel Processor Graphics. (2017). https://software.intel.com/en-us/articles/intel-graphics-developers-guides.Google ScholarGoogle Scholar
  23. Chi keung Luk. 2015. MICRO48-Tutorial on Intel Processor Graphics: Architecture and Programming. (2015). https://software.intel.com/en-us/blogs/2015/08/27/micro48-tutorial-on-intel-processor-graphics-architecture-and-programming.Google ScholarGoogle Scholar
  24. Kishonti Ltd. 2017. CompuBench - performance benchmark for various compute APIs. (2017). https://compubench.com/.Google ScholarGoogle Scholar
  25. Kishonti Ltd. 2017. GFXBench. (2017). https://gfxbench.com/benchmark.jsp.Google ScholarGoogle Scholar
  26. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097-1105. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization (CGO '04). IEEE Computer Society, Washington, DC, USA, 75-. http://dl.acm.org/citation.cfm?id=977395.977673 Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Guei-Yuan Lueh, Thomas Gross, and Ali-Reza Adl-Tabatabai. 2000. Fusion-based Register Allocation. ACM Trans. Program. Lang. Syst. 22, 3 (May 2000), 431-470. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Sparsh Mittal. 2017. A survey of techniques for architecting and managing GPU register file. IEEE Transactions on Parallel and Distributed Systems 28, 1 (2017), 16-28. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Rajeev Motwani, Krishna V Palem, Vivek Sarkar, and Salem Reyen. 1995. Combining register allocation and instruction scheduling. Courant Institute, New York University (1995).Google ScholarGoogle Scholar
  31. Brian R Nickerson. 1990. Graph coloring register allocation for processors with multi-register operands. In ACM SIGPLAN Notices, Vol. 25. ACM, 40-52. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Nvidia 2016. Pascal Architecture Whitepaper. (2016). https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf.Google ScholarGoogle Scholar
  33. Fernando Magno Quintao Pereira and Jens Palsberg. 2005. Register allocation via coloring of chordal graphs. In APLAS, Vol. 5. Springer, 315-329. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Shlomit S Pinter. 1993. Register allocation with instruction scheduling. In ACM SIGPLAN Notices, Vol. 28. ACM, 248-257. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Massimiliano Poletto and Vivek Sarkar. 1999. Linear scan register allocation. ACM Transactions on Programming Languages and Systems (TOPLAS) 21, 5 (1999), 895-913. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Fernando Magno Quintão Pereira and Jens Palsberg. 2008. Register Allocation by Puzzle Solving. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '08). ACM, New York, NY, USA, 216-226. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Minsoo Rhu and Mattan Erez. 2013. The dual-path execution model for efficient GPU control flow. In High Performance Computer Architecture (HPCA2013), 2013 IEEE 19th International Symposium on. IEEE, 591-602. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Johan Runeson and Sven-Olof Nyström. 2003. Retargetable graph-coloring register allocation for irregular architectures. In SCOPES. Springer, 240-254.Google ScholarGoogle Scholar
  39. Diogo Sampaio, Rafael Martins de Souza, Sylvain Collange, and Fernando Magno Quintão Pereira. 2014. Divergence Analysis. ACM Trans. Program. Lang. Syst. 35, 4, Article 13 (Jan. 2014), 36 pages.Google ScholarGoogle Scholar
  40. Dave Shreiner, Graham Sellers, John Kessenich, and Bill Licea-Kane. 2013. OpenGL programming guide: The Official guide to learning OpenGL, version 4.3. Addison-Wesley. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. SiSoftware. 2016. Sandra Platinum. (2016). http://www.sisoftware.net.Google ScholarGoogle Scholar
  42. Michael D. Smith, Norman Ramsey, and Glenn Holloway. 2004. A Generalized Algorithm for Graph-coloring Register Allocation. In Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation (PLDI '04). ACM, New York, NY, USA, 277-288. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. John E Stone, David Gohara, and Guochun Shi. 2010. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in science & engineering 12, 3 (2010), 66-73. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. The Khronos Group. 2016. Vulkan Overview. (2016). https://www.khronos.org/vulkan/.Google ScholarGoogle Scholar
  45. Xiaolong Xie, Yun Liang, Xiuhong Li, Yudong Wu, Guangyu Sun, Tao Wang, and Dongrui Fan. 2015. Enabling coordinated register allocation and thread-level parallelism optimization for GPUs. In Proceedings of the 48th International Symposium on Microarchitecture. ACM, 395-406. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Register allocation for Intel processor graphics

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and Optimization
      February 2018
      377 pages
      ISBN:9781450356176
      DOI:10.1145/3179541

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 24 February 2018

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate312of1,061submissions,29%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader