ABSTRACT
The LLVM community is currently developing OpenMP 4.1 support, consisting of software improvements for Clang and new runtime libraries. OpenMP 4.1 includes offloading constructs that permit execution of user-selected regions on generic devices external to the main host processor. This paper describes our ongoing work towards delivering support for OpenMP offloading constructs for the OpenPower system in the LLVM compiler infrastructure. We previously introduced a design for a control loop scheme necessary to implement the OpenMP generic offloading model on NVIDIA GPUs. In this paper we show how we integrated the complexity of the control loop into Clang by limiting its support to OpenMP-related functionality. We also summarize the results of performance analysis on benchmarks and a complex application kernel. Finally, we present an optimization in the Clang code generation scheme for specific code patterns that serves as an alternative to the control loop and delivers improved performance.
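To make the notion of an offloading construct concrete, the following is a minimal sketch of a user-selected region marked for execution on a device. It is an illustration only, not code from the paper: the function name `vec_add` and its parameters are hypothetical, and the combined `target teams distribute parallel for` directive is just one of several ways such a region could be written.

```c
#include <assert.h>

/* Hypothetical helper (not from the paper): element-wise c = a + b.
 * The loop is marked as an OpenMP target region, so a compiler with
 * offloading support may run it on a device such as a GPU. */
void vec_add(const float *a, const float *b, float *c, int n)
{
    assert(n >= 0);

    /* map(to: ...) copies the inputs to the device before the region;
     * map(from: ...) copies the result back to the host afterwards. */
    #pragma omp target teams distribute parallel for \
        map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

If the compiler is invoked without OpenMP enabled, the pragma is ignored and the loop runs sequentially on the host; with OpenMP enabled but no device available, implementations typically fall back to host execution, so the result should be the same in each case.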