Implementing implicit OpenMP data sharing on GPUs

Research article
DOI: 10.1145/3148173.3148189
Published: 12 November 2017

ABSTRACT

OpenMP is a shared memory programming model that supports offloading target regions to accelerators such as NVIDIA GPUs. The implementation in Clang/LLVM aims to deliver a generic GPU compilation toolchain that supports both the native CUDA C/C++ and the OpenMP device offloading models. There are situations where the semantics of OpenMP and those of CUDA diverge. One such example is the policy for implicitly handling local variables. In CUDA, local variables are implicitly mapped to thread-local memory and thus become private to a CUDA thread. In OpenMP, because the semantics allow the nesting of regions executed by different numbers of threads, such variables need to be implicitly shared among the threads of a contention group.
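
To make the divergence concrete, here is a minimal sketch of the pattern in question (the variable name and values are illustrative, not taken from the paper): a variable declared inside a target region but outside a nested parallel region is written by the initial thread, yet under OpenMP semantics it must be visible to every thread of the team, whereas a direct CUDA-style lowering would place it in thread-private local memory.

    #include <stdio.h>

    int main(void) {
      int result = 0;
    #pragma omp target map(tofrom: result)
      {
        int x = 42;          /* written by the initial thread of the team */
    #pragma omp parallel
        {
          /* Every thread of the nested parallel region must see the same x, */
          /* so the compiler has to implicitly share it across the team.     */
    #pragma omp atomic
          x += 1;
        }
        result = x;          /* sequential again: read back the shared value */
      }
      printf("result = %d\n", result);
      return 0;
    }

With Clang's offloading support (e.g. -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda) such code is compiled to an NVIDIA GPU kernel, and the storage chosen for x is exactly the policy question addressed here.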

In this paper we present a re-design of the OpenMP device data sharing infrastructure in the Clang/LLVM toolchain, which is responsible for the implicit sharing of local variables. The new infrastructure lowers implicitly shared variables to the shared memory of the GPU.
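
In CUDA terms the lowering can be pictured roughly as follows; this is a hand-written sketch of the general technique, not the compiler-generated code described in the paper (the kernel name and slot layout are hypothetical):

    /* Rather than a per-thread copy in registers or local memory, the */
    /* implicitly shared variable gets a single slot in on-chip shared */
    /* memory per team (CUDA thread block).                            */
    __global__ void target_region_kernel(int *result) {
      __shared__ int x_shared;

      if (threadIdx.x == 0)
        x_shared = 42;          /* sequential part: initial thread only */
      __syncthreads();          /* publish the value to the whole team  */

      atomicAdd(&x_shared, 1);  /* body of the nested parallel region   */
      __syncthreads();

      if (threadIdx.x == 0)
        *result = x_shared;
    }

Because shared memory is a scarce per-multiprocessor resource, promoting variables in this way creates the shared memory pressure that the evaluation below quantifies.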

We measure the amount of shared memory used by our scheme in cases that involve scalar variables and statically allocated arrays. The evaluation is carried out by offloading to NVIDIA K40 and P100 GPUs. For scalar variables the pressure on shared memory is relatively low (under 26% of shared memory utilization on the K40) and does not negatively impact occupancy; the limiting occupancy factor in that case is register pressure. The data sharing scheme offers users a simple memory model for controlling the implicit allocation of device shared memory.


  • Published in

    LLVM-HPC'17: Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC
    November 2017, 106 pages
    ISBN: 9781450355650
    DOI: 10.1145/3148173

    Copyright © 2017 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery, New York, NY, United States



    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    LLVM-HPC'17 paper acceptance rate: 9 of 10 submissions (90%). Overall acceptance rate: 16 of 22 submissions (73%).
