
Integrating GPU support for OpenMP offloading directives into Clang

Authors:
Carlo Bertolli, IBM T.J. Watson Research Center, Yorktown Heights, NY
Samuel F. Antao, IBM T.J. Watson Research Center, Yorktown Heights, NY
Gheorghe-Teodor Bercea, IBM T.J. Watson Research Center, Yorktown Heights, NY and Imperial College London, United Kingdom
Arpith C. Jacob, IBM T.J. Watson Research Center, Yorktown Heights, NY
Alexandre E. Eichenberger, IBM T.J. Watson Research Center, Yorktown Heights, NY
Tong Chen, IBM T.J. Watson Research Center, Yorktown Heights, NY
Zehra Sura, IBM T.J. Watson Research Center, Yorktown Heights, NY
Hyojin Sung, IBM T.J. Watson Research Center, Yorktown Heights, NY
Georgios Rokos, IBM T.J. Watson Research Center, Yorktown Heights, NY
David Appelhans, IBM T.J. Watson Research Center, Yorktown Heights, NY
Kevin O'Brien, IBM T.J. Watson Research Center, Yorktown Heights, NY

LLVM '15: Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, November 2015, Article No.: 5, Pages 1–11, https://doi.org/10.1145/2833157.2833161

Published: 15 November 2015

ABSTRACT

The LLVM community is currently developing OpenMP 4.1 support, consisting of software improvements for Clang and new runtime libraries. OpenMP 4.1 includes offloading constructs that permit execution of user-selected regions on generic devices external to the main host processor. This paper describes our ongoing work to deliver support for OpenMP offloading constructs for the OpenPower system in the LLVM compiler infrastructure. We previously introduced the design of a control loop scheme needed to implement the OpenMP generic offloading model on NVIDIA GPUs. In this paper we show how we integrated the control loop into Clang and contained its complexity by limiting its support to OpenMP-related functionality. We also summarize the results of a performance analysis on benchmarks and a complex application kernel. Finally, we show an optimization in the Clang code generation scheme for specific code patterns which, as an alternative to the control loop, delivers improved performance.
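
As an illustration of the kind of offloading construct the abstract refers to, the following is a minimal sketch in C using standard OpenMP `target` and `map` syntax. It is not taken from the paper: the function name `saxpy_offload` and its parameters are hypothetical, and the comments describe the OpenMP programming model in general rather than the code Clang generates for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example (not from the paper): computes y[i] = a * x[i] + y[i]
 * inside an OpenMP target region. If the compiler has no OpenMP or no device
 * support, the pragmas are ignored or the region falls back to the host, and
 * the result is the same. */
void saxpy_offload(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    /* map(to:) makes x available on the device; map(tofrom:) also copies
     * the updated y back to the host when the region ends. */
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    {
        /* Code placed here, outside the parallel construct, is sequential
         * and is executed by a single thread of the target region. */

        /* The nested parallel loop shares its iterations among threads. */
        #pragma omp parallel for
        for (size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }
}
```

A caller passes ordinary host arrays, for example `saxpy_offload(4, 2.0f, x, y)`. The mix of sequential code and a nested parallel region inside one `target` region is the general situation that a generic offloading model has to handle on a GPU.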

Index Terms

Integrating GPU support for OpenMP offloading directives into Clang
1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
      1. Graphics processors
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Communications management
        Message passing

Published in
LLVM '15: Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC
November 2015
74 pages
ISBN:9781450340052
DOI:10.1145/2833157
Conference Chair:
Hal Finkel
Argonne National Laboratory
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States

Acceptance Rates
LLVM '15 paper acceptance rate: 7 of 12 submissions, 58%. Overall acceptance rate: 16 of 22 submissions, 73%.