ABSTRACT
Computational accelerators, such as manycore NVIDIA GPUs, Intel Xeon Phi and FPGAs, are becoming common in workstations, servers and supercomputers for scientific and engineering applications. Efficiently exploiting the massive parallelism these accelerators provide requires the design and implementation of productive programming models.
In this paper, we explore support for multiple accelerators in high-level programming models. We design novel language extensions to OpenMP to support offloading data and computation regions to multiple accelerators (devices). These extensions allow for distributing data and computation among a list of devices via easy-to-use interfaces, including specifying the distribution of multi-dimensional arrays and declaring shared data regions among accelerators. Computation distribution is realized by partitioning a loop iteration space among accelerators. We implement mechanisms to marshal/unmarshal and to move data of non-contiguous array subregions and shared regions between accelerators without involving CPUs. We design reduction techniques that work across multiple accelerators. Combined compiler and runtime support manages multiple GPUs using asynchronous operations and threading mechanisms. We implement our solutions for NVIDIA GPUs and demonstrate their effectiveness for performance improvement through example OpenMP codes.