research-article
DOI: 10.1145/2712386.2712405

Supporting multiple accelerators in high-level programming models

Published: 07 February 2015

ABSTRACT

Computational accelerators, such as manycore NVIDIA GPUs, Intel Xeon Phi and FPGAs, are becoming common in workstations, servers and supercomputers for scientific and engineering applications. Efficiently exploiting the massive parallelism these accelerators provide requires the design and implementation of productive programming models.

In this paper, we explore support for multiple accelerators in high-level programming models. We design novel language extensions to OpenMP for offloading data and computation regions to multiple accelerators (devices). These extensions distribute data and computation among a list of devices through easy-to-use interfaces, including ones for specifying the distribution of multi-dimensional arrays and for declaring shared data regions among accelerators. Computation distribution is realized by partitioning a loop iteration space among accelerators. We implement mechanisms to marshal, unmarshal, and move data of non-contiguous array subregions and shared regions between accelerators without involving CPUs. We design reduction techniques that work across multiple accelerators. Combined compiler and runtime support manages multiple GPUs using asynchronous operations and threading mechanisms. We implement our solutions for NVIDIA GPUs and demonstrate their effectiveness for performance improvement through example OpenMP codes.
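The abstract does not reproduce the proposed directive syntax, so as a hedged illustration, the following C sketch shows the hand-written baseline such extensions would replace: a vector addition distributed across all available devices using only standard OpenMP 4.x constructs (omp_get_num_devices, the device clause on target, and array-section map clauses). The function name multi_device_vadd and the even block partition are illustrative choices, not the paper's interface.

```c
#include <omp.h>

/* Distribute c[i] = a[i] + b[i] over all devices by even block partition. */
void multi_device_vadd(const float *a, const float *b, float *c, int n)
{
    int ndev = omp_get_num_devices();
    if (ndev == 0) ndev = 1;            /* no devices: fall back to the host */
    int chunk = (n + ndev - 1) / ndev;  /* ceiling division */

    /* One host thread per device so the offload regions run concurrently. */
    #pragma omp parallel for num_threads(ndev)
    for (int d = 0; d < ndev; d++) {
        int lo  = d * chunk;
        int hi  = (lo + chunk < n) ? lo + chunk : n;
        int len = hi - lo;
        if (len <= 0) continue;

        /* Map only this device's block of each array, then offload. */
        #pragma omp target device(d) map(to: a[lo:len], b[lo:len]) \
                                     map(from: c[lo:len])
        #pragma omp parallel for
        for (int i = lo; i < hi; i++)
            c[i] = a[i] + b[i];
    }
}
```

A single distribution directive over a device list, as the paper proposes, would subsume this boilerplate partitioning; a cross-device reduction would analogously combine per-device partial results (for example, per-device partial sums merged at the end).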

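For the CPU-bypassing transfers the abstract mentions, NVIDIA GPUs expose peer-to-peer copies through the CUDA runtime. The sketch below is a minimal illustration, not the paper's runtime interface (the function name and parameters are assumptions): it enables peer access where the hardware allows it and copies a buffer directly between two device memories.

```c
#include <cuda_runtime.h>

/* Copy nbytes from device buffer d_src on GPU src_dev to d_dst on GPU
 * dst_dev, using direct peer-to-peer DMA when the hardware supports it. */
void p2p_copy(void *d_dst, int dst_dev,
              const void *d_src, int src_dev, size_t nbytes)
{
    int can_access = 0;
    cudaDeviceCanAccessPeer(&can_access, dst_dev, src_dev);
    if (can_access) {
        cudaSetDevice(dst_dev);
        /* Returns an error code if already enabled; harmless in this sketch. */
        cudaDeviceEnablePeerAccess(src_dev, 0);
    }
    /* Works either way: falls back to staging through host memory
     * when peer access is not available. */
    cudaMemcpyPeer(d_dst, dst_dev, d_src, src_dev, nbytes);
}
```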

Published in

PMAM '15: Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores
February 2015, 186 pages
ISBN: 9781450334044
DOI: 10.1145/2712386
Copyright © 2015 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 7 February 2015

Acceptance Rates

PMAM '15 paper acceptance rate: 19 of 34 submissions (56%). Overall acceptance rate: 53 of 97 submissions (55%).
