DOI: 10.1145/3152041.3152086
research-article

Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages

Published: 12 November 2017

ABSTRACT

With the shift to exascale computer systems, the importance of productive programming models for distributed systems is increasing. Partitioned Global Address Space (PGAS) programming models aim to reduce the complexity of writing distributed-memory parallel programs by introducing global operations on distributed arrays, distributed task parallelism, directed synchronization, and mutual exclusion. However, a key challenge in the application of PGAS programming models is the improvement of compilers and runtime systems. In particular, one open question is how runtime systems can meet the requirements of exascale systems, which execute a large number of asynchronous tasks.

While there are various tasking runtimes such as Qthreads, OCR, and HClib, there is no existing comparative study on PGAS tasking/threading runtime systems. To explore runtime systems for PGAS programming languages, we have implemented OCR-based and HClib-based Chapel runtimes and evaluated them with an initial focus on tasking and synchronization implementations. The results show that our OCR and HClib-based implementations can improve the performance of PGAS programs compared to the existing Qthreads backend of Chapel.

        • Published in

          ESPM2'17: Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware
          November 2017
          61 pages
          ISBN: 9781450351331
          DOI: 10.1145/3152041

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 5 of 10 submissions, 50%
