ABSTRACT
With the shift to exascale computer systems, productive programming models for distributed systems are becoming increasingly important. Partitioned Global Address Space (PGAS) programming models aim to reduce the complexity of writing distributed-memory parallel programs by introducing global operations on distributed arrays, distributed task parallelism, directed synchronization, and mutual exclusion. However, a key challenge in applying PGAS programming models is improving their compilers and runtime systems. In particular, one open question is how runtime systems can meet the requirements of exascale systems, where large numbers of asynchronous tasks are executed.
While various tasking runtimes exist, such as Qthreads, OCR, and HClib, there is no comparative study of tasking/threading runtime systems for PGAS languages. To explore runtime systems for PGAS programming languages, we have implemented OCR-based and HClib-based Chapel runtimes and evaluated them with an initial focus on their tasking and synchronization implementations. The results show that our OCR-based and HClib-based implementations can improve the performance of PGAS programs compared to Chapel's existing Qthreads backend.
Index Terms
- Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages