ABSTRACT
The past decade has seen the advent of a number of parallel programming models such as Coarray Fortran (CAF), Unified Parallel C, X10, and Chapel. Despite the productivity gains promised by these models, most parallel scientific applications still rely on MPI as their data movement model. One reason for this trend is that it is hard for users to incrementally adopt these new programming models in existing MPI applications. Because each model uses its own runtime system, mixing models duplicates resources and is potentially error-prone. Such independent runtime systems were deemed necessary because MPI was previously considered insufficient to play this role for these languages.
The recently released MPI-3, however, adds several new capabilities, including a much more comprehensive one-sided communication framework, that together provide all of the functionality MPI needs to act as such a runtime system. In this paper, we investigate how MPI-3 can serve as the runtime system for one example programming model, CAF, with the broader goal of enabling a single application to use both MPI and CAF with the highest level of interoperability.
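To make the idea concrete, the sketch below illustrates (in C, since MPI's primary bindings are C) how a CAF coarray assignment such as x(:)[2] = y(:) might be lowered onto MPI-3 one-sided operations: the coarray is backed by a window created with MPI_Win_allocate, the assignment becomes an MPI_Put, and image synchronization maps to window completion calls. This is a minimal sketch under assumed window layout and synchronization choices, not the paper's actual runtime; the names x, y, win, and N are illustrative.

```c
/* Minimal sketch (hypothetical lowering, not the paper's runtime):
 * how a CAF assignment  x(:)[2] = y(:)  could map onto MPI-3 RMA. */
#include <mpi.h>
#include <stdio.h>

#define N 8

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each image's coarray is backed by an MPI-3 window. */
    double *x;
    MPI_Win win;
    MPI_Win_allocate(N * sizeof(double), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &x, &win);

    for (int i = 0; i < N; i++)
        x[i] = (double)rank;    /* local initialization */

    /* Passive-target epoch: any image may act as an RMA origin,
     * matching the PGAS expectation of one-sided remote access. */
    MPI_Win_lock_all(0, win);

    if (rank == 0 && size > 1) {
        double y[N];
        for (int i = 0; i < N; i++)
            y[i] = 42.0;
        /* x(:)[2] = y(:) -- CAF images are 1-based, MPI ranks are
         * 0-based, so image 2 is rank 1. */
        MPI_Put(y, N, MPI_DOUBLE, 1, 0, N, MPI_DOUBLE, win);
        MPI_Win_flush(1, win);  /* force remote completion */
    }

    MPI_Win_unlock_all(win);
    MPI_Barrier(MPI_COMM_WORLD);  /* crude stand-in for sync all */

    if (rank == 1)
        printf("image 2: x(1) = %g\n", x[0]);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

The passive-target epoch (MPI_Win_lock_all/MPI_Win_unlock_all) is the natural fit here because PGAS semantics allow any image to read or write remote coarray data at any time, without the target's participation.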