Efficient, portable implementation of asynchronous multi-place programs

Authors:

Ganesh Bikshandi,

Jose G. Castanos,

Sreedhar B. Kodali,

V. Krishna Nandivada,

Igor Peshansky,

Vijay A. Saraswat,

Tong Wen

PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming

Pages 271 - 282

https://doi.org/10.1145/1504176.1504215

Published: 14 February 2009

Abstract

The X10 programming language is organized around the notion of places (an encapsulation of data and activities operating on the data), partitioned global address space (PGAS), and asynchronous computation and communication.

This paper introduces an expressive subset of X10, Flat X10, designed to permit efficient execution across multiple single-threaded places with a simple runtime and without compromising on the productivity of X10. We present the design, implementation and evaluation of a compiler and runtime system for Flat X10. The Flat X10 compiler translates programs into C++ SPMD programs communicating using an active messaging infrastructure. It uses novel techniques to transform explicitly parallel programs into SPMD programs. The runtime system is based on IBM's LAPI (Low-level API) and is easily portable to other libraries such as GASNet and ARMCI.

Our implementation realizes performance comparable to hand-written MPI programs for well-known HPC benchmarks such as Random Access, Stream, and FFT, on a Federation-based cluster of Power5 SMPs (with hundreds of processors) and the Blue Gene (with thousands of processors). Submissions based on the work presented in this paper were co-winners of the 2007 and 2008 HPC Challenge Type II Awards.

Cited By

Slaughter E, Lee W, Treichler S, Zhang W, Bauer M, Shipman G, McCormick P, Aiken A, Mohr B, Raghavan P. (2017). Control replication. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. doi:10.1145/3126908.3126949. Online publication date: 12-Nov-2017.
https://dl.acm.org/doi/10.1145/3126908.3126949
Gupta S, Shrivastava R, Nandivada V, Gropp W, Beckman P, Li Z, Cazorla F. (2017). Optimizing recursive task parallel programs. Proceedings of the International Conference on Supercomputing, pp. 1-11. doi:10.1145/3079079.3079102. Online publication date: 14-Jun-2017.
https://dl.acm.org/doi/10.1145/3079079.3079102
Freiberg O, Palsberg J, Eslamimehr M. (2016). Retargetable Communication for Distributed Programs. 2016 12th International ACM SIGSOFT Conference on Quality of Software Architectures (QoSA), pp. 21-30. doi:10.1109/QoSA.2016.8. Online publication date: Apr-2016.
https://doi.org/10.1109/QoSA.2016.8

Index Terms

Efficient, portable implementation of asynchronous multi-place programs
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming

February 2009

322 pages

ISBN:9781605583976

DOI:10.1145/1504176

General Chair: Daniel Reed, Microsoft Research, USA
Program Chair: Vivek Sarkar, Rice University, USA

ACM SIGPLAN Notices Volume 44, Issue 4
PPoPP '09
April 2009
294 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1594835

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

PPoPP09

PPoPP09: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 14 - 18, 2009

Raleigh, NC, USA

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Bibliometrics

Total Citations: 23
Total Downloads: 582
Downloads (Last 12 months): 6
Downloads (Last 6 weeks): 0

Reflects downloads up to 08 Mar 2025

Citations

Cited By

Zhao J, Lublinerman R, Budimlić Z, Chaudhuri S, Sarkar V. (2013). Isolation for nested task parallelism. ACM SIGPLAN Notices 48:10, pp. 571-588. doi:10.1145/2544173.2509534. Online publication date: 29-Oct-2013.
https://dl.acm.org/doi/10.1145/2544173.2509534
Lifflander J, Miller P, Kale L. (2013). Adoption protocols for fanout-optimal fault-tolerant termination detection. ACM SIGPLAN Notices 48:8, pp. 13-22. doi:10.1145/2517327.2442519. Online publication date: 23-Feb-2013.
https://dl.acm.org/doi/10.1145/2517327.2442519
Zhao J, Lublinerman R, Budimlić Z, Chaudhuri S, Sarkar V, Hosking A, Eugster P, Lopes C. (2013). Isolation for nested task parallelism. Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications, pp. 571-588. doi:10.1145/2509136.2509534. Online publication date: 29-Oct-2013.
https://dl.acm.org/doi/10.1145/2509136.2509534
Nandivada V, Shirako J, Zhao J, Sarkar V. (2013). A Transformation Framework for Optimizing Task-Parallel Programs. ACM Transactions on Programming Languages and Systems 35:1, pp. 1-48. doi:10.1145/2450136.2450138. Online publication date: 1-Apr-2013.
https://dl.acm.org/doi/10.1145/2450136.2450138
Lifflander J, Miller P, Kale L, Nicolau A, Shen X, Amarasinghe S, Vuduc R. (2013). Adoption protocols for fanout-optimal fault-tolerant termination detection. Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming, pp. 13-22. doi:10.1145/2442516.2442519. Online publication date: 23-Feb-2013.
https://dl.acm.org/doi/10.1145/2442516.2442519
Chavarria-Miranda D, Krishnamoorthy S, Vishnu A. (2012). Global Futures. Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), pp. 393-401. doi:10.1109/CCGrid.2012.105. Online publication date: 13-May-2012.
https://dl.acm.org/doi/10.1109/CCGrid.2012.105
Zhang C, Xie C, Xiao Z, Chen H. (2011). Evaluating the performance and scalability of mapreduce applications on X10. Proceedings of the 9th international conference on Advanced parallel processing technologies, pp. 46-57. doi:10.5555/2042522.2042526. Online publication date: 26-Sep-2011.
https://dl.acm.org/doi/10.5555/2042522.2042526
