DOI: 10.1145/1504176.1504215
research-article

Efficient, portable implementation of asynchronous multi-place programs

Published: 14 February 2009

Abstract

The X10 programming language is organized around the notion of places (an encapsulation of data and activities operating on the data), partitioned global address space (PGAS), and asynchronous computation and communication.
This paper introduces an expressive subset of X10, Flat X10, designed to permit efficient execution across multiple single-threaded places with a simple runtime and without compromising on the productivity of X10. We present the design, implementation and evaluation of a compiler and runtime system for Flat X10. The Flat X10 compiler translates programs into C++ SPMD programs communicating using an active messaging infrastructure. It uses novel techniques to transform explicitly parallel programs into SPMD programs. The runtime system is based on IBM's LAPI (Low-level API) and is easily portable to other libraries such as GASNet and ARMCI.
Our implementation realizes performance comparable to hand-written MPI programs for well-known HPC benchmarks such as Random Access, Stream, and FFT, on a Federation-based cluster of Power5 SMPs (with hundreds of processors) and a Blue Gene system (with thousands of processors). Submissions based on the work presented in this paper were co-winners of the 2007 and 2008 HPC Challenge Type II Awards.


Cited By

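  • (2017) Control replication. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. DOI: 10.1145/3126908.3126949. Online publication date: 12 Nov 2017.
  • (2017) Optimizing recursive task parallel programs. Proceedings of the International Conference on Supercomputing, pp. 1-11. DOI: 10.1145/3079079.3079102. Online publication date: 14 Jun 2017.
  • (2016) Retargetable Communication for Distributed Programs. 2016 12th International ACM SIGSOFT Conference on Quality of Software Architectures (QoSA), pp. 21-30. DOI: 10.1109/QoSA.2016.8. Online publication date: Apr 2016.
  • (2013) Isolation for nested task parallelism. ACM SIGPLAN Notices 48(10), pp. 571-588. DOI: 10.1145/2544173.2509534. Online publication date: 29 Oct 2013.
  • (2013) Adoption protocols for fanout-optimal fault-tolerant termination detection. ACM SIGPLAN Notices 48(8), pp. 13-22. DOI: 10.1145/2517327.2442519. Online publication date: 23 Feb 2013.
  • (2013) Isolation for nested task parallelism. Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications, pp. 571-588. DOI: 10.1145/2509136.2509534. Online publication date: 29 Oct 2013.
  • (2013) A Transformation Framework for Optimizing Task-Parallel Programs. ACM Transactions on Programming Languages and Systems 35(1), pp. 1-48. DOI: 10.1145/2450136.2450138. Online publication date: 1 Apr 2013.
  • (2013) Adoption protocols for fanout-optimal fault-tolerant termination detection. Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming, pp. 13-22. DOI: 10.1145/2442516.2442519. Online publication date: 23 Feb 2013.
  • (2012) Global Futures. Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), pp. 393-401. DOI: 10.1109/CCGrid.2012.105. Online publication date: 13 May 2012.
  • (2011) Evaluating the performance and scalability of mapreduce applications on X10. Proceedings of the 9th international conference on Advanced parallel processing technologies, pp. 46-57. DOI: 10.5555/2042522.2042526. Online publication date: 26 Sep 2011.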

    Information

    Published In

    PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
    February 2009
    322 pages
    ISBN: 9781605583976
    DOI: 10.1145/1504176
    • ACM SIGPLAN Notices, Volume 44, Issue 4 (PPoPP '09)
      April 2009
      294 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/1594835
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. apgas
    2. asynchrony
    3. compiler
    4. fft
    5. hpc
    6. hpc challenge
    7. pgas
    8. random access
    9. runtime
    10. spmd
    11. stream
    12. x10


    Conference

    PPoPP '09

    Acceptance Rates

    Overall Acceptance Rate 230 of 1,014 submissions, 23%


    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 6
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 08 Mar 2025

