DOI: 10.1145/1375527.1375544
research-article

The Deep Computing Messaging Framework: Generalized Scalable Message Passing on the Blue Gene/P Supercomputer

Published: 07 June 2008

ABSTRACT

We present the architecture of the Deep Computing Messaging Framework (DCMF), a message-passing runtime designed for the Blue Gene/P machine and other HPC architectures. DCMF is designed to support several programming paradigms, such as the Message Passing Interface (MPI), the Aggregate Remote Memory Copy Interface (ARMCI), and Charm++. It does so by providing an application programming interface (API) with active messages and non-blocking collectives. DCMF is being open sourced and has a layered, component-based architecture with multiple levels of abstraction, which allows members of the community to contribute new components at the various layers. The DCMF runtime can be extended to other architectures by developing architecture-specific implementations of its interface classes. The production DCMF runtime on Blue Gene/P uses the direct memory access (DMA) hardware to offload message-passing work and achieve good overlap of computation and communication. Because the Blue Gene/P node is a symmetric multiprocessor with four cache-coherent cores, we use multi-threading to optimize performance on the collective network. We also present a performance evaluation of the DCMF runtime on Blue Gene/P and show that it delivers performance close to hardware limits.
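To make the active-message idea in the abstract concrete, the following is a minimal in-process sketch in Python. It is not the DCMF API: the `Endpoint` class and its `register`, `send`, and `advance` methods are hypothetical names chosen for illustration. It also simplifies completion semantics by firing the sender's callback when the receiver's handler runs. The point it illustrates is that a send names a handler on the receiver and returns immediately, so the caller can compute while the message is pending, and delivery happens on a later progress call.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple

# Hypothetical names throughout; this is an illustration, not the DCMF API.
Handler = Callable[[int, bytes], None]
Completion = Callable[[], None]


class Endpoint:
    """A toy in-process endpoint that delivers active messages."""

    def __init__(self, rank: int) -> None:
        self.rank = rank
        self.handlers: Dict[int, Handler] = {}
        self.inbox: Deque[Tuple[int, int, bytes, Completion]] = deque()

    def register(self, handler_id: int, handler: Handler) -> None:
        # Associate a numeric id with the function to run on message arrival.
        self.handlers[handler_id] = handler

    def send(self, dest: "Endpoint", handler_id: int, payload: bytes,
             on_done: Completion) -> None:
        # Non-blocking: enqueue the message and return at once.
        dest.inbox.append((self.rank, handler_id, payload, on_done))

    def advance(self) -> int:
        # Progress call: run the named handler for each pending message,
        # then fire the sender's completion callback (a simplification).
        count = 0
        while self.inbox:
            src, handler_id, payload, on_done = self.inbox.popleft()
            self.handlers[handler_id](src, payload)
            on_done()
            count += 1
        return count


received = []
completed = []

a, b = Endpoint(0), Endpoint(1)
b.register(7, lambda src, data: received.append((src, data)))

a.send(b, 7, b"hello", lambda: completed.append("send-0"))
overlap = sum(range(10))        # computation proceeds while the message is pending
pending_before = len(received)  # handler has not run yet
processed = b.advance()         # the progress call delivers the message
```

In a real runtime the progress step would be driven by hardware such as a DMA engine or by a communication thread rather than by an explicit call on the receiving object. The sketch only shows the control flow.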


Published in

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing
June 2008
390 pages
ISBN: 9781605581583
DOI: 10.1145/1375527

Copyright © 2008 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 584 of 2,055 submissions (28%)
