ABSTRACT
This paper provides an in-depth analysis of the software overheads in the MPI performance-critical path and exposes performance overheads that are mandated by the MPI-3.1 specification. We first present a highly optimized implementation of the MPI-3.1 standard in which the communication stack---all the way from the application to the low-level network communication API---takes only a few tens of instructions. We carefully study these instructions and trace the root causes of the overheads to specific requirements of the MPI standard that cannot be avoided under the current specification. We then recommend potential changes to the MPI standard that can minimize these overheads. Our experimental results on a variety of network architectures and applications demonstrate significant benefits from the proposed changes.
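As context for the critical-path analysis summarized above, the following is a minimal sketch (not taken from the paper's measurement harness) of the kind of point-to-point exchange whose per-message software path---from the MPI call down to the network API---is the subject of the analysis; the ping-pong structure, buffer sizes, and tags are illustrative assumptions.

```c
#include <mpi.h>
#include <stdio.h>

/* Minimal ping-pong between ranks 0 and 1. Each MPI_Isend/MPI_Irecv/MPI_Waitall
 * call traverses the MPI software stack (argument checking, communicator and
 * datatype resolution, message matching, request and completion tracking)
 * before reaching the low-level network API -- the per-message path whose
 * instruction count the paper analyzes. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char sbuf[8] = "ping", rbuf[8];
    MPI_Request reqs[2];

    if (size >= 2 && rank < 2) {
        int peer = 1 - rank;                 /* rank 0 <-> rank 1 */
        MPI_Irecv(rbuf, sizeof rbuf, MPI_CHAR, peer, /*tag=*/0,
                  MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sbuf, sizeof sbuf, MPI_CHAR, peer, /*tag=*/0,
                  MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        printf("rank %d received \"%s\"\n", rank, rbuf);
    }

    MPI_Finalize();
    return 0;
}
```

Per-call costs such as tag and communicator matching, datatype resolution, and request allocation in this exchange are representative of the overheads that the paper ties back to specific requirements of the MPI-3.1 specification.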