Research article · Public Access
DOI: 10.1145/3127024.3127036

MPI performance engineering with the MPI tool interface: the integration of MVAPICH and TAU

Published: 25 September 2017

ABSTRACT

MPI implementations are becoming increasingly complex and highly tunable, and thus scalability limitations can come from numerous sources. The MPI Tools Interface (MPI_T), introduced as part of the MPI 3.0 standard, provides an opportunity for performance tools and external software to introspect and understand MPI runtime behavior at a deeper level to detect scalability issues. The interface also provides a mechanism to reconfigure the MPI library dynamically at runtime to fine-tune performance. In this paper, we propose an infrastructure that extends existing components (TAU, MVAPICH2, and BEACON) to take advantage of the MPI_T interface and offer runtime introspection, online monitoring, recommendation generation, and autotuning capabilities. We validate our design by developing optimizations for a combination of production and synthetic applications. We use our infrastructure to implement an autotuning policy for AmberMD [1] that monitors and reduces the internal memory footprint of the MVAPICH2 library by 20% without affecting performance. For applications such as MiniAMR [2], where collective communication is latency sensitive, our infrastructure is able to generate recommendations to enable the hardware offloading of collectives supported by MVAPICH2. By implementing this recommendation, we see a 5% improvement in application runtime.
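To make the kind of introspection described above concrete, the sketch below uses the standard MPI_T calls from MPI 3.x to enumerate the control variables (cvars) an MPI library exposes and read one of them through a cvar handle. This is a minimal illustration of the MPI_T API, not the TAU/MVAPICH2/BEACON integration described in the paper; the cvar name MPIR_CVAR_EXAMPLE_EAGER_LIMIT is a made-up placeholder, since real cvar names and their scopes are implementation-specific. Performance variables (pvars) follow an analogous, session-based pattern, and a tool would use MPI_T_cvar_write to change a writable cvar at runtime.

/* Minimal MPI_T sketch: list the control variables the MPI library
 * exposes and read one of them by name. The cvar name below is a
 * hypothetical placeholder; actual names are implementation-specific. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int provided, num_cvars;

    MPI_Init(&argc, &argv);
    /* The MPI tools interface has its own initialization call. */
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);

    MPI_T_cvar_get_num(&num_cvars);
    printf("MPI library exposes %d control variables\n", num_cvars);

    for (int i = 0; i < num_cvars; i++) {
        char name[256], desc[1024];
        int name_len = sizeof(name), desc_len = sizeof(desc);
        int verbosity, bind, scope;
        MPI_Datatype dtype;
        MPI_T_enum enumtype;

        /* Query the name, datatype, description, and scope of cvar i. */
        MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &dtype,
                            &enumtype, desc, &desc_len, &bind, &scope);

        /* Hypothetical cvar name, used only for illustration. */
        if (strcmp(name, "MPIR_CVAR_EXAMPLE_EAGER_LIMIT") == 0 &&
            dtype == MPI_INT) {
            MPI_T_cvar_handle handle;
            int count, value;

            /* Allocate a handle, read the current value, and release it. */
            MPI_T_cvar_handle_alloc(i, NULL, &handle, &count);
            MPI_T_cvar_read(handle, &value);
            printf("%s = %d  (%s)\n", name, value, desc);
            MPI_T_cvar_handle_free(&handle);
        }
    }

    MPI_T_finalize();
    MPI_Finalize();
    return 0;
}

A performance tool such as TAU would typically perform this enumeration once at startup, cache the indices of the variables it cares about, and then sample pvars or adjust cvars periodically during the run.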

References

  1. David A. Case, Thomas E. Cheatham, Tom Darden, Holger Gohlke, Ray Luo, Kenneth M. Merz, Alexey Onufriev, Carlos Simmerling, Bing Wang, and Robert J. Woods. The Amber biomolecular simulation programs. Journal of Computational Chemistry, 26(16):1668--1688, 2005. http://ambermd.org/.
  2. Michael A. Heroux, Douglas W. Doerfler, Paul S. Crozier, James M. Willenbring, H. Carter Edwards, Alan Williams, Mahesh Rajan, Eric R. Keiter, Heidi K. Thornquist, and Robert W. Numrich. Improving performance via mini-applications. Sandia National Laboratories, Tech. Rep. SAND2009-5574, 3, 2009. https://mantevo.org/.
  3. MPI Forum. MPI: A Message-Passing Interface Standard, Version 3.1. June 2015. http://mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf.
  4. Sameer S. Shende and Allen D. Malony. The TAU Parallel Performance System. International Journal of High Performance Computing Applications, 20(2):287--311, May 2006. http://tau.uoregon.edu.
  5. Jiuxing Liu, Jiesheng Wu, Sushmitha P. Kini, Pete Wyckoff, and Dhabaleswar K. Panda. High performance RDMA-based MPI implementation over InfiniBand. In Proceedings of the 17th Annual International Conference on Supercomputing, pages 295--304. ACM, 2003.
  6. Edgar Gabriel, Graham E. Fagg, George Bosilca, Thara Angskun, Jack J. Dongarra, Jeffrey M. Squyres, Vishal Sahay, Prabhanjan Kambadur, Brian Barrett, Andrew Lumsdaine, et al. Open MPI: Goals, concept, and design of a next generation MPI implementation. In European Parallel Virtual Machine/Message Passing Interface Users' Group Meeting, pages 97--104. Springer, 2004.
  7. William Gropp, Ewing Lusk, Nathan Doss, and Anthony Skjellum. A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing, 22(6):789--828, 1996.
  8. Marc Pérache, Hervé Jourdren, and Raymond Namyst. MPC: A Unified Parallel Runtime for Clusters of NUMA Machines. In Proceedings of the 14th International Euro-Par Conference on Parallel Processing, Euro-Par '08, pages 78--88, Berlin, Heidelberg, 2008. Springer-Verlag.
  9. Rainer Keller, George Bosilca, Graham Fagg, Michael Resch, and Jack J. Dongarra. Implementation and Usage of the PERUSE-Interface in Open MPI. In Proceedings of the 13th European PVM/MPI Users' Group Meeting, Lecture Notes in Computer Science, Bonn, Germany, September 2006. Springer-Verlag.
  10. Tanzima Islam, Kathryn Mohror, and Martin Schulz. Exploring the Capabilities of the New MPI_T Interface. In Proceedings of the 21st European MPI Users' Group Meeting, EuroMPI/ASIA '14, pages 91:91--91:96, New York, NY, USA, 2014. ACM. https://computation.llnl.gov/projects/mpi_t/gyan.
  11. Esthela Gallardo, Jerome Vienne, Leonardo Fialho, Patricia Teller, and James Browne. MPI Advisor: A Minimal Overhead Tool for MPI Library Performance Tuning. In Proceedings of the 22nd European MPI Users' Group Meeting, EuroMPI '15, pages 6:1--6:10, New York, NY, USA, 2015. ACM.
  12. Esthela Gallardo, Jérôme Vienne, Leonardo Fialho, Patricia Teller, and James Browne. Employing MPI_T in MPI Advisor to optimize application performance. The International Journal of High Performance Computing Applications, 0(0):1094342016684005.
  13. Jeffrey Vetter and Chris Chambreau. mpiP: Lightweight, Scalable MPI Profiling. 2005. http://mpip.sourceforge.net.
  14. Mohamad Chaarawi, Jeffrey M. Squyres, Edgar Gabriel, and Saber Feki. A Tool for Optimizing Runtime Parameters of Open MPI, pages 210--217. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008. https://www.open-mpi.org/projects/otpo/.
  15. M. Gerndt and M. Ott. Automatic Performance Analysis with Periscope. Concurrency and Computation: Practice and Experience, 22(6):736--748, April 2010. http://periscope.in.tum.de/.
  16. Anna Sikora, Eduardo César, Isaías Comprés, and Michael Gerndt. Autotuning of MPI Applications Using PTF. In Proceedings of the ACM Workshop on Software Engineering Methods for Parallel and High Performance Applications, SEM4HPC '16, pages 31--38, New York, NY, USA, 2016. ACM.
  17. Simone Pellegrini, Thomas Fahringer, Herbert Jordan, and Hans Moritsch. Automatic Tuning of MPI Runtime Parameter Settings by Using Machine Learning. In Proceedings of the 7th ACM International Conference on Computing Frontiers, CF '10, pages 115--116, New York, NY, USA, 2010. ACM.
  18. Kevin Huck, Sameer Shende, Allen Malony, Hartmut Kaiser, Allan Porterfield, Rob Fowler, and Ron Brightwell. An Early Prototype of an Autonomic Performance Environment for Exascale. In Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers, ROSS '13, pages 8:1--8:8, New York, NY, USA, 2013. ACM. http://khuck.github.io/xpress-apex/.
  19. Swann Perarnau, Rinku Gupta, Pete Beckman, et al. Argo: An Exascale Operating System and Runtime, 2015. http://sc15.supercomputing.org/sites/all/themes/SC15images/tech_poster/poster_files/post298s2-file2.pdf.
  20. Swann Perarnau, Rajeev Thakur, Kamil Iskra, Ken Raffenetti, Franck Cappello, Rinku Gupta, Pete Beckman, Marc Snir, Henry Hoffmann, Martin Schulz, and Barry Rountree. Distributed Monitoring and Management of Exascale Systems in the Argo Project. In Proceedings of the 15th IFIP WG 6.1 International Conference on Distributed Applications and Interoperable Systems - Volume 9038, pages 173--178, New York, NY, USA, 2015. Springer-Verlag New York, Inc.
  21. TACC Stampede cluster. The University of Texas at Austin. http://www.tacc.utexas.edu.
  22. Richard L. Graham, Devendar Bureddy, Pak Lui, Hal Rosenstock, Gilad Shainer, Gil Bloch, Dror Goldenberg, Mike Dubman, Sasha Kotchubievsky, Vladimir Koushnir, et al. Scalable hierarchical aggregation protocol (SHArP): A hardware architecture for efficient data reduction. In Proceedings of the First Workshop on Optimization of Communication in HPC, pages 1--10. IEEE Press, 2016.
  23. Andreas Knüpfer, Holger Brunst, Jens Doleschal, Matthias Jurenz, Matthias Lieber, Holger Mickler, Matthias S. Müller, and Wolfgang E. Nagel. The Vampir Performance Analysis Tool-Set. In Tools for High Performance Computing, pages 139--155. Springer, 2008. www.vampir.eu.

Published in

EuroMPI '17: Proceedings of the 24th European MPI Users' Group Meeting
September 2017, 169 pages
ISBN: 978-1-4503-4849-2
DOI: 10.1145/3127024

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


Acceptance Rates

EuroMPI '17 paper acceptance rate: 17 of 37 submissions, 46%. Overall acceptance rate: 66 of 139 submissions, 47%.
