DOI: 10.1145/582034.582068

On using SCALEA for performance analysis of distributed and parallel programs

Published: 10 November 2001

ABSTRACT

In this paper we give an overview of SCALEA, a new performance analysis tool for OpenMP, MPI, HPF, and mixed parallel/distributed programs. SCALEA instruments, executes, and measures programs and computes a variety of performance overheads based on a novel overhead classification. Source-code instrumentation and hardware profiling are combined in a single system, which significantly extends the scope of overheads that can be measured and examined, ranging from hardware counters, such as the number of cache misses or floating-point operations, to more complex performance metrics, such as control or loss of parallelism. Moreover, SCALEA uses a new representation of code regions, called the dynamic code region call graph, which enables detailed overhead analysis for arbitrary code regions. An instrumentation description file is used to relate performance information to code regions of the input program and to reduce instrumentation overhead. Several experiments with realistic codes covering MPI, OpenMP, HPF, and mixed OpenMP/MPI programming demonstrate the usefulness of SCALEA.
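
For context, the term "overhead" here follows the standard textbook notion of total parallel overhead; the following is a generic formulation, not the paper's own notation. With T_s the execution time of a sequential version of the program and T_n its execution time on n processors, the total overhead is

    T_o(n) = n * T_n - T_s

and an overhead classification partitions T_o(n) into components, e.g. T_o(n) = T_comm + T_sync + T_ctrl + T_lop + ..., covering costs such as communication, synchronization, control of parallelism, and loss of parallelism.

The abstract's hardware-counter examples (cache misses, floating-point operations) can be read on most platforms through the PAPI counter library. The following is a minimal, self-contained C sketch of counter-based measurement around a code region; it illustrates the general technique only and is not SCALEA's own instrumentation interface (the chosen events and the toy loop are illustrative assumptions).

    #include <stdio.h>
    #include <stdlib.h>
    #include <papi.h>

    int main(void)
    {
        int eventset = PAPI_NULL;
        long long values[2];   /* counter readings after the region */

        /* Initialize the PAPI library. */
        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
            fprintf(stderr, "PAPI_library_init failed\n");
            return EXIT_FAILURE;
        }

        /* Count L2 cache misses and floating-point operations.
           Preset-event availability is platform-dependent. */
        if (PAPI_create_eventset(&eventset) != PAPI_OK ||
            PAPI_add_event(eventset, PAPI_L2_TCM) != PAPI_OK ||
            PAPI_add_event(eventset, PAPI_FP_OPS) != PAPI_OK) {
            fprintf(stderr, "event set setup failed\n");
            return EXIT_FAILURE;
        }

        if (PAPI_start(eventset) != PAPI_OK) {
            fprintf(stderr, "PAPI_start failed\n");
            return EXIT_FAILURE;
        }

        /* Instrumented code region: an illustrative computation. */
        double sum = 0.0;
        for (long i = 0; i < 10000000; i++)
            sum += (double)i * 0.5;

        PAPI_stop(eventset, values);

        printf("L2 cache misses: %lld\n", values[0]);
        printf("FP operations:   %lld (sum = %g)\n", values[1], sum);
        return EXIT_SUCCESS;
    }

Compile with, e.g., cc papi_region.c -lpapi. A source-level tool inserts such start/stop calls around code regions automatically and relates the readings back to the source, which is the role the instrumentation description file plays in connecting performance data to code regions.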

Published in
SC '01: Proceedings of the 2001 ACM/IEEE Conference on Supercomputing
November 2001, 756 pages
ISBN: 1-58113-293-X
DOI: 10.1145/582034
Copyright © 2001 ACM

Publisher
Association for Computing Machinery, New York, NY, United States

                  Acceptance Rates

SC '01 paper acceptance rate: 60 of 240 submissions (25%). Overall acceptance rate: 1,516 of 6,373 submissions (24%).
