ABSTRACT
This paper gives an overview of SCALEA, a new performance analysis tool for OpenMP, MPI, HPF, and mixed parallel/distributed programs. SCALEA instruments, executes, and measures programs, and computes a variety of performance overheads based on a novel overhead classification. Source-code and hardware profiling are combined in a single system, which significantly extends the scope of overheads that can be measured and examined, ranging from hardware-counter values, such as the number of cache misses or floating-point operations, to more complex performance metrics, such as control of parallelism or loss of parallelism. Moreover, SCALEA uses a new representation of code regions, called the dynamic code region call graph, which enables detailed overhead analysis for arbitrary code regions. An instrumentation description file relates performance information to code regions of the input program and reduces instrumentation overhead. Several experiments with realistic MPI, OpenMP, HPF, and mixed OpenMP/MPI codes demonstrate the usefulness of SCALEA.
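To make the idea of a call graph over dynamically executed code regions more concrete, the following is a minimal sketch in Python. It is not SCALEA's implementation or API: the names `RegionNode`, `RegionProfiler`, and `region` are hypothetical, and the only assumption is that each region records how often it was entered and how much wall-clock time it took, with nodes created as regions are actually reached at run time.

```python
import time
from contextlib import contextmanager


class RegionNode:
    """Hypothetical graph node: one code region as reached from its parent."""

    def __init__(self, name):
        self.name = name
        self.calls = 0         # number of times the region was entered
        self.inclusive = 0.0   # elapsed time, including nested regions
        self.children = {}     # nested region name -> RegionNode

    def exclusive(self):
        """Time spent in this region outside of any nested region."""
        nested = sum(child.inclusive for child in self.children.values())
        return self.inclusive - nested


class RegionProfiler:
    """Builds the region graph while the program runs (illustrative only)."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock                  # injectable for deterministic tests
        self.root = RegionNode("<program>")
        self._stack = [self.root]           # currently active regions

    @contextmanager
    def region(self, name):
        parent = self._stack[-1]
        node = parent.children.get(name)
        if node is None:                    # first time this region is reached
            node = RegionNode(name)
            parent.children[name] = node
        self._stack.append(node)
        start = self.clock()
        try:
            yield node
        finally:
            node.inclusive += self.clock() - start
            node.calls += 1
            self._stack.pop()


if __name__ == "__main__":
    profiler = RegionProfiler()
    with profiler.region("solve"):
        for _ in range(3):
            with profiler.region("exchange_boundaries"):
                time.sleep(0.01)
    solve = profiler.root.children["solve"]
    print(solve.calls, solve.children["exchange_boundaries"].calls)
```

Because nodes are keyed by the path along which a region is entered, the same source region reached from two different callers would appear as two nodes, which is one plausible way to attribute measurements to arbitrary nested regions.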
Index Terms
- On using SCALEA for performance analysis of distributed and parallel programs