research-article

Recovering logical structure from Charm++ event traces

Authors:
Katherine E. Isaacs (University of California, Davis, CA and Lawrence Livermore National Laboratory, Livermore, CA)
Abhinav Bhatele (Lawrence Livermore National Laboratory, Livermore, CA)
Jonathan Lifflander (University of Illinois at Urbana-Champaign, Urbana, IL)
David Böhme (Lawrence Livermore National Laboratory, Livermore, CA)
Todd Gamblin (Lawrence Livermore National Laboratory, Livermore, CA)
Martin Schulz (Lawrence Livermore National Laboratory, Livermore, CA)
Bernd Hamann (University of California, Davis, CA)
Peer-Timo Bremer (Lawrence Livermore National Laboratory, Livermore, CA)

SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2015, Article No.: 49, Pages 1–12
https://doi.org/10.1145/2807591.2807634

Published: 15 November 2015

ABSTRACT

Asynchrony and non-determinism in Charm++ programs present a significant challenge in analyzing their event traces. We present a new framework to organize event traces of parallel programs written in Charm++. Our reorganization allows one to more easily explore and analyze such traces by providing context through logical structure. We describe several heuristics to compensate for missing dependencies between events that currently cannot be easily recorded. We introduce a new task ordering that recovers logical structure from the non-deterministic execution order. Using the logical structure, we define several metrics to help guide developers to performance problems. We demonstrate our approach through two proxy applications written in Charm++. Finally, we discuss the applicability of this framework to other task-based runtimes and provide guidelines for tracing to support this form of analysis.

Published in
SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2015
985 pages
ISBN: 9781450337236
DOI: 10.1145/2807591
General Chair: Jackie Kern, University of Illinois at Urbana-Champaign, Urbana, Illinois
Program Chair: Jeffrey S. Vetter, Oak Ridge National Laboratory and Georgia Institute of Technology, Oak Ridge, Tennessee
Copyright © 2015 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
asynchrony
performance
task-based models
trace analysis
Acceptance Rates
SC '15 Paper Acceptance Rate: 79 of 358 submissions, 22%. Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%.