ABSTRACT
Asynchrony and non-determinism in Charm++ programs present a significant challenge in analyzing their event traces. We present a new framework to organize event traces of parallel programs written in Charm++. Our reorganization allows one to more easily explore and analyze such traces by providing context through logical structure. We describe several heuristics to compensate for missing dependencies between events that currently cannot be easily recorded. We introduce a new task ordering that recovers logical structure from the non-deterministic execution order. Using the logical structure, we define several metrics to help guide developers to performance problems. We demonstrate our approach through two proxy applications written in Charm++. Finally, we discuss the applicability of this framework to other task-based runtimes and provide guidelines for tracing to support this form of analysis.
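As a rough illustration of what ordering events by logical structure rather than by recorded time can mean, the sketch below assigns each event a logical step from its dependencies alone. It is a generic happened-before layering, not the task ordering or heuristics introduced in this work. The function name `assign_logical_steps`, the event identifiers, and the `(before, after)` dependency pairs are all hypothetical.

```python
from collections import defaultdict, deque


def assign_logical_steps(events, deps):
    """Assign each event the length of its longest dependency chain.

    events: iterable of hashable event ids (hypothetical trace records).
    deps:   iterable of (before, after) pairs, e.g. a message send and the
            task it triggers. Physical timestamps are deliberately unused.
    """
    successors = defaultdict(list)
    indegree = {e: 0 for e in events}
    for before, after in deps:
        successors[before].append(after)
        indegree[after] += 1

    # Events with no recorded predecessor start at logical step 0.
    step = {e: 0 for e, d in indegree.items() if d == 0}
    queue = deque(step)
    processed = 0
    while queue:
        e = queue.popleft()
        processed += 1
        for s in successors[e]:
            # An event sits one step after its latest predecessor.
            step[s] = max(step.get(s, 0), step[e] + 1)
            indegree[s] -= 1
            if indegree[s] == 0:
                queue.append(s)

    if processed != len(indegree):
        raise ValueError("dependencies contain a cycle")
    return step


# Two runs that record the same events in different physical order
# yield the same logical steps.
deps = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
run1 = assign_logical_steps(["a", "b", "c", "d"], deps)
run2 = assign_logical_steps(["d", "c", "b", "a"], deps)
```

Because the steps depend only on the dependency pairs, any dependency missing from the trace changes the result, which is why recovering unrecorded dependencies matters for this kind of analysis.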
Index Terms
- Recovering logical structure from Charm++ event traces