Research article, open access. DOI: 10.1145/3295500.3356219

Hatchet: pruning the overgrowth in parallel profiles

Published: 17 November 2019

Abstract

Performance analysis is critical for eliminating scalability bottlenecks in parallel codes. There are many profiling tools that can instrument codes and gather performance data. However, analytics and visualization tools that are general, easy to use, and programmable are limited. In this paper, we focus on the analytics of structured profiling data, such as that obtained from calling context trees or nested region timers in code. We present a set of techniques and operations that build on the pandas data analysis library to enable analysis of parallel profiles. We have implemented these techniques in a Python-based library called Hatchet that allows structured data to be filtered, aggregated, and pruned. Using performance datasets obtained from profiling parallel codes, we demonstrate performing common performance analysis tasks reproducibly with a few lines of Hatchet code. Hatchet brings the power of modern data science tools to bear on performance analysis.
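
The filter, prune, and aggregate operations named in the abstract can be illustrated with a minimal sketch in plain Python. This is not Hatchet's API and does not use pandas; the `Node` class, the `prune` and `aggregate` helpers, and the profile values are all hypothetical, chosen only to show what it could mean to filter a calling context tree by a metric, reconnect the surviving nodes, and then aggregate time per function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """One calling-context-tree node with an exclusive time (hypothetical layout)."""
    name: str
    time: float
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, keep: Callable[[Node], bool]) -> List[Node]:
    """Filter the tree with `keep`, then reconnect the surviving nodes.

    Returns the subtrees that replace `node`: a kept node returns a copy of
    itself with pruned children; a dropped node returns its surviving
    descendants, so they attach to the nearest kept ancestor.
    """
    survivors: List[Node] = []
    for child in node.children:
        survivors.extend(prune(child, keep))
    if keep(node):
        return [Node(node.name, node.time, survivors)]
    return survivors


def aggregate(roots: List[Node]) -> Dict[str, float]:
    """Sum exclusive time per function name across all calling contexts."""
    totals: Dict[str, float] = {}
    stack = list(roots)
    while stack:
        n = stack.pop()
        totals[n.name] = totals.get(n.name, 0.0) + n.time
        stack.extend(n.children)
    return totals


# Made-up profile: times are illustrative, not measurements.
tree = Node("main", 1.0, [
    Node("solve", 0.5, [
        Node("MPI_Allreduce", 4.0),
        Node("kernel", 6.0, [Node("MPI_Allreduce", 2.0)]),
    ]),
    Node("io", 0.2),
])

# Keep only nodes with at least 1.0 units of exclusive time.
pruned = prune(tree, lambda n: n.time >= 1.0)
totals = aggregate(pruned)
```

In this sketch `solve` and `io` fall below the threshold, so the two children of `solve` are reattached directly under `main`, and the two `MPI_Allreduce` contexts are combined into one per-function total.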

Published In

SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2019
1921 pages
ISBN:9781450362290
DOI:10.1145/3295500
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. call graph
  2. calling context tree
  3. graph analytics
  4. parallel profile
  5. performance analysis
  6. tool

Qualifiers

  • Research-article

Conference

SC '19

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Article Metrics

  • Downloads (last 12 months): 351
  • Downloads (last 6 weeks): 19
Reflects downloads up to 19 Feb 2025

Cited By

  • (2025) Leveraging Graph Analysis to Pinpoint Root Causes of Scalability Issues for Parallel Applications. IEEE Transactions on Parallel and Distributed Systems 36:2 (308-325). DOI: 10.1109/TPDS.2024.3485789. Online publication date: Feb-2025
  • (2024) TinyProf: Towards Continuous Performance Introspection through Scalable Parallel I/O. ISC High Performance 2024 Research Paper Proceedings (39th International Conference) (1-12). DOI: 10.23919/ISC.2024.10528932. Online publication date: May-2024
  • (2024) Refining HPCToolkit for application performance analysis at exascale. The International Journal of High Performance Computing Applications 38:6 (612-632). DOI: 10.1177/10943420241277839. Online publication date: 30-Aug-2024
  • (2024) Design Concerns for Integrated Scripting and Interactive Visualization in Notebook Environments. IEEE Transactions on Visualization and Computer Graphics 30:9 (6572-6585). DOI: 10.1109/TVCG.2024.3354561. Online publication date: Sep-2024
  • (2024) Graph-Centric Performance Analysis for Large-Scale Parallel Applications. IEEE Transactions on Parallel and Distributed Systems 35:7 (1221-1238). DOI: 10.1109/TPDS.2024.3396849. Online publication date: Jul-2024
  • (2024) ML-based Modeling to Predict I/O Performance on Different Storage Sub-systems. 2024 IEEE 31st International Conference on High Performance Computing, Data, and Analytics (HiPC) (221-231). DOI: 10.1109/HiPC62374.2024.00030. Online publication date: 18-Dec-2024
  • (2024) Real-time XFEL data analysis at SLAC and NERSC: A trial run of nascent exascale experimental data analysis. Concurrency and Computation: Practice and Experience 36:12. DOI: 10.1002/cpe.8019. Online publication date: 13-Feb-2024
  • (2023) Finding the forest in the trees. International Journal of High Performance Computing Applications 37:3-4 (434-441). DOI: 10.1177/10943420231175687. Online publication date: 1-Jul-2023
  • (2023) FROOM: A Framework of Operators for OTF2 Modification. Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis (1403-1411). DOI: 10.1145/3624062.3624209. Online publication date: 12-Nov-2023
  • (2023) Enabling Agile Analysis of I/O Performance Data with PyDarshan. Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis (1380-1391). DOI: 10.1145/3624062.3624207. Online publication date: 12-Nov-2023
