Hatchet: pruning the overgrowth in parallel profiles

Authors:

Abhinav Bhatele,

Stephanie Brink,

Todd Gamblin

SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Article No.: 20, Pages 1 - 21

https://doi.org/10.1145/3295500.3356219

Published: 17 November 2019

Abstract

Performance analysis is critical for eliminating scalability bottlenecks in parallel codes. There are many profiling tools that can instrument codes and gather performance data. However, analytics and visualization tools that are general, easy to use, and programmable are limited. In this paper, we focus on the analytics of structured profiling data, such as that obtained from calling context trees or nested region timers in code. We present a set of techniques and operations that build on the pandas data analysis library to enable analysis of parallel profiles. We have implemented these techniques in a Python-based library called Hatchet that allows structured data to be filtered, aggregated, and pruned. Using performance datasets obtained from profiling parallel codes, we demonstrate performing common performance analysis tasks reproducibly with a few lines of Hatchet code. Hatchet brings the power of modern data science tools to bear on performance analysis.
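The filter, aggregate, and prune operations named in the abstract can be illustrated with plain pandas. The following is a minimal sketch and does not use Hatchet's own API: the call-path strings, the `time` column, the sample values, and the `in_subtree` helper are all hypothetical, chosen only to show how a profile indexed by call path and rank might be manipulated in a few lines.

```python
import pandas as pd

# Hypothetical profile: one row per (call path, MPI rank) with a time value.
# All names and numbers here are illustrative assumptions, not real data.
records = [
    {"path": "main", "rank": 0, "time": 1.0},
    {"path": "main", "rank": 1, "time": 1.2},
    {"path": "main/solve", "rank": 0, "time": 8.0},
    {"path": "main/solve", "rank": 1, "time": 9.0},
    {"path": "main/solve/MPI_Allreduce", "rank": 0, "time": 3.0},
    {"path": "main/solve/MPI_Allreduce", "rank": 1, "time": 2.0},
    {"path": "main/io", "rank": 0, "time": 0.1},
    {"path": "main/io", "rank": 1, "time": 0.1},
]
df = pd.DataFrame(records).set_index(["path", "rank"])

# Aggregate: average the per-rank times for each call path.
mean_time = df.groupby(level="path")["time"].mean()

# Filter: keep only call paths whose mean time exceeds a threshold.
hot = mean_time[mean_time > 1.0]


def in_subtree(path, root):
    """Return True if `path` is `root` itself or a descendant of it."""
    return path == root or path.startswith(root + "/")


# Prune: keep only the subtree rooted at a chosen call path.
subtree = mean_time[[in_subtree(p, "main/solve") for p in mean_time.index]]

print(hot)
print(subtree)
```

Encoding the tree as slash-separated path strings keeps the sketch short; a real tool would more likely keep an explicit graph alongside the table so that structural operations do not depend on string matching.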

Index Terms

Hatchet: pruning the overgrowth in parallel profiles
1. General and reference
  1. Cross-computing tools and techniques
    1. Performance
2. Software and its engineering
  1. Software notations and tools
    1. Software maintenance tools

Published In

SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

November 2019

1921 pages

ISBN: 9781450362290

DOI: 10.1145/3295500

General Chair: Michela Taufer
Program Chairs: Pavan Balaji, Antonio J. Peña

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGHPC: ACM Special Interest Group on High Performance Computing

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SC '19

Sponsor:

SIGHPC

SC '19: The International Conference for High Performance Computing, Networking, Storage, and Analysis

November 17 - 19, 2019

Denver, Colorado

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
