ABSTRACT
Performance measurement and analysis are common tasks for high-performance computing applications. Both sampling- and instrumentation-based measurement approaches can capture hardware performance counter (HWPC) metrics to assess how well the software uses the functional units of the processor. Since the measurement software usually executes on the same processor, it necessarily competes with the target application for hardware resources. Consequently, the measurement system perturbs the target application, which often manifests as runtime overhead. While the runtime overhead of different measurement techniques has been studied before, the extent to which HWPC values are perturbed by the measurement process has not been thoroughly examined. In this paper, we investigate the influence of two widely used performance measurement systems, HPCToolkit (sampling) and Score-P (instrumentation), on HWPC values. Our experiments on the SPEC CPU 2006 C/C++ benchmarks show that, while Score-P's default instrumentation can massively increase runtime, it does not always heavily perturb the relevant HWPC. HPCToolkit, on the other hand, shows no significant runtime overhead but significantly influences some relevant HWPC. We conclude that sufficient baseline measurements are essential for every performance experiment in order to identify the HWPC that remain valid indicators of performance for a given measurement technique. Performance analysis tools should therefore offer easily accessible means to automate this baseline and validation functionality.
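The validation step the abstract argues for can be illustrated with a minimal sketch: compare counter values from an uninstrumented baseline run against values obtained under a measurement tool, and keep only the counters whose relative deviation stays within a tolerance. This is not code from the paper; the counter names, the 10% threshold, and the function name `valid_counters` are illustrative assumptions.

```python
# Hedged sketch (not from the paper): flag hardware performance counters
# whose values under a measurement tool deviate too much from a baseline.
# Counter names and the 10% threshold below are illustrative assumptions.

def valid_counters(baseline, measured, max_rel_dev=0.10):
    """Return {counter: relative deviation} for counters whose measured
    value stays within max_rel_dev of the uninstrumented baseline."""
    valid = {}
    for name, base in baseline.items():
        if name not in measured or base == 0:
            continue
        rel_dev = abs(measured[name] - base) / base
        if rel_dev <= max_rel_dev:
            valid[name] = rel_dev
    return valid

# Hypothetical counter readings for one benchmark run.
baseline  = {"INST_RETIRED": 1.00e9, "L1D_MISSES": 2.0e6, "BRANCH_MISP": 5.0e5}
with_tool = {"INST_RETIRED": 1.02e9, "L1D_MISSES": 2.1e6, "BRANCH_MISP": 9.0e5}

# BRANCH_MISP deviates by 80% and is excluded; the other two remain valid.
print(sorted(valid_counters(baseline, with_tool)))
```

In practice the baseline would itself be a set of repeated runs, so the tolerance would be derived from run-to-run variance rather than fixed a priori.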
REFERENCES
- Laksono Adhianto, Sinchan Banerjee, Mike Fagan, Mark Krentel, Gabriel Marin, John Mellor-Crummey, and Nathan R. Tallent. 2010. HPCToolkit: Tools for performance analysis of optimized parallel programs. Concurrency and Computation: Practice and Experience 22, 6 (2010), 685–701.
- Dieter an Mey, Scott Biersdorf, Christian Bischof, Kai Diethelm, Dominic Eschweiler, Michael Gerndt, et al. 2011. Score-P: A Unified Performance Measurement System for Petascale Applications. In Competence in High Performance Computing 2010. Springer Science + Business Media, 85–97.
- Christian Bischof, Dieter an Mey, and Christian Iwainsky. 2011. Brainware for green HPC. Computer Science - Research and Development 27, 4 (2011), 227–233.
- Shirley Browne. 2000. A Portable Programming Interface for Performance Evaluation on Modern Processors. Intl. Journal of High Performance Computing Applications 14, 3 (2000), 189–204.
- Luiz DeRose and Heidi Poxon. 2009. A paradigm change: from performance monitoring to performance analysis. In 21st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'09). IEEE, 119–126.
- Markus Geimer, Felix Wolf, Brian J. N. Wylie, Erika Ábrahám, Daniel Becker, and Bernd Mohr. 2010. The Scalasca performance toolset architecture. Concurrency and Computation: Practice and Experience 22, 6 (2010), 702–719.
- Oscar Hernandez, Fengguang Song, Barbara Chapman, Jack Dongarra, Bernd Mohr, Shirley Moore, and Felix Wolf. 2008. Performance instrumentation and compiler optimizations for MPI/OpenMP applications. In OpenMP Shared Memory Parallel Programming. Springer, 267–278.
- Intel. 2016. Intel 64 and IA-32 Architectures Optimization Reference Manual.
- Christian Iwainsky. 2015. InstRO: A Component-Based Tool For Performance Instrumentation. Ph.D. Dissertation. Technische Universität Darmstadt.
- Christian Iwainsky, Ralph Altenfeld, Dieter an Mey, and Christian Bischof. 2011. Enhancing brainware productivity through a performance tuning workflow. In Euro-Par 2011: Parallel Processing Workshops. Springer, 198–207.
- Christian Iwainsky and Christian Bischof. 2016. Call Tree Controlled Instrumentation for Low-Overhead Survey Measurements. In 2016 IEEE Intl. Parallel and Distributed Processing Symposium Workshops (IPDPSW 2016).
- Christian Iwainsky, Jan-Patrick Lehr, and Christian Bischof. 2014. Compiler Supported Sampling through Minimalistic Instrumentation. In 2014 43rd Intl. Conf. on Parallel Processing Workshops. IEEE.
- Wendy Korn, Patricia J. Teller, and Gilbert Castillo. 2001. Just how accurate are performance counters?. In IEEE Intl. Conf. on Performance, Computing, and Communications, 2001. 303–310.
- Mark W. Krentel. 2013. Libmonitor: A tool for first-party monitoring. Parallel Comput. 39, 3 (2013), 114–119.
- Allen D. Malony, Daniel A. Reed, and Harry A. G. Wijshoff. 1992. Performance measurement intrusion and perturbation analysis. IEEE Transactions on Parallel and Distributed Systems 3, 4 (1992), 433–450.
- Michael E. Maxwell, Patricia P. Teller, Leonardo Salayandia, and Shirley Moore. 2002. Accuracy of performance monitoring hardware. In Proc. of the Los Alamos Computer Science Institute Symposium (LACSI'02).
- Jan Mußler, Daniel Lorenz, and Felix Wolf. 2011. Reducing the Overhead of Direct Application Instrumentation Using Prior Static Analysis. In Euro-Par 2011 Parallel Processing. Springer.
- Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. 2007. Understanding Measurement Perturbation in Trace-based Data. In IEEE Intl. Parallel and Distributed Processing Symposium (IPDPS). 1–6.
- Ventsislav Petkov, Michael Gerndt, and Michael Firbach. 2013. PAThWay: Performance Analysis and Tuning Using Workflows. In 2013 IEEE 10th Intl. Conf. on High Performance Computing and Communications & 2013 IEEE Intl. Conf. on Embedded and Ubiquitous Computing. IEEE.
- Aashish Phansalkar, Ajay Joshi, and Lizy K. John. 2007. Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite. ACM SIGARCH Computer Architecture News 35, 2 (2007), 412–423.
- Dirk Schmidl, Peter Philippen, Daniel Lorenz, Christian Rössel, Markus Geimer, Dieter an Mey, Bernd Mohr, and Felix Wolf. 2012. Performance Analysis Techniques for Task-Based OpenMP Applications. In OpenMP in a Heterogeneous World. Springer Science + Business Media, 196–209.
- Zoltán Szebenyi, Todd Gamblin, Martin Schulz, Bronis R. de Supinski, Felix Wolf, and Brian J. N. Wylie. 2011. Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs. In 2011 IEEE Intl. Parallel & Distributed Processing Symposium. IEEE.
- Vincent M. Weaver and Sally A. McKee. 2008. Can hardware performance counters be trusted?. In IEEE Intl. Symp. on Workload Characterization (IISWC 2008). 141–150.
- Dimitrijs Zaparanuks, Milan Jovic, and Matthias Hauswirth. 2009. Accuracy of performance counter measurements. In Proc. IEEE Int. Symp. Performance Analysis of Systems and Software. 23–32.
Index Terms: The influence of HPCToolkit and Score-P on hardware performance counters