ABSTRACT
Performance measurement and analysis are common tasks for high-performance computing applications. Both sampling- and instrumentation-based measurement approaches can capture hardware performance counter (HWPC) metrics to assess how well the software uses the functional units of the processor. Since the measurement software usually executes on the same processor, it necessarily competes with the target application for hardware resources. Consequently, the measurement system perturbs the target application, which often manifests as runtime overhead. While the runtime overhead of different measurement techniques has been studied before, the extent to which HWPC values are perturbed by the measurement process has not been thoroughly examined. In this paper, we investigate the influence of two widely used performance measurement systems, HPCToolkit (sampling) and Score-P (instrumentation), on HWPC values. Our experiments on the SPEC CPU 2006 C/C++ benchmarks show that, while Score-P's default instrumentation can massively increase runtime, it does not always heavily perturb the relevant HWPC. HPCToolkit, on the other hand, shows no significant runtime overhead but significantly influences some relevant HWPC. We conclude that sufficient baseline measurements are essential for every performance experiment in order to identify the HWPC that remain valid indicators of performance for a given measurement technique. Performance analysis tools should therefore offer easily accessible means to automate this baseline and validation functionality.
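The validation step the abstract argues for can be illustrated with a minimal sketch: compare counter values from an uninstrumented baseline run against values obtained under a measurement tool, and keep only the counters whose relative deviation stays within a tolerance. This is not code from the paper; the counter names, the 10% threshold, and the function name `valid_counters` are illustrative assumptions.

```python
# Hedged sketch (not from the paper): flag hardware performance counters
# whose values under a measurement tool deviate too much from a baseline.
# Counter names and the 10% threshold below are illustrative assumptions.

def valid_counters(baseline, measured, max_rel_dev=0.10):
    """Return {counter: relative deviation} for counters whose measured
    value stays within max_rel_dev of the uninstrumented baseline."""
    valid = {}
    for name, base in baseline.items():
        if name not in measured or base == 0:
            continue
        rel_dev = abs(measured[name] - base) / base
        if rel_dev <= max_rel_dev:
            valid[name] = rel_dev
    return valid

# Hypothetical counter readings for one benchmark run.
baseline  = {"INST_RETIRED": 1.00e9, "L1D_MISSES": 2.0e6, "BRANCH_MISP": 5.0e5}
with_tool = {"INST_RETIRED": 1.02e9, "L1D_MISSES": 2.1e6, "BRANCH_MISP": 9.0e5}

# BRANCH_MISP deviates by 80% and is excluded; the other two remain valid.
print(sorted(valid_counters(baseline, with_tool)))
```

In practice the baseline would itself be a set of repeated runs, so the tolerance would be derived from run-to-run variance rather than fixed a priori.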
REFERENCES
- Laksono Adhianto, Sinchan Banerjee, Mike Fagan, Mark Krentel, Gabriel Marin, John Mellor-Crummey, and Nathan R. Tallent. 2010. HPCToolkit: Tools for performance analysis of optimized parallel programs. Concurrency and Computation: Practice and Experience 22, 6 (2010), 685–701.
- Dieter an Mey, Scott Biersdorf, Christian Bischof, Kai Diethelm, Dominic Eschweiler, Michael Gerndt, et al. 2011. Score-P: A Unified Performance Measurement System for Petascale Applications. In Competence in High Performance Computing 2010. Springer Science + Business Media, 85–97.
- Christian Bischof, Dieter an Mey, and Christian Iwainsky. 2011. Brainware for green HPC. Computer Science - Research and Development 27, 4 (2011), 227–233.
- Shirley Browne. 2000. A Portable Programming Interface for Performance Evaluation on Modern Processors. Intl. Journal of High Performance Computing Applications 14, 3 (2000), 189–204.
- Luiz DeRose and Heidi Poxon. 2009. A paradigm change: from performance monitoring to performance analysis. In 21st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'09). IEEE, 119–126.
- Markus Geimer, Felix Wolf, Brian J. N. Wylie, Erika Ábrahám, Daniel Becker, and Bernd Mohr. 2010. The Scalasca performance toolset architecture. Concurrency and Computation: Practice and Experience 22, 6 (2010), 702–719.
- Oscar Hernandez, Fengguang Song, Barbara Chapman, Jack Dongarra, Bernd Mohr, Shirley Moore, and Felix Wolf. 2008. Performance instrumentation and compiler optimizations for MPI/OpenMP applications. In OpenMP Shared Memory Parallel Programming. Springer, 267–278.
- Intel. 2016. Intel 64 and IA-32 Architectures Optimization Reference Manual.
- Christian Iwainsky. 2015. InstRO: A Component-Based Tool For Performance Instrumentation. Ph.D. Dissertation. Technische Universität Darmstadt.
- Christian Iwainsky, Ralph Altenfeld, Dieter an Mey, and Christian Bischof. 2011. Enhancing brainware productivity through a performance tuning workflow. In Euro-Par 2011: Parallel Processing Workshops. Springer, 198–207.
- Christian Iwainsky and Christian Bischof. 2016. Call Tree Controlled Instrumentation for Low-Overhead Survey Measurements. In 2016 IEEE Intl. Parallel and Distributed Processing Symposium Workshops (IPDPSW 2016).
- Christian Iwainsky, Jan-Patrick Lehr, and Christian Bischof. 2014. Compiler Supported Sampling through Minimalistic Instrumentation. In 2014 43rd Intl. Conf. on Parallel Processing Workshops. IEEE.
- Wendy Korn, Patricia J. Teller, and Gilbert Castillo. 2001. Just how accurate are performance counters?. In IEEE Intl. Conf. on Performance, Computing, and Communications, 2001. 303–310.
- Mark W. Krentel. 2013. Libmonitor: A tool for first-party monitoring. Parallel Comput. 39, 3 (2013), 114–119.
- Allen D. Malony, Daniel A. Reed, and Harry A. G. Wijshoff. 1992. Performance measurement intrusion and perturbation analysis. IEEE Transactions on Parallel and Distributed Systems 3, 4 (1992), 433–450.
- Michael E. Maxwell, Patricia P. Teller, Leonardo Salayandia, and Shirley Moore. 2002. Accuracy of performance monitoring hardware. In Proc. of the Los Alamos Computer Science Institute Symposium (LACSI'02).
- Jan Mußler, Daniel Lorenz, and Felix Wolf. 2011. Reducing the Overhead of Direct Application Instrumentation Using Prior Static Analysis. In Euro-Par 2011 Parallel Processing. Springer.
- Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. 2007. Understanding Measurement Perturbation in Trace-based Data. In IEEE Intl. Parallel and Distributed Processing Symposium (IPDPS). 1–6.
- Ventsislav Petkov, Michael Gerndt, and Michael Firbach. 2013. PAThWay: Performance Analysis and Tuning Using Workflows. In 2013 IEEE 10th Intl. Conf. on High Performance Computing and Communications & 2013 IEEE Intl. Conf. on Embedded and Ubiquitous Computing. IEEE.
- Aashish Phansalkar, Ajay Joshi, and Lizy K. John. 2007. Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite. ACM SIGARCH Computer Architecture News 35, 2 (2007), 412–423.
- Dirk Schmidl, Peter Philippen, Daniel Lorenz, Christian Rössel, Markus Geimer, Dieter an Mey, Bernd Mohr, and Felix Wolf. 2012. Performance Analysis Techniques for Task-Based OpenMP Applications. In OpenMP in a Heterogeneous World. Springer Science + Business Media, 196–209.
- Zoltán Szebenyi, Todd Gamblin, Martin Schulz, Bronis R. de Supinski, Felix Wolf, and Brian J. N. Wylie. 2011. Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs. In 2011 IEEE Intl. Parallel & Distributed Processing Symposium. IEEE.
- Vincent M. Weaver and Sally A. McKee. 2008. Can hardware performance counters be trusted?. In IEEE Intl. Symp. on Workload Characterization (IISWC 2008). 141–150.
- Dimitrijs Zaparanuks, Milan Jovic, and Matthias Hauswirth. 2009. Accuracy of performance counter measurements. In Proc. IEEE Int. Symp. Performance Analysis of Systems and Software. 23–32.
Index Terms: The influence of HPCToolkit and Score-P on hardware performance counters