DOI: 10.1145/3141865.3141869
research-article

The influence of HPCToolkit and Score-P on hardware performance counters

Published: 23 October 2017

ABSTRACT

Performance measurement and analysis are tasks commonly carried out for high-performance computing applications. Both sampling and instrumentation approaches to performance measurement can capture hardware performance counter (HWPC) metrics to assess the software's ability to use the functional units of the processor. Since the measurement software usually executes on the same processor, it necessarily competes with the target application for hardware resources. Consequently, the measurement system perturbs the target application, which often results in runtime overhead. While the runtime overhead of different measurement techniques has been studied previously, the extent to which HWPC values are perturbed by the measurement process has not been thoroughly examined. In this paper, we investigate the influence of two widely used performance measurement systems, HPCToolkit (sampling) and Score-P (instrumentation), on HWPC. Our experiments on the SPEC CPU 2006 C/C++ benchmarks show that, while Score-P's default instrumentation can massively increase runtime, it does not always heavily perturb the relevant HWPC. HPCToolkit, on the other hand, shows no significant runtime overhead but significantly influences some relevant HWPC. We conclude that sufficient baseline measurements are essential for every performance experiment, in order to identify the HWPC that remain valid indicators of performance for a given measurement technique. Performance analysis tools should therefore offer easily accessible means to automate this baseline and validation functionality.



              Published in

              SEPS 2017: Proceedings of the 4th ACM SIGPLAN International Workshop on Software Engineering for Parallel Systems
              October 2017
              47 pages
              ISBN: 9781450355179
              DOI: 10.1145/3141865

              Copyright © 2017 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

              Publisher

              Association for Computing Machinery

              New York, NY, United States



