ABSTRACT
Parallel application benchmarks are indispensable for evaluating and optimizing HPC software and hardware. However, it is challenging and costly to obtain high-fidelity benchmarks that reflect the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time- and labor-intensive to create. Real applications themselves, while offering the most accurate performance evaluation, are expensive to compile, port, and reconfigure, and are often simply inaccessible due to security or ownership concerns. This work contributes APPrime, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication and I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters to create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPrime benchmarks: they retain the original applications' performance characteristics, in particular their relative performance across platforms. Moreover, the resulting benchmarks, already released online, are much more compact and easier to port than the original applications.
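The regeneration idea described above can be illustrated with a minimal sketch: record per-phase empirical distributions of event parameters (here, message sizes per MPI operation) from a trace, then sample synthetic events from those distributions. This is a hypothetical toy illustration of the general technique, not APPrime's actual implementation; the trace format, `build_model`, and `regenerate` are all invented for this example.

```python
import random
from collections import Counter

def build_model(trace):
    """Group trace events by (phase, operation) and record the empirical
    distribution of their parameters (message sizes, in this toy example)."""
    model = {}
    for phase, op, size in trace:
        model.setdefault((phase, op), Counter())[size] += 1
    return model

def regenerate(model, n_events, seed=0):
    """Sample synthetic events whose parameters follow the same empirical
    distributions as the recorded trace, cycling through the phases."""
    rng = random.Random(seed)
    keys = sorted(model)
    events = []
    for i in range(n_events):
        phase, op = keys[i % len(keys)]  # round-robin over (phase, op) pairs
        sizes, weights = zip(*sorted(model[(phase, op)].items()))
        events.append((phase, op, rng.choices(sizes, weights=weights)[0]))
    return events

# A tiny hand-made trace: (phase id, MPI operation, message size in bytes).
trace = [
    (0, "MPI_Send", 1024), (0, "MPI_Send", 1024), (0, "MPI_Send", 4096),
    (1, "MPI_Allreduce", 8), (1, "MPI_Allreduce", 8),
]
model = build_model(trace)
synthetic = regenerate(model, 6)
```

Every synthetic event draws its size from the sizes actually observed for its phase and operation, so the regenerated stream statistically resembles the original while being fully decoupled from the application's source code.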
Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark Generation. In PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.