research-article
DOI: 10.1145/2745844.2745876
Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark Generation

Published: 15 June 2015

ABSTRACT

Parallel application benchmarks are indispensable for evaluating and optimizing HPC software and hardware. However, it is very challenging and costly to obtain high-fidelity benchmarks that reflect the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time- and labor-intensive to create. Real applications themselves, while offering the most accurate performance evaluation, are expensive to compile, port, and reconfigure, and are often plainly inaccessible due to security or ownership concerns. This work contributes APPrime, a novel tool for trace-based automatic parallel benchmark generation. Taking standard communication and I/O traces of an application's execution as input, it couples accurate automatic phase identification with statistical regeneration of event parameters to create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPrime benchmarks: they retain the original applications' performance characteristics, in particular their relative performance across platforms. Moreover, the resulting benchmarks, already released online, are far more compact and easier to port than the original applications.
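The two-stage pipeline the abstract describes (identify repeating phases in a communication/I-O trace, then regenerate event parameters statistically) can be illustrated with a minimal sketch. This is a hypothetical simplification, not APPrime's actual algorithm: phases are detected naively by hashing fixed-length windows of event types, and each event's payload size is re-drawn from a per-phase empirical histogram. The trace format and the helper names (identify_phases, fit_models, regenerate) are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict

# A trace event here is a dict such as {"op": "MPI_Send", "bytes": 4096}.
# Real traces (e.g., DUMPI or OTF records) carry far richer fields; this
# format is a deliberate simplification for the sketch.

def identify_phases(trace, window=4):
    """Naive phase identification: segment the trace into fixed-length
    windows and use the tuple of operation names as the phase signature."""
    segments = [trace[i:i + window] for i in range(0, len(trace), window)]
    return [(tuple(ev["op"] for ev in seg), seg) for seg in segments]

def fit_models(phases):
    """For each phase signature, collect an empirical histogram of the
    'bytes' parameter; this stands in for full statistical modeling."""
    models = defaultdict(Counter)
    for signature, segment in phases:
        for ev in segment:
            models[signature][ev["bytes"]] += 1
    return models

def regenerate(phase_sequence, models):
    """Replay the identified phase sequence, sampling each event's
    parameter from the fitted histogram instead of copying the trace."""
    synthetic = []
    for signature in phase_sequence:
        sizes, weights = zip(*models[signature].items())
        for op in signature:
            synthetic.append(
                {"op": op, "bytes": random.choices(sizes, weights=weights)[0]})
    return synthetic

if __name__ == "__main__":
    trace = [{"op": "MPI_Send", "bytes": 1024}, {"op": "MPI_Recv", "bytes": 1024},
             {"op": "MPI_Send", "bytes": 2048}, {"op": "write", "bytes": 65536}] * 8
    phases = identify_phases(trace)
    models = fit_models(phases)
    benchmark_trace = regenerate([sig for sig, _ in phases], models)
    print(len(benchmark_trace), "synthetic events")
```

A real generator would detect loop structure rather than fixed windows and would regenerate timing, message destinations, and I/O offsets as well; the point here is only the two-stage structure: identify repeating phases, then sample parameters from per-phase distributions rather than replaying the raw trace verbatim.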


Published in

SIGMETRICS '15: Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems
June 2015, 488 pages
ISBN: 9781450334860
DOI: 10.1145/2745844

Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


          Acceptance Rates

SIGMETRICS '15 paper acceptance rate: 32 of 239 submissions, 13%. Overall acceptance rate: 459 of 2,691 submissions, 17%.
