ABSTRACT
As supercomputer performance has approached and then surpassed the petaflop level, I/O has become a major performance bottleneck for many scientific applications. Several tools exist to collect I/O traces that assist in the analysis of I/O performance problems. However, these tools either produce extremely large trace files that complicate performance analysis, or sacrifice accuracy to collect only high-level statistical information. We propose ScalaIOTrace, a multi-level trace generator that collects traces at several levels of the HPC I/O stack. ScalaIOTrace features aggressive trace compression that generates trace files of near-constant size for regular I/O patterns, and files orders of magnitude smaller for less regular ones. This enables the collection of I/O and communication traces of applications running on thousands of processors.
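To illustrate why regular I/O patterns compress to near-constant size, the following sketch folds immediate repetitions of a short event pattern into (pattern, count) pairs. This is a hedged, simplified illustration of loop-based trace compression, not ScalaIOTrace's actual algorithm; the function name `compress` and the greedy pattern search are assumptions for this example.

```python
# Illustrative sketch of loop-based trace compression (NOT ScalaIOTrace's
# actual implementation): fold adjacent repeats of a short pattern into a
# single (pattern, count) entry, so a per-iteration I/O pattern repeated
# many times yields a near-constant-size compressed trace.

def compress(trace, max_pattern=8):
    """Greedily fold adjacent repetitions of patterns up to max_pattern long."""
    out = []
    i = 0
    n = len(trace)
    while i < n:
        folded = False
        for plen in range(1, max_pattern + 1):
            pattern = trace[i:i + plen]
            count = 1
            # Count how many times this pattern immediately repeats.
            while trace[i + count * plen : i + (count + 1) * plen] == pattern:
                count += 1
            if count > 1:
                out.append((pattern, count))
                i += count * plen
                folded = True
                break
        if not folded:
            out.append(([trace[i]], 1))
            i += 1
    return out

# A regular pattern: 1000 iterations of seek/write fold into one entry.
flat = ["MPI_File_seek", "MPI_File_write"] * 1000
compressed = compress(flat)
print(len(flat), len(compressed))  # 2000 events -> 1 compressed entry
```

With this representation, the compressed trace size depends on the number of distinct patterns rather than the number of iterations, which is why regular codes produce trace files whose size is independent of run length and node count.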
Our contributions also include automated trace analysis, which collects selected statistical information about I/O calls by parsing the compressed trace on the fly, and time-accurate replay of communication events interleaved with MPI-IO calls. We evaluated our approach with the Parallel Ocean Program (POP) climate simulation and the FLASH parallel I/O benchmark. POP uses NetCDF as its I/O library, while FLASH I/O uses the parallel HDF5 library, which internally maps onto MPI-IO. We collected MPI-IO and low-level POSIX I/O traces to study application I/O behavior. Our results show trace files of a constant 145KB, irrespective of the number of nodes, for the FLASH I/O benchmark, which exhibits regular I/O and communication patterns. For POP, we observe up to two orders of magnitude reduction in trace file size compared to flat traces. The statistical information gathered reveals insights into the number of I/O and communication calls issued by POP and FLASH I/O. Such concise traces are unprecedented both for isolated I/O tracing and for combined I/O plus communication tracing.
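The on-the-fly statistical analysis described above can be sketched as follows: because a compressed trace records each pattern once with a repetition count, per-call statistics can be accumulated by weighting each pattern entry by its count, without ever expanding the trace back to a flat event list. The (pattern, count) representation and the `call_histogram` helper below are assumptions for illustration, not ScalaIOTrace's actual interface.

```python
# Hedged sketch: compute a per-call histogram directly from a compressed
# trace of (pattern, count) entries, without expanding it. Event names and
# the compressed-trace layout are illustrative assumptions.
from collections import Counter

def call_histogram(compressed):
    hist = Counter()
    for pattern, count in compressed:
        for event in pattern:
            hist[event] += count  # each event in the pattern occurs `count` times
    return hist

compressed = [(["MPI_File_seek", "MPI_File_write"], 1000),
              (["MPI_File_close"], 1)]
hist = call_histogram(compressed)
print(hist["MPI_File_write"])  # 1000
```

The cost of this analysis is proportional to the compressed trace size rather than the number of events executed, which is what makes statistics collection feasible even for runs on thousands of processors.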
Index Terms
- Scalable I/O tracing and analysis