Abstract
IBM Spectrum Scale’s parallel file system, the General Parallel File System (GPFS), has a 20-year development history with over 100 contributing developers. Its ability to support strict POSIX semantics across more than 10K clients leads to a complex design with intricate interactions between the cluster nodes. Tracing has proven to be a vital tool for understanding the behavior and anomalies of such a complex software product. However, the necessary trace information is often buried in hundreds of gigabytes of by-product trace records. Further, the overhead of tracing can significantly impact running applications and file system performance, limiting the use of tracing in production systems.
In this research article, we discuss the evolution of the mature and highly scalable GPFS tracing tool and present an exploratory study of GPFS’ new tracing interface, FlexTrace, which allows developers and users to specify precisely what to trace for the problem they are trying to solve. We evaluate our methodology and prototype, demonstrating that the proposed approach has negligible overhead, even under intensive I/O workloads and with low-latency storage devices.
Challenges and Solutions for Tracing Storage Systems: A Case Study with Spectrum Scale