
Challenges and Solutions for Tracing Storage Systems: A Case Study with Spectrum Scale

Published: 12 April 2018

Abstract

IBM Spectrum Scale's parallel file system, the General Parallel File System (GPFS), has a 20-year development history with over 100 contributing developers. Its ability to support strict POSIX semantics across more than 10K clients leads to a complex design with intricate interactions between the cluster nodes. Tracing has proven to be a vital tool for understanding the behavior and the anomalies of such a complex software product. However, the necessary trace information is often buried in hundreds of gigabytes of by-product trace records. Further, the overhead of tracing can significantly impact running applications and file system performance, limiting the use of tracing in a production system.

In this research article, we discuss the evolution of the mature and highly scalable GPFS tracing tool and present an exploratory study of GPFS's new tracing interface, FlexTrace, which allows developers and users to accurately specify what to trace for the problem they are trying to solve. We evaluate our methodology and prototype, demonstrating that the proposed approach has negligible overhead, even under intensive I/O workloads and with low-latency storage devices.
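The abstract does not describe FlexTrace's actual interface or implementation. Purely as a hedged illustration of the general idea behind selective tracing, the minimal C sketch below (all names hypothetical) gates each trace point on a per-class, runtime-adjustable verbosity level, so that a disabled trace class costs only a single branch on the hot path.

    /* Hypothetical sketch of class/level-gated tracing; NOT FlexTrace's API. */
    #include <stdarg.h>
    #include <stdio.h>
    #include <time.h>

    enum trace_class { TRC_IO, TRC_LOCK, TRC_NET, TRC_CLASS_MAX };

    /* Per-class verbosity, adjustable at run time (e.g., from a user request). */
    static int trace_level[TRC_CLASS_MAX];

    static void trace_emit(enum trace_class cls, const char *fmt, ...)
    {
        struct timespec ts;
        va_list ap;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        fprintf(stderr, "[%ld.%09ld] class=%d ", (long)ts.tv_sec, ts.tv_nsec, cls);
        va_start(ap, fmt);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
        fputc('\n', stderr);
    }

    /* Hot-path check: a single comparison when the class is quiet. */
    #define TRACE(cls, lvl, ...) \
        do { if (trace_level[(cls)] >= (lvl)) trace_emit((cls), __VA_ARGS__); } while (0)

    int main(void)
    {
        trace_level[TRC_IO] = 2;                          /* enable only I/O traces */
        TRACE(TRC_IO,   1, "write done, bytes=%d", 4096); /* emitted */
        TRACE(TRC_LOCK, 1, "token revoke");               /* skipped cheaply */
        return 0;
    }

When no class is enabled, the per-trace-point cost reduces to the level comparison, which is the property that makes always-on tracing in production plausible; how FlexTrace itself achieves low overhead is detailed in the article, not in this sketch.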



Published in
ACM Transactions on Storage, Volume 14, Issue 2 (May 2018), 210 pages
ISSN: 1553-3077, EISSN: 1553-3093
DOI: 10.1145/3208078
Editor: Sam H. Noh
Copyright © 2018 ACM


Publisher
Association for Computing Machinery, New York, NY, United States

