DOI: 10.1145/3095770.3095772
research-article

Jitter-Trace: a low-overhead OS noise tracing tool based on Linux Perf

Published: 27 June 2017

ABSTRACT

Operating System (OS) noise is a well-known phenomenon in which OS activities interfere with the execution of large-scale parallel applications. Because of OS noise, feature-rich software environments such as Linux can seriously limit application scalability. Kernel tracing can be used to identify OS noise sources, but until recently it required substantial OS modifications. This paper presents Jitter-Trace, a low-overhead tool that identifies and quantifies jitter sources. Jitter-Trace calculates the jitter generated by each OS activity, providing a complete set of task profiles and histograms of OS noise. This data is essential for implementing OS noise mitigation strategies and reducing the impact of noise on scalability. Jitter-Trace leverages the tracing and profiling capabilities of Linux Perf, which is widely available in current Linux distributions. Perf is tightly integrated into the Linux kernel and has a lightweight implementation.
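The per-activity accounting described in the abstract can be illustrated with a minimal sketch. This is not Jitter-Trace's implementation. It assumes a hypothetical, simplified record carrying only the timestamp, the CPU, and the `next_comm` field of a `sched:sched_switch` tracepoint event (the kind of data `perf record -e sched:sched_switch -a` collects), and it treats every interval in which a task other than the application (assumed here to be named `app`) occupies a CPU as noise. The names `SwitchEvent`, `noise_profile`, and `bin_us` are illustrative only. Hardware interrupts and softirqs that run without a context switch are invisible to this tracepoint, so a fuller tool would also have to consume interrupt-related events.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SwitchEvent:
    """Hypothetical, simplified context-switch record (not perf's output format)."""
    ts_us: int       # timestamp in microseconds
    cpu: int         # CPU on which the switch happened
    next_comm: str   # name of the task switched in


def noise_profile(events, app_comm, bin_us=10):
    """Attribute CPU time taken by non-application tasks to those tasks.

    Returns (profile, histogram):
      profile   maps task name -> total microseconds it occupied a CPU;
      histogram maps bin start (us) -> number of noise intervals whose
                duration fell into [start, start + bin_us).
    Intervals before the first and after the last event on a CPU are
    unknown and therefore ignored.
    """
    running = {}                  # cpu -> (task currently running, start ts)
    profile = defaultdict(int)
    histogram = defaultdict(int)
    for ev in sorted(events, key=lambda e: e.ts_us):
        prev = running.get(ev.cpu)
        if prev is not None:
            task, start = prev
            if task != app_comm:  # the interval that just ended was noise
                duration = ev.ts_us - start
                profile[task] += duration
                histogram[(duration // bin_us) * bin_us] += 1
        running[ev.cpu] = (ev.next_comm, ev.ts_us)
    return dict(profile), dict(histogram)


# Made-up trace for a single CPU: the application is preempted three times.
events = [
    SwitchEvent(0, 0, "app"),
    SwitchEvent(100, 0, "kworker/0:1"),
    SwitchEvent(112, 0, "app"),
    SwitchEvent(500, 0, "ksoftirqd/0"),
    SwitchEvent(503, 0, "app"),
    SwitchEvent(900, 0, "kworker/0:1"),
    SwitchEvent(925, 0, "app"),
]
profile, hist = noise_profile(events, "app")
print(profile)  # {'kworker/0:1': 37, 'ksoftirqd/0': 3}
print(hist)     # {10: 1, 0: 1, 20: 1}
```

The histogram uses fixed-width bins; logarithmic bins are a common alternative when noise durations span several orders of magnitude.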

      Published in: ROSS '17: Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers (ROSS 2017), June 2017, 62 pages
      ISBN: 9781450350860
      DOI: 10.1145/3095770

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall acceptance rate: 58 of 169 submissions (34%)
