ABSTRACT
Programming, understanding, and tuning the performance of large multiprocessor systems is challenging. Even experts have difficulty achieving good utilization for applications on large machines, and implementing a scalable system such as an operating system or database on such machines is more challenging still. The importance of achieving good performance on multiprocessor machines is increasing as the number of cores per chip and the size of multiprocessors grow. Crucial to achieving good performance is being able to understand the behavior of the system. We have developed an efficient, unified, and scalable tracing infrastructure that allows for correctness debugging, performance debugging, and performance monitoring of an operating system. The infrastructure allows variable-length events to be logged without locking and provides random access to the event stream. It allows cheap and parallel logging of events by applications, libraries, servers, and the kernel. The infrastructure was designed for K42, a new open-source research kernel designed to scale near perfectly on large cache-coherent 64-bit multiprocessor systems. The techniques are generally applicable, and many of them have been integrated into the Linux Trace Toolkit. In this paper, we describe the implementation of the infrastructure, how we used the facility (for example, to analyze lock contention) to understand and achieve K42's scalable performance, and the lessons we learned. The infrastructure has been invaluable in achieving K42's scalability.
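To make the idea of logging variable-length events without locking, while keeping random access to the stream, more concrete, here is a minimal sketch in C11. It is not the K42 implementation: the header layout, the buffer sizes, the filler-event convention, and the names `reserve`, `log_event`, and `count_events` are all assumptions chosen for illustration. Writers reserve space with a compare-and-swap on a shared index. Events never cross a fixed-size sub-buffer boundary, so a reader can begin decoding at any boundary.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* All sizes and the header layout below are hypothetical. */
#define BUF_WORDS   64u                      /* words per sub-buffer */
#define NUM_BUFS    4u                       /* sub-buffers in the ring */
#define TOTAL_WORDS (BUF_WORDS * NUM_BUFS)
#define FILLER_ID   0xFFFFu                  /* pads the tail of a sub-buffer */

static uint64_t trace_mem[TOTAL_WORDS];
static _Atomic uint64_t trace_index;         /* next free word, never decreases */

/* Header word: [id:16][length in words, header included:16][timestamp:32]. */
static uint64_t make_header(uint16_t id, uint16_t len, uint32_t stamp) {
    return ((uint64_t)id << 48) | ((uint64_t)len << 32) | stamp;
}
static uint16_t header_id(uint64_t h)  { return (uint16_t)(h >> 48); }
static uint16_t header_len(uint64_t h) { return (uint16_t)(h >> 32); }

/* Reserve len words without taking a lock. If the event does not fit in the
 * current sub-buffer, the same CAS also claims the leftover words, which are
 * then marked with a filler event so readers can skip them. */
static uint64_t reserve(uint16_t len) {
    uint64_t old = atomic_load(&trace_index);
    for (;;) {
        uint64_t room = BUF_WORDS - (old % BUF_WORDS);   /* always >= 1 */
        uint64_t need = (len <= room) ? len : room + len;
        if (atomic_compare_exchange_weak(&trace_index, &old, old + need)) {
            if (len > room) {
                trace_mem[old % TOTAL_WORDS] =
                    make_header(FILLER_ID, (uint16_t)room, 0);
                old += room;                 /* event starts at the boundary */
            }
            return old;
        }
        /* A failed CAS reloaded `old`; retry with the new value. */
    }
}

/* Log one event with n payload words. Returns 0 on success, -1 if the event
 * cannot fit in a sub-buffer. Omitted from this sketch: commit counts that
 * tell a reader when every writer in a sub-buffer has finished, and
 * protection against overwriting unread data when the ring wraps. */
static int log_event(uint16_t id, uint32_t stamp,
                     const uint64_t *payload, size_t n) {
    if (n + 1 > BUF_WORDS) return -1;
    uint64_t start = reserve((uint16_t)(n + 1));
    size_t pos = (size_t)(start % TOTAL_WORDS);
    trace_mem[pos] = make_header(id, (uint16_t)(n + 1), stamp);
    for (size_t i = 0; i < n; i++) trace_mem[pos + 1 + i] = payload[i];
    return 0;
}

/* Random access: decode starting at any sub-buffer boundary by hopping from
 * header to header. Returns the number of non-filler events found. */
static unsigned count_events(unsigned buf) {
    unsigned count = 0;
    size_t pos = (size_t)buf * BUF_WORDS, end = pos + BUF_WORDS;
    while (pos < end) {
        uint64_t h = trace_mem[pos];
        if (header_len(h) == 0) break;       /* unwritten space */
        if (header_id(h) != FILLER_ID) count++;
        pos += header_len(h);
    }
    return count;
}
```

In this sketch, logging ten 10-word events leaves six events plus a 4-word filler in the first sub-buffer and four events in the second, and each sub-buffer can be decoded independently of the others. The compare-and-swap is the only point of contention between writers, and no writer ever waits on a lock.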