DOI: 10.1145/1531743.1531751
research-article

Core monitors: monitoring performance in multicore processors

Published: 18 May 2009

ABSTRACT

As we reach the limits of single-core computing, we are promised more and more cores in our systems. Modern architectures include many performance counters per core, but few or no inter-core counters. Performance counters were not designed to be exploited by users, as they now are, but simply as aids for hardware debugging and testing during system creation. As such, they tend to be an afterthought in the design, with no standardization across or within platforms. Nonetheless, given access to these counters, researchers are using them to great advantage [17]. At the same time, evaluating counters for multicore systems has become a complex and resource-consuming task. We propose a Performance Monitoring System consisting of a specialized CPU core designed to allow efficient collection and evaluation of performance data for both static and dynamic optimizations. Our system provides a transparent mechanism to change architectural features dynamically, inform the operating system of process behaviors, and assist in profiling and debugging. For instance, a piece of hardware watching snoop packets can determine whether a write-update cache coherence protocol would be helpful or detrimental to the currently running program. Our system is designed to let the hardware feed performance statistics back to software, enabling dynamic architectural adjustments at runtime.
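The snoop-watching example in the abstract can be illustrated with a small software model. The sketch below is not the authors' design. It assumes a simplified trace of (core, operation, cache line) events and a hypothetical heuristic: a write counts as "useful" for write-update if another core reads the line before it is overwritten, and as "wasted" otherwise. The class name `SnoopMonitor`, the event encoding, and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class LineStats:
    """Per-cache-line bookkeeping (hypothetical, for illustration only)."""
    last_writer: Optional[int] = None
    pending: bool = False  # True if the latest write has not yet been read remotely
    useful: int = 0        # writes that a different core read before the next write
    wasted: int = 0        # writes overwritten before any other core read them


class SnoopMonitor:
    """Toy model of a monitor that watches coherence traffic.

    Assumption: an update broadcast pays off only when a remote core
    consumes the written value before it is overwritten.
    """

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.lines = defaultdict(LineStats)

    def observe(self, core: int, op: str, line: int) -> None:
        """Record one event; op is "W" for a write or "R" for a read."""
        stats = self.lines[line]
        if op == "W":
            if stats.pending:
                # The previous write was never read by another core.
                stats.wasted += 1
            stats.last_writer = core
            stats.pending = True
        elif op == "R":
            if stats.pending and core != stats.last_writer:
                # A remote core consumed the most recent write.
                stats.useful += 1
                stats.pending = False

    def recommend(self) -> str:
        """Suggest a protocol from the fraction of useful writes."""
        useful = sum(s.useful for s in self.lines.values())
        wasted = sum(s.wasted for s in self.lines.values())
        total = useful + wasted
        if total == 0:
            return "no-data"
        if useful / total > self.threshold:
            return "write-update"
        return "write-invalidate"
```

Under these assumptions, a producer-consumer trace (one core writes a line, another reads it, repeatedly) yields a write-update recommendation, while a burst of writes from one core with only an occasional remote read yields write-invalidate. A hardware monitor would work from real snoop packets and a bounded table rather than an unbounded dictionary.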

References

  1. S. B. Pentium 4 performance-monitoring features. IEEE Micro, 22(4):72--82, Jul/Aug 2002.
  2. W. Binder. Portable and accurate sampling profiling for Java. Softw. Pract. Exper., 36(6):615--650, 2006.
  3. N. L. Binkert, R. G. Dreslinski, L. R. Hsu, K. T. Lim, A. G. Saidi, and S. K. Reinhardt. The M5 simulator: Modeling networked systems. IEEE Micro, 26(4):52--60, 2006.
  4. K. Chow and Y. Wu. Feedback-directed selection and characterization of compiler optimizations. 2nd Workshop on Feedback Directed Optimization, 1999.
  5. Compaq. Alpha architecture handbook. Whitepaper, October 1998.
  6. J. Dean, J. Hicks, C. Waldspurger, W. Weihl, and G. Chrysos. ProfileMe: Hardware support for instruction-level profiling on out-of-order processors. In Proc. IEEE/ACM 30th International Symposium on Microarchitecture, pages 292--302, Dec. 1997.
  7. J. Dean, J. E. Hicks, C. A. Waldspurger, W. E. Weihl, and G. Chrysos. ProfileMe: hardware support for instruction-level profiling on out-of-order processors. In MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, pages 292--302, Washington, DC, USA, 1997. IEEE Computer Society.
  8. G. Delzanno. Automatic verification of parameterized cache coherence protocols. In Computer Aided Verification, pages 53--68, Dec. 2006.
  9. B. Fields, R. Bodik, M. Hill, and C. Newburn. Using interaction costs for microarchitectural bottleneck analysis. In Proc. IEEE/ACM 36th International Symposium on Microarchitecture, pages 228--239, Dec. 2003.
  10. H. Grahn and P. Stenstrom. Evaluation of a competitive-update cache coherence protocol with migratory data detection. J. Parallel Distrib. Comput., 39(2):168--180, 1996.
  11. T. Heil and J. E. Smith. Relational profiling: Enable thread-level parallelism in virtual machines. Microarchitecture, IEEE/ACM International Symposium on, 0:281, 2000.
  12. M. Helms, T. Bochner, R. Fritz, T. Schlipf, and M. Walz. Event monitoring in a system-on-a-chip. In Proc. 12th Annual IEEE International ASIC/SOC Conference, Sept. 1999.
  13. R. Hockauf, J. Jeitner, W. Karl, R. Lindhof, M. Schulz, V. Gonzales, E. Sanquis, and G. Torralba. Design and implementation aspects for the SMiLE hardware monitor. In G. Horn and W. Karl, editors, Proc. of SCI-Europe 2000, The 3rd International Conference on SCI-Based Technology and Research, pages 47--55. SINTEF Electronics and Cybernetics, Aug. 2000. ISBN: 82-595-9964-3. Also available at http://wwwbode.in.tum.de/events/.
  14. Intel. Intel Itanium Architecture Software Developer's Manual, 2000.
  15. Intel. Intel Architecture Software Developer's Manual Volume 3: System Programming Guide, 2002.
  16. W. Karl, M. Leberecht, and M. Schulz. Optimizing data locality for SCI-based PC-clusters with the SMiLE monitoring approach. In Proc. of International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 169--176, Oct. 1999.
  17. M. Martonosi, D. W. Clark, and M. Mesarina. The SHRIMP performance monitor: Design and applications. In ACM SIGMETRICS Performance Evaluation Review, pages 61--69, May 1996.
  18. M. Martonosi, D. Ofelt, and M. Heinrich. Integrating performance monitoring and communication in parallel computers. In Proc. ACM International Conference on Measurement and Modeling of Computer Systems, pages 138--147, May 1996.
  19. T. Mu, J. Tao, M. Schulz, and S. McKee. Interactive locality optimization on NUMA architectures. In Proc. ACM 2003 Symposium on Software Visualization (SoftVis), pages 133--142, 214, July 2003.
  20. A. Nanda, K. Mak, K. Sugavanam, R. Sahoo, V. Soundararajan, and T. Smith. MemorIES: a programmable, real-time hardware emulation tool for multiprocessor server design. SIGPLAN Not., 35(11):37--48, 2000.
  21. M. Prvulovic and J. Torrellas. ReEnact: Using thread-level speculation mechanisms to debug data races in multithreaded codes. In Proc. 30th IEEE/ACM International Symposium on Computer Architecture, pages 110--121, June 2003.
  22. V. Salapura. Bluegene/p performance counters. Personal Communication: Paper in Submission, Nov. 2007.
  23. V. Salapura, K. Ganesan, A. Gara, M. Gschwind, J. Sexton, and R. Walkup. Next-generation performance counters: Towards monitoring over thousand concurrent events. Performance Analysis of Systems and Software, 2008. ISPASS 2008. IEEE International Symposium on, pages 139--146, April 2008.
  24. S. Sarangi, A. Tiwari, and J. Torrellas. Phoenix: Detecting and recovering from permanent processor design bugs with programmable hardware. In Proc. IEEE/ACM 40th Annual International Symposium on Microarchitecture, pages 26--37, Dec. 2006.
  25. S. Sastry, R. Bodík, and J. Smith. Rapid profiling via stratified sampling. In Proc. 28th IEEE/ACM International Symposium on Computer Architecture, pages 278--289, July 2001.
  26. M. Schulz, B. White, S. McKee, H. Lee, and J. Jeitner. Owl: Next generation system monitoring. In Proc. ACM Computing Frontiers Conference, May 2005.
  27. B. Sprunt. The basics of performance-monitoring hardware. IEEE Micro, pages 64--71, July/August 2002.
  28. B. Sprunt. Pentium 4 performance-monitoring features. IEEE Micro, pages 72--82, July/August 2002.
  29. M. Xu, R. Bodik, and M. Hill. A flight data recorder for enabling full-system multiprocessor deterministic replay. In Proc. 30th IEEE/ACM International Symposium on Computer Architecture, pages 122--135, June 2003.
  30. P. Zhou, V. Pandey, J. Sundaresan, A. Raghuraman, Y. Zhou, and S. Kumar. Dynamic tracking of page miss ratio curve for memory management. In Proc. 11th ACM Symposium on Architectural Support for Programming Languages and Operating Systems, pages 177--188, Oct. 2004.
  31. P. Zhou, F. Qin, W. Liu, Y. Zhou, and J. Torrellas. iWatcher: efficient architectural support for software debugging. Computer Architecture, 2004. Proceedings. 31st Annual International Symposium on, pages 224--235, June 2004.
  32. P. Zhou, F. Qin, W. Liu, Y. Zhou, and J. Torrellas. iWatcher: Efficient architectural support for software debugging. In Proc. 31st IEEE/ACM International Symposium on Computer Architecture, pages 224--237, June 2004.

      Published in

        ACM Conferences
        CF '09: Proceedings of the 6th ACM conference on Computing frontiers
        May 2009
        238 pages
        ISBN:9781605584133
        DOI:10.1145/1531743

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        CF '09 Paper Acceptance Rate: 26 of 113 submissions, 23%
        Overall Acceptance Rate: 240 of 680 submissions, 35%
