DOI: 10.1145/263764.263796

Shared-memory performance profiling

Published: 21 June 1997

ABSTRACT

This paper describes a new approach to finding performance bottlenecks in shared-memory parallel programs and its embodiment in the Paradyn Parallel Performance Tools running with the Blizzard fine-grain distributed shared memory system. The approach exploits the underlying system's cache coherence protocol to detect data sharing patterns that indicate potential performance bottlenecks, and it presents performance measurements in a data-centric manner. As a demonstration, Paradyn helped us improve the performance of a new shared-memory application program by a factor of four.
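
The data-centric idea in the abstract can be illustrated with a minimal sketch. This is not Paradyn's or Blizzard's actual interface: the event tuples, the `REGIONS` table, the `profile` function, and the 32-byte block size are all hypothetical, chosen only to show how coherence-protocol events might be attributed to program data structures rather than to code locations.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from data-structure names to address ranges [lo, hi).
# A real tool would obtain this from symbol tables or allocation records.
REGIONS = {"grid": (0x1000, 0x2000), "queue": (0x2000, 0x2400)}


def region_of(addr):
    """Return the name of the data structure that contains addr."""
    for name, (lo, hi) in REGIONS.items():
        if lo <= addr < hi:
            return name
    return "<unknown>"


def profile(events, block_size=32):
    """Aggregate coherence events per data structure.

    events: iterable of (node, kind, addr) tuples, where kind is
    "read_miss" or "write_miss" (an assumed, simplified event model).
    Returns (misses, contended): total misses per structure, and the
    number of coherence blocks per structure written by more than one
    node, which is one possible hint of contended or falsely shared data.
    """
    misses = Counter()
    writers = defaultdict(set)  # (structure, block index) -> writing nodes
    for node, kind, addr in events:
        name = region_of(addr)
        misses[name] += 1
        if kind == "write_miss":
            writers[(name, addr // block_size)].add(node)

    contended = Counter()
    for (name, _block), nodes in writers.items():
        if len(nodes) > 1:
            contended[name] += 1
    return misses, contended


if __name__ == "__main__":
    trace = [
        (0, "write_miss", 0x1000),
        (1, "write_miss", 0x1008),  # same 32-byte block, different node
        (0, "read_miss", 0x2000),
        (1, "write_miss", 0x1000),
    ]
    misses, contended = profile(trace)
    for name in misses:
        print(name, misses[name], contended[name])
```

Grouping by data structure instead of by procedure is the design choice being illustrated: a structure with many multi-writer blocks stands out even when the accesses to it are spread across many functions.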


            • Published in

              PPOPP '97: Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
              June 1997
              287 pages
              ISBN:0897919068
              DOI:10.1145/263764

              Copyright © 1997 ACM

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Acceptance Rates

              PPOPP '97 Paper Acceptance Rate: 26 of 86 submissions, 30%
              Overall Acceptance Rate: 230 of 1,014 submissions, 23%
