DOI: 10.1145/1133956.1133970

Decomposing memory performance: data structures and phases

Published: 10 June 2006

Abstract

The memory hierarchy continues to have a substantial effect on application performance. This paper explores the potential of high-level application understanding in improving the performance of modern memory hierarchies, decomposing the often-chaotic address stream of an application into multiple more regular streams. We present two orthogonal methodologies. The first is a system called DTrack that decomposes the dynamic reference stream of a C program by tagging each reference with its global variable or heap call-site name. The second is a technique to determine the correct granularity at which to study the global phase behavior of applications. Applying these twin analysis methods to twelve C SPEC2000 benchmarks, we demonstrate that they reveal data structure interactions that remain obscured with traditional aggregation-based analysis methods. Such a characterization creates a rich profile of an application's memory behavior that highlights the most memory-intensive data structures and program phases, and we illustrate how this profile can lead system and application designers to a deeper understanding of the applications they study.
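The idea of splitting one address stream into per-data-structure streams can be illustrated with a minimal sketch. This is not DTrack's implementation, which the abstract does not describe; the region table, the tag names, the addresses, and the functions `build_index`, `tag_of`, and `decompose` are all hypothetical, and the regions are assumed not to overlap.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical address map: (start, size, tag). Each tag names a global
# variable or a heap allocation call site, as the abstract describes.
REGIONS = [
    (0x1000, 0x100, "global:table"),
    (0x2000, 0x040, "heap:build_tree@parse.c:88"),
    (0x3000, 0x200, "heap:new_node@list.c:17"),
]

def build_index(regions):
    """Sort regions by start address so lookups can use binary search."""
    ordered = sorted(regions)
    starts = [start for start, _size, _tag in ordered]
    return starts, ordered

def tag_of(addr, index):
    """Return the tag of the region containing addr, or 'unknown'."""
    starts, ordered = index
    i = bisect_right(starts, addr) - 1  # last region starting at or below addr
    if i >= 0:
        start, size, tag = ordered[i]
        if addr < start + size:
            return tag
    return "unknown"

def decompose(trace, regions):
    """Split one address trace into one stream per tag, preserving order."""
    index = build_index(regions)
    streams = defaultdict(list)
    for addr in trace:
        streams[tag_of(addr, index)].append(addr)
    return dict(streams)

# Made-up interleaved trace touching three structures and one untracked address.
trace = [0x1000, 0x2004, 0x1008, 0x3010, 0x2008, 0x9999, 0x1010]
streams = decompose(trace, REGIONS)
for tag, addrs in streams.items():
    print(tag, [hex(a) for a in addrs])
```

In this toy trace the interleaved accesses look irregular, while the stream tagged `global:table` is a simple stride of 8 bytes, which is the kind of regularity the decomposition is meant to expose.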

        Published In

        ISMM '06: Proceedings of the 5th international symposium on Memory management
        June 2006
        202 pages
        ISBN:1595932216
        DOI:10.1145/1133956
        General Chair: Erez Petrank
        Program Chair: Eliot Moss

        Publisher: Association for Computing Machinery, New York, NY, United States


        Author Tags

        1. CPU2000
        2. DTrack
        3. SPEC
        4. data structure
        5. phase
        6. simulation


        Acceptance Rates

        Overall Acceptance Rate 72 of 156 submissions, 46%


        Cited By

        • (2019) Demystifying Complex Workload-DRAM Interactions. Proceedings of the ACM on Measurement and Analysis of Computing Systems 3:3 (1-50). DOI: 10.1145/3366708. Online publication date: 17-Dec-2019
        • (2016) Concurrent memory subsystem and application optimization for ASIP design. 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS) (1-10). DOI: 10.1109/SAMOS.2016.7818325. Online publication date: Jul-2016
        • (2015) Exploiting compressed block size as an indicator of future reuse. 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA) (51-63). DOI: 10.1109/HPCA.2015.7056021. Online publication date: Feb-2015
        • (2013) SPM-Sieve. Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems (1-10). DOI: 10.5555/2555729.2555750. Online publication date: 29-Sep-2013
        • (2013) SPM-Sieve: A framework for assisting data partitioning in scratch pad memory based systems. 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES) (1-10). DOI: 10.1109/CASES.2013.6662527. Online publication date: Sep-2013
        • (2012) Phase guided profiling for fast cache modeling. Proceedings of the Tenth International Symposium on Code Generation and Optimization (175-185). DOI: 10.1145/2259016.2259040. Online publication date: 31-Mar-2012
        • (2012) Phase behavior in serial and parallel applications. Proceedings of the 2012 IEEE International Symposium on Workload Characterization (IISWC) (47-58). DOI: 10.1109/IISWC.2012.6402900. Online publication date: 4-Nov-2012
        • (2011) Implications of Program Phase Behavior on Timing Analysis. Proceedings of the 2011 15th Workshop on Interaction between Compilers and Computer Architectures (71-79). DOI: 10.1109/INTERACT.2011.12. Online publication date: 12-Feb-2011
