Article

Decomposing memory performance: data structures and phases

Authors:

Kartik K. Agaram,

Stephen W. Keckler,

Kathryn S. McKinley

ISMM '06: Proceedings of the 5th international symposium on Memory management

Pages 95 - 103

https://doi.org/10.1145/1133956.1133970

Published: 10 June 2006

Abstract

The memory hierarchy continues to have a substantial effect on application performance. This paper explores the potential of high-level application understanding in improving the performance of modern memory hierarchies, decomposing the often-chaotic address stream of an application into multiple more regular streams. We present two orthogonal methodologies. The first is a system called DTrack that decomposes the dynamic reference stream of a C program by tagging each reference with its global variable or heap call-site name. The second is a technique to determine the correct granularity at which to study the global phase behavior of applications. Applying these twin analysis methods to twelve C SPEC2000 benchmarks, we demonstrate that they reveal data structure interactions that remain obscured with traditional aggregation-based analysis methods. Such a characterization creates a rich profile of an application's memory behavior that highlights the most memory-intensive data structures and program phases, and we illustrate how this profile can lead system and application designers to a deeper understanding of the applications they study.
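
The idea of tagging each reference with a data structure name can be illustrated with a small sketch. This is not DTrack's implementation, which the abstract does not describe in detail; it only assumes that each global variable or heap allocation site owns a known address range. The region table, the names in it, the trace, and the helper functions (build_index, tag, decompose) are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical address map: (start, size, name). Each name stands in for a
# global variable or a heap allocation call site; none come from the paper.
REGIONS = [
    (0x1000, 0x400, "global:grid"),
    (0x2000, 0x100, "heap:main.c:42"),
    (0x3000, 0x800, "heap:build_tree.c:17"),
]

def build_index(regions):
    """Sort regions by start address so lookups can use binary search."""
    ordered = sorted(regions)
    starts = [start for start, _, _ in ordered]
    return starts, ordered

def tag(addr, index):
    """Return the name of the region containing addr, or "unknown"."""
    starts, ordered = index
    i = bisect_right(starts, addr) - 1  # last region starting at or before addr
    if i >= 0:
        start, size, name = ordered[i]
        if addr < start + size:
            return name
    return "unknown"

def decompose(trace, regions):
    """Split one address stream into per-structure reference counts."""
    index = build_index(regions)
    return Counter(tag(addr, index) for addr in trace)

# Hypothetical address trace, interleaving references to several structures.
TRACE = [0x1000, 0x2004, 0x1008, 0x3010, 0x1010, 0x9999, 0x2008]
counts = decompose(TRACE, REGIONS)
for name, n in counts.most_common():
    print(f"{name}: {n} references")
```

The same tagging step could feed per-structure miss counts instead of raw reference counts if the trace also carried cache hit or miss information; that extension is an assumption and is not shown here.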

Index Terms

Decomposing memory performance: data structures and phases
1. General and reference
  1. Cross-computing tools and techniques
    1. Measurement
    2. Metrics
2. Hardware
  1. Robustness

Published In

ISMM '06: Proceedings of the 5th international symposium on Memory management

June 2006

202 pages

ISBN:1595932216

DOI:10.1145/1133956

General Chair: Erez Petrank, Technion - Israel Institute of Technology

Program Chair: Eliot Moss, University of Massachusetts, Amherst

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ISMM06

ISMM06: The 2006 International Symposium on Memory Management

June 10 - 11, 2006

Ottawa, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 72 of 156 submissions, 46%
