ABSTRACT
The memory hierarchy continues to have a substantial effect on application performance. This paper explores the potential of high-level application understanding in improving the performance of modern memory hierarchies, decomposing the often-chaotic address stream of an application into multiple more regular streams. We present two orthogonal methodologies. The first is a system called DTrack that decomposes the dynamic reference stream of a C program by tagging each reference with its global variable or heap call-site name. The second is a technique to determine the correct granularity at which to study the global phase behavior of applications. Applying these twin analysis methods to twelve C SPEC2000 benchmarks, we demonstrate that they reveal data structure interactions that remain obscured with traditional aggregation-based analysis methods. Such a characterization creates a rich profile of an application's memory behavior that highlights the most memory-intensive data structures and program phases, and we illustrate how this profile can lead system and application designers to a deeper understanding of the applications they study.