ABSTRACT
This paper presents a new cache organization for Chip Multiprocessors (CMPs). We introduce Nahalal, an architecture whose novel floorplan topology partitions cached data according to its usage (shared versus private data). This lets every processor access shared data quickly while keeping private data close to the processor that uses it. Nahalal thus combines the advantages of shared and private caches: it provides fast, private-cache-like access to data while eliminating the need for inter-cache coherence transactions. Detailed simulations in Simics show that Nahalal reduces cache access latency by up to 41.1% compared with traditional CMP designs, yielding run-time performance gains of up to 12.65%.
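The usage-based placement idea can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's mechanism. The class name `NahalalLikeCache`, the three latency constants, and the rule "a line becomes shared the first time a second core touches it" are all hypothetical choices made for illustration. In the model, each line lives in exactly one place: a core's local bank while it is private, or the central bank once it is shared. Because no line is ever replicated, there are no copies to keep coherent. Capacity limits, replacement, and reclassifying shared lines back to private are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical access latencies (cycles) for an illustrative floorplan.
# These numbers are assumptions, not values reported in the paper.
LOCAL_LATENCY = 2    # core -> its own adjacent (private) bank
CENTER_LATENCY = 4   # any core -> the shared central bank
REMOTE_LATENCY = 10  # core -> another core's private bank


@dataclass
class NahalalLikeCache:
    """Toy placement model (hypothetical): private lines sit next to their
    single user; shared lines sit in a central bank every core reaches equally."""
    owner: Dict[int, int] = field(default_factory=dict)  # private line -> owning core
    shared: Set[int] = field(default_factory=set)        # lines used by more than one core

    def access(self, core: int, line: int) -> int:
        """Return the latency of one access and update the line's placement."""
        if line in self.shared:
            # Shared lines are served from the central bank.
            return CENTER_LATENCY
        first = self.owner.get(line)
        if first is None:
            # First touch: assume private and place it in the requester's local bank.
            self.owner[line] = core
            return LOCAL_LATENCY
        if first == core:
            # Repeated access by the owner stays local.
            return LOCAL_LATENCY
        # A second core touched the line: this access pays the remote-bank
        # latency, then the single copy moves to the central bank.
        del self.owner[line]
        self.shared.add(line)
        return REMOTE_LATENCY


if __name__ == "__main__":
    cache = NahalalLikeCache()
    trace = [(0, 0x40), (0, 0x40), (1, 0x40), (0, 0x40), (1, 0x40)]
    print([cache.access(core, line) for core, line in trace])  # [2, 2, 10, 4, 4]
```

In this trace, line `0x40` is served locally while only core 0 uses it. It pays one remote access when core 1 first touches it, and is then served from the central bank at equal latency to both cores.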