Utilizing shared data in chip multiprocessors with the Nahalal architecture
Source
ACM Symposium on Parallel Algorithms and Architectures
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Munich, Germany
Session: Special track: multicores
Pages 1-10
Year of Publication: 2008
ISBN: 978-1-59593-973-9
Authors
Zvika Guz, Technion - Israel Institute of Technology, Haifa, Israel
Idit Keidar, Technion - Israel Institute of Technology, Haifa, Israel
Avinoam Kolodny, Technion - Israel Institute of Technology, Haifa, Israel
Uri C. Weiser, Technion - Israel Institute of Technology, Haifa, Israel
Sponsors
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1378533.1378535

ABSTRACT

This paper presents a new cache organization for Chip Multiprocessors (CMPs). We introduce Nahalal, an architecture whose novel floorplan topology partitions cached data according to its usage (shared versus private data). This enables fast access to shared data for all processors while keeping private data close to each processor. The Nahalal architecture combines the best of shared caches and private caches: it provides fast data accesses, as in private caches, while eliminating the need for inter-cache coherence transactions. Detailed simulations in Simics show that Nahalal reduces cache access latency by up to 41.1% compared to traditional CMP designs, yielding run-time performance gains of up to 12.65%.
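To make the idea of usage-based partitioning concrete, the Python sketch below models where a cache block might be placed. It assumes a simple policy that is not taken from the paper: a block first lands in the private bank of the core that touches it, and once a second, different core touches it, the block is treated as shared and moved to a central shared bank. The class name `NahalalPlacementModel` and the `SHARED` label are hypothetical, and the model tracks placement only, not latency or timing.

```python
from dataclasses import dataclass, field

# Hypothetical label for the central shared bank (assumption, not from the paper).
SHARED = "shared"


@dataclass
class NahalalPlacementModel:
    """Toy model of usage-based block placement (assumed policy).

    - First touch: the block is placed in the requesting core's private bank.
    - Touch by a different core: the block is promoted to the central shared bank.
    """
    num_cores: int
    owner: dict = field(default_factory=dict)     # block -> core that first touched it
    location: dict = field(default_factory=dict)  # block -> core id or SHARED

    def access(self, core: int, block: int):
        """Record an access and return the block's resulting location."""
        if not 0 <= core < self.num_cores:
            raise ValueError("core id out of range")
        if block not in self.location:
            # First touch: keep the block close to the requesting core.
            self.owner[block] = core
            self.location[block] = core
        elif self.location[block] != SHARED and self.owner[block] != core:
            # A second core touched it: treat the block as shared data.
            self.location[block] = SHARED
        return self.location[block]
```

For example, with `m = NahalalPlacementModel(num_cores=8)`, calling `m.access(0, 0x100)` returns `0` (core 0's private bank), a later `m.access(3, 0x100)` returns `"shared"`, and `m.access(5, 0x200)` returns `5`. Under this assumed policy, a block is never held in more than one bank at a time, which mirrors, in simplified form, the abstract's point that separating shared and private data removes the need for inter-cache coherence transactions.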

