An efficient cache design for scalable glueless shared-memory multiprocessors
Source: Conference On Computing Frontiers
Proceedings of the 3rd conference on Computing frontiers
Ischia, Italy
SESSION: Cache architectures
Pages: 321-330
Year of Publication: 2006
ISBN: 1-59593-302-6
Authors
Alberto Ros, Universidad de Murcia, Murcia, Spain
Manuel E. Acacio, Universidad de Murcia, Murcia, Spain
José M. García, Universidad de Murcia, Murcia, Spain
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1128022.1128065

ABSTRACT

Traditionally, cache coherence in large-scale shared-memory multiprocessors has been ensured by means of a distributed directory structure stored in main memory. In this way, the access to main memory to recover the sharing status of the block is generally put in the critical path of every cache miss, increasing its latency. Considering the ever-increasing distance to memory, these cache coherence protocols are far from being optimal from the perspective of performance. On the other hand, shared-memory multiprocessors formed by connecting chips that integrate the processor, caches, coherence logic, switch and memory controller through a low-cost, low-latency point-to-point network (glueless shared-memory multiprocessors) are a reality.

In this work, we propose a novel design for the L2 cache level, at which coherence has to be maintained, aimed at being used in glueless shared-memory multiprocessors. Our proposal splits the cache structure into two different parts: one for storing data and directory information for the blocks requested by the local processor, and another one for storing only directory information for blocks accessed by remote processors. Using this cache scheme we remove the directory from main memory. Besides saving memory space, our proposal brings very significant reductions in terms of latency of the cache misses (speed-ups of 3.0 on average), which translate into reductions in applications' execution time of 31% on average.
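
The two-part cache organization described in the abstract can be pictured with a small software model. The sketch below is illustrative only and is not taken from the paper: the names `SplitL2`, `LocalLine`, `DirEntry`, `local_access` and `remote_access` are hypothetical, the directory information is assumed to be a plain set of sharer node IDs, and capacity limits, replacement and the coherence protocol itself are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DirEntry:
    """Directory information for one block (assumed: a set of sharer node IDs)."""
    sharers: Set[int] = field(default_factory=set)


@dataclass
class LocalLine:
    """Entry of the first part: block data plus its directory information."""
    data: bytes
    directory: DirEntry = field(default_factory=DirEntry)


class SplitL2:
    """Hypothetical model of an L2 cache split into two parts.

    local_part      : data + directory info for blocks requested locally.
    remote_dir_part : directory info only, for blocks accessed remotely.
    """

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.local_part: Dict[int, LocalLine] = {}
        self.remote_dir_part: Dict[int, DirEntry] = {}

    def local_access(self, addr: int, data: bytes) -> LocalLine:
        """The local processor requests a block: keep data and directory info."""
        line = self.local_part.get(addr)
        if line is None:
            # Assumption: sharers already tracked in the directory-only part
            # move along with the block into the data part.
            directory = self.remote_dir_part.pop(addr, DirEntry())
            line = LocalLine(data=data, directory=directory)
            self.local_part[addr] = line
        return line

    def remote_access(self, addr: int, requester: int) -> DirEntry:
        """A remote processor accesses a block: record directory info only."""
        line = self.local_part.get(addr)
        if line is not None:
            entry = line.directory
        else:
            entry = self.remote_dir_part.setdefault(addr, DirEntry())
        entry.sharers.add(requester)
        return entry

    def sharers(self, addr: int) -> Set[int]:
        """Return a copy of the sharer set, looked up without main memory."""
        line = self.local_part.get(addr)
        if line is not None:
            return set(line.directory.sharers)
        entry = self.remote_dir_part.get(addr)
        return set(entry.sharers) if entry is not None else set()
```

In this model the sharing status of a block is always found in one of the two cache parts, which mirrors the abstract's point that the directory no longer needs to reside in main memory.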


