ABSTRACT
Traditionally, cache coherence in large-scale shared-memory multiprocessors has been ensured by means of a distributed directory structure stored in main memory. As a result, the main-memory access needed to recover the sharing status of a block generally lies on the critical path of every cache miss, increasing its latency. Given the ever-increasing distance to memory, these cache coherence protocols are far from optimal in terms of performance. On the other hand, glueless shared-memory multiprocessors, formed by connecting chips that integrate the processor, caches, coherence logic, switch and memory controller through a low-cost, low-latency point-to-point network, are a reality.

In this work, we propose a novel design for the L2 cache level, at which coherence has to be maintained, intended for glueless shared-memory multiprocessors. Our proposal splits the cache structure into two parts: one stores data and directory information for the blocks requested by the local processor, and the other stores only directory information for blocks accessed by remote processors. With this cache scheme, we remove the directory from main memory. Besides saving memory space, our proposal significantly reduces cache miss latency (speed-ups of 3.0 on average), which translates into reductions in applications' execution time of 31% on average.
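The two-part organization described above can be pictured with a small software model. The following is only an illustrative sketch, not the hardware design evaluated in the paper: the class and method names (`SplitL2Cache`, `local_request`, `remote_request`, `sharing_status`) are hypothetical, replacement and coherence states are omitted, and moving directory information from the remote part to the local part on a local request is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LocalEntry:
    """Data plus directory information for a block requested by the local processor."""
    data: bytes
    sharers: set = field(default_factory=set)


class SplitL2Cache:
    """Toy model of one node's L2 level, split into two parts (hypothetical names)."""

    def __init__(self):
        # Part 1: block address -> LocalEntry (data + directory information).
        self.local_part = {}
        # Part 2: block address -> set of sharer node ids (directory information only).
        self.remote_part = {}

    def local_request(self, addr, data):
        """The local processor requests a block: keep its data and directory info."""
        # Assumption: existing directory info migrates from the remote part.
        sharers = self.remote_part.pop(addr, set())
        self.local_part[addr] = LocalEntry(data=data, sharers=sharers)

    def remote_request(self, addr, node_id):
        """A remote processor accesses a block: record directory info only."""
        if addr in self.local_part:
            self.local_part[addr].sharers.add(node_id)
        else:
            self.remote_part.setdefault(addr, set()).add(node_id)

    def sharing_status(self, addr):
        """Return the sharers of a block without consulting main memory."""
        if addr in self.local_part:
            return self.local_part[addr].sharers
        return self.remote_part.get(addr, set())
```

In this model, the sharing status of a block is always answered from one of the two parts, which mirrors the idea of removing the directory from main memory; a real design would also have to handle evictions from either part.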