ABSTRACT
As processor performance continues to improve at a rate much higher than DRAM and network performance, we are approaching a time when large-scale distributed shared memory systems will have remote memory latencies measured in tens of thousands of processor cycles. The Impulse memory system architecture adds an optional level of address indirection at the memory controller. Applications can use this level of indirection to control how data is accessed and cached and thereby improve cache and bus utilization and reduce the number of memory accesses required. Previous Impulse work focuses on uniprocessor systems and relies on software to flush processor caches when necessary to ensure data coherence. In this paper, we investigate an extension of Impulse to multiprocessor systems that extends the coherence protocol to maintain data coherence without requiring software-directed cache flushing. Specifically, the multiprocessor Impulse controller can gather/scatter data across the network while its coherence protocol guarantees that each gather request gets coherent data and each scatter request updates every coherent replica in the system. Our simulation results demonstrate that the proposed system can significantly outperform conventional systems, achieving an average speedup of 9X on four memory-bound benchmarks on a 32-processor system.
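The gather/scatter idea can be illustrated with a toy software model. This is a sketch under stated assumptions, not the Impulse hardware or its coherence protocol: the function names, the indirection-vector representation, and the 8-word cache line are hypothetical choices made for illustration. It shows how accessing strided data through a dense alias reduces the number of cache lines touched, and how a scatter writes the dense data back to the original locations.

```python
# Toy software model of controller-side gather/scatter through an
# indirection vector. All names and sizes are hypothetical illustrations,
# not part of the Impulse design.

LINE_WORDS = 8  # assumed cache-line size in words, for illustration only


def gather(memory, indices):
    """Pack the words memory[i] for each i in indices into a dense buffer."""
    return [memory[i] for i in indices]


def scatter(memory, indices, dense):
    """Write a dense buffer back to the locations named by indices."""
    for i, value in zip(indices, dense):
        memory[i] = value


def lines_touched(addresses):
    """Count the distinct cache lines covering the given word addresses."""
    return len({a // LINE_WORDS for a in addresses})


memory = list(range(64))
indices = [0, 8, 16, 24, 32, 40, 48, 56]  # stride-8 access, e.g. a matrix column

dense = gather(memory, indices)

# Direct access touches one cache line per element; the dense alias
# packs all eight elements into a single line.
direct_lines = lines_touched(indices)
remapped_lines = lines_touched(range(len(dense)))

# Update the dense copy, then scatter it back to the original locations.
scatter(memory, indices, [v + 1 for v in dense])
```

In this model the processor would fetch only the dense buffer, which is the sense in which remapping improves cache and bus utilization. In a multiprocessor, the gather would additionally need to obtain the coherent copy of each word, and the scatter would need to update every replica, which this sketch does not model.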