ABSTRACT
Reconfigurable hardware, in conjunction with soft CPUs, has become increasingly established in computer architecture education. In this paper, we extend this approach to distributed-memory multiprocessor systems.
The arguments that supported the introduction of reconfigurable hardware as a substitute for commodity CPUs on educational computer architecture boards apply equally to teaching hardware that facilitates the construction and configuration of multiprocessor systems.
The IEEE Standard for the Scalable Coherent Interface (SCI) was chosen as the interconnect technology because it enables the demonstration of the most important architectural concepts in this context. This interconnect offers high bandwidth and low latency; it not only specifies a hardware Distributed Shared Memory (DSM) architecture, but also defines cache coherence protocols. Consequently, an implementation of this standard allows the design of Non-Uniform Memory Access (NUMA) and cache-coherent NUMA (ccNUMA) multiprocessor systems.