ABSTRACT
As device scales shrink, higher transistor counts become available, while soft errors, even in logic, become a major concern. A new class of architectures, such as Merrimac and the IBM Cell, takes advantage of the higher transistor count by exposing control, communication, and a large number of functional units at the architectural level, thus achieving high performance and efficiency. This paper explores soft-error fault tolerance in the context of these compute-intensive architectures, which differ significantly from their control-intensive CPU counterparts. The main goal of the proposed schemes for Merrimac is to conserve the critical and costly off-chip bandwidth and on-chip storage resources while maintaining high peak and sustained performance. We achieve this by allowing for reconfigurability and relying on programmer input. The processor runs either at full peak performance using software fault-tolerance methods or at reduced performance with hardware redundancy. We present several methods, their analysis, and detailed case studies.