Ultra low-cost defect protection for microprocessor pipelines

Published: 20 October 2006
DOI: 10.1145/1168857.1168868

Abstract

The sustained push toward smaller and smaller technology sizes has reached a point where device reliability has moved to the forefront of concerns for next-generation designs. Silicon failure mechanisms, such as transistor wearout and manufacturing defects, are a growing challenge that threatens the yield and product lifetime of future systems. In this paper we introduce the BulletProof pipeline, the first ultra low-cost mechanism to protect a microprocessor pipeline and on-chip memory system from silicon defects. To achieve this goal we combine area-frugal on-line testing techniques and system-level checkpointing to provide the same guarantees of reliability found in traditional solutions, but at much lower cost. Our approach utilizes a microarchitectural checkpointing mechanism which creates coarse-grained epochs of execution, during which distributed on-line built-in self-test (BIST) mechanisms validate the integrity of the underlying hardware. If a failure is detected, we rely on the natural redundancy of instruction-level parallel processors to repair the system so that it can still operate in a degraded performance mode. Using detailed circuit-level and architectural simulation, we find that our approach provides very high coverage of silicon defects (89%) with little area cost (5.8%). In addition, when a defect occurs, the subsequent degraded mode of operation was found to have only a moderate performance impact (a 4% to 18% slowdown).
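
The control flow described in the abstract (checkpoint, run an epoch, test the hardware, then commit or roll back and disable the faulty resource) can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the paper's implementation: the names `Unit`, `self_test`, and `run_epochs` are hypothetical, the "architectural state" is a single integer, and the BIST pass is modeled as a simple flag check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    """Hypothetical redundant resource, e.g. one of several ALUs."""
    name: str
    defective: bool = False  # ground truth, unknown to the controller
    enabled: bool = True

    def self_test(self) -> bool:
        """Stand-in for an on-line BIST pass; True means no defect found."""
        return not self.defective


def run_epochs(units: List[Unit], work: List[int]) -> int:
    """Add up `work`, one item per epoch, committing only tested epochs."""
    state = 0  # toy architectural state
    for item in work:
        checkpoint = state  # start of epoch: save the state
        while True:
            active = [u for u in units if u.enabled]
            if not active:
                raise RuntimeError("no working units left")
            # Speculative execution: a defective active unit corrupts the result.
            speculative = checkpoint + item
            if any(u.defective for u in active):
                speculative += 1000
            # End of epoch: test the hardware that was in use.
            failed = [u for u in active if not u.self_test()]
            if not failed:
                state = speculative  # tests passed: commit the epoch
                break
            for u in failed:
                u.enabled = False  # repair by disabling the bad unit
            # Roll back: discard `speculative` and replay from `checkpoint`.
    return state
```

In this model a defective unit costs one replayed epoch and then leaves the machine running on fewer units, which loosely mirrors the degraded performance mode mentioned in the abstract.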


      Published In

      ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
      October 2006
      440 pages
      ISBN: 1595934510
      DOI: 10.1145/1168857
      • ACM SIGARCH Computer Architecture News, Volume 34, Issue 5
        Proceedings of the 2006 ASPLOS Conference
        December 2006
        425 pages
        ISSN: 0163-5964
        DOI: 10.1145/1168919
      • ACM SIGOPS Operating Systems Review, Volume 40, Issue 5
        Proceedings of the 2006 ASPLOS Conference
        December 2006
        425 pages
        ISSN: 0163-5980
        DOI: 10.1145/1168917
      • ACM SIGPLAN Notices, Volume 41, Issue 11
        Proceedings of the 2006 ASPLOS Conference
        November 2006
        425 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/1168918
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. defect-protection
      2. low-cost
      3. pipelines
      4. reliability

      Qualifiers

      • Article

      Conference

      ASPLOS06

      Acceptance Rates

      ASPLOS XII Paper Acceptance Rate 38 of 158 submissions, 24%;
      Overall Acceptance Rate 535 of 2,713 submissions, 20%

