Reliability modeling techniques for self-repairing computer systems

ACM '69: Proceedings of the 1969 24th national conference, August 1969, Pages 295–309, https://doi.org/10.1145/800195.805940

Published: 26 August 1969

ABSTRACT

This paper develops techniques for generating and using mathematical models for the architectural evaluation of the tradeoffs involved in designing self-repairing, highly reliable computers for long missions.

These systems must use standby sparing, and their reliability is shown to be extremely sensitive to small variations in a new design parameter, the coverage, c, defined as the probability of system recovery given the existence of a failure. Interactive terminal calculations show c to be the single most important parameter in high-reliability system design. Changing the coverage from 1 to 0.98 can change the mission time achievable at a specified reliability by orders of magnitude.
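To make the role of c concrete, one simplified closed form is sketched here. It is an illustrative assumption, not necessarily the model developed in the paper: a single active unit with constant failure rate \lambda, n spares that cannot fail while dormant, and each failure recovered independently with probability c. The system survives to time t if at most n failures have occurred and every one of them was covered, which gives a truncated Poisson sum:

```latex
R(t) = e^{-\lambda t} \sum_{k=0}^{n} \frac{(c \lambda t)^{k}}{k!},
\qquad
\lim_{n \to \infty} R(t) = e^{-(1-c)\lambda t}.
```

Under these assumptions, c = 1 lets R(t) approach 1 as spares are added, whereas any c < 1 caps the reliability at e^{-(1-c)\lambda t} regardless of n.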

Most techniques for increasing system reliability (e.g., adding more spares) are shown to be futile when coverage is an inadequate 0.99. Examples of system tradeoff evaluation show that adding checking, diagnostics, etc. to improve failure coverage is the most advantageous technique. This mandates extensive application of modeling techniques throughout all computer system design phases.
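As a rough illustration of why extra spares stop helping once coverage is imperfect, the sketch below evaluates a simplified standby-sparing model numerically. The model is an assumption made for this example rather than the paper's own formulation: one active unit with exponential failure rate `lam`, `n_spares` spares that cannot fail while dormant, and each failure recovered independently with probability `c`. The function names `reliability` and `mission_time` are hypothetical.

```python
import math


def reliability(t, lam, c, n_spares):
    """R(t) = exp(-lam*t) * sum_{k=0..n} (c*lam*t)^k / k!

    Assumed model: the system survives if at most n_spares failures
    occurred by time t and every one of them was recovered (coverage c).
    """
    x = c * lam * t
    term, total = 1.0, 1.0            # k = 0 term of the truncated sum
    for k in range(1, n_spares + 1):
        term *= x / k                 # builds (c*lam*t)^k / k! incrementally
        total += term
    return math.exp(-lam * t) * total


def mission_time(target, lam, c, n_spares):
    """Longest t with R(t) >= target, by bisection (R is non-increasing in t)."""
    lo, hi = 0.0, 100.0 / lam         # R(hi) is far below any useful target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reliability(mid, lam, c, n_spares) >= target:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    lam, target = 1.0, 0.999          # time measured in units of unit MTBF
    for c in (1.0, 0.99, 0.98):
        for n in (2, 4, 8):
            t = mission_time(target, lam, c, n)
            print(f"c={c:.2f} spares={n} mission time={t:.4f}")
```

In this assumed model, the reliability with c < 1 can never exceed exp(-(1 - c) * lam * t), so beyond a few spares the mission time is set almost entirely by the coverage rather than by the spare count.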


Published in
ACM '69: Proceedings of the 1969 24th national conference
August 1969
686 pages
ISBN: 9781450374934
DOI: 10.1145/800195
Chairmen: Solomon L. Pollack, Thomas R. Dines, Ward Sangren, Norman R. Nielsen, William G. Gerkin, Alfred E. Corduan, Len Nowak, James L. Mueller, Joseph Horner, Pasteur S. T. Yuen, Jeffery Stein, Margaret M. Mueller
Copyright © 1969 ACM
Publisher
Association for Computing Machinery
New York, NY, United States