ABSTRACT
This paper develops techniques for generating and using mathematical models for the architectural evaluation of the tradeoffs involved in designing self-repairing, highly reliable computers for long missions.
These systems must use standby sparing, and their reliability is shown to be extremely sensitive to small variations in a new design parameter, the coverage c, defined as the probability of system recovery given that a failure has occurred. Interactive terminal calculations show c to be the single most important parameter in high-reliability system design. Changing the coverage from 1 to 0.98 can change the mission time achievable at a specified reliability by orders of magnitude.
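The sensitivity to coverage can be illustrated with a minimal sketch. It assumes a simplified textbook model that is not necessarily the paper's own: one active unit with constant failure rate lam, a pool of dormant spares that do not fail while unpowered, and the same coverage c for every failure. Under those assumptions the system survives to time t if at most `spares` failures occur and each is covered, giving R(t) = exp(-lam t) * sum over i from 0 to spares of (c lam t)^i / i!. The function names and the sample inputs are hypothetical.

```python
import math


def standby_reliability(t, lam, spares, c):
    """Reliability at time t of one active unit backed by dormant spares.

    Assumed model (illustrative): failures of the active unit form a
    Poisson process with rate lam, dormant spares never fail, and each
    failure is recovered from with probability c (the coverage).
    """
    x = c * lam * t
    term = 1.0   # (c*lam*t)**i / i!  for i = 0
    total = 1.0
    for i in range(1, spares + 1):
        term *= x / i
        total += term
    return math.exp(-lam * t) * total


def mission_time(target, lam, spares, c):
    """Longest t with R(t) >= target, found by bisection.

    R(t) is non-increasing in t under the assumed model, so bisection
    on a bracketing interval is sufficient.
    """
    lo, hi = 0.0, 1.0 / lam
    while standby_reliability(hi, lam, spares, c) >= target:
        hi *= 2.0  # grow the bracket until reliability drops below target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if standby_reliability(mid, lam, spares, c) >= target:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    lam = 1.0          # time is measured in units of 1/lam
    target = 0.999999  # hypothetical reliability requirement
    spares = 3
    for c in (1.0, 0.98):
        t = mission_time(target, lam, spares, c)
        print(f"c = {c:.2f}: mission time = {t:.6g} / lam")
```

With these illustrative inputs, the mission time at perfect coverage exceeds the mission time at c = 0.98 by more than two orders of magnitude, which is consistent with the sensitivity described above.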
Most techniques for increasing system reliability (e.g., adding more spares) are shown to be futile in the face of an inadequate coverage of 0.99. Examples of system tradeoff evaluation show that adding checking, diagnostics, etc. to improve failure coverage is the most advantageous technique. This mandates extensive application of modeling techniques throughout all computer system design phases.
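Why extra spares stop helping can be seen from a short worked limit. This is a sketch under an assumed simplified model, not necessarily the one used in the paper: one active unit with constant failure rate $\lambda$, $S$ dormant spares that cannot fail while unpowered, and coverage $c$ applied independently to each failure. The symbols $S$, $\lambda$, $R^{*}$ and $t_{\max}$ are introduced here for illustration.

$$
R_S(t) = e^{-\lambda t} \sum_{i=0}^{S} \frac{(c \lambda t)^{i}}{i!}
\quad\xrightarrow{\;S \to \infty\;}\quad
e^{-\lambda t}\, e^{c \lambda t} = e^{-(1-c)\lambda t}
$$

Hence, for a required reliability $R^{*}$, no number of spares can extend the mission time beyond

$$
t_{\max} = \frac{-\ln R^{*}}{(1-c)\,\lambda} .
$$

A single unit without spares reaches $R^{*}$ at $t = -\ln R^{*}/\lambda$, so under these assumptions a coverage of $c = 0.99$ caps the improvement from sparing at a factor of $1/(1-c) = 100$, however many spares are added. Raising $c$ raises the cap itself.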
Index Terms
- Reliability modeling techniques for self-repairing computer systems