- 1 Abbadi, A.E., Skeen, D., Cristian, F. An efficient fault-tolerant protocol. Fourth ACM Conference on Principles of Database Systems (1985).
- 2 Anderson, T., Lee, P. Fault Tolerance: Principles and Practice. Prentice Hall, 1981.
- 3 Avizienis, A. Software fault tolerance. IFIP Computer Congress (San Francisco, Aug. 1989).
- 4 Avizienis, A., Gunningberg, P., Kelly, J., Strigini, L., Traverse, P., Tso, K., Voges, U. The UCLA Dedix system: A distributed testbed for multi-version software. 15th International Conference on Fault-Tolerant Computing (Ann Arbor, Mich., 1985).
- 5 Babaoglu, O., Drummond, R. Streets of Byzantium: Network architectures for fast reliable broadcast. IEEE Trans. Softw. Eng. SE-11, 6 (1985).
- 6 Barbara, D., Garcia-Molina, H., Spauster, A. Increasing availability under mutual exclusion constraints with dynamic vote reassignment. ACM Trans. Comput. Syst. 7, 4 (Nov. 1989).
- 7 Bartlett, J. A NonStop kernel. Eighth Symposium on Operating System Principles (Dec. 1981).
- 8 Bernstein, P. Sequoia: A fault-tolerant tightly coupled multiprocessor for transaction processing. IEEE Comput. (1988).
- 9 Bernstein, P., Hadzilacos, V., Goodman, N. Concurrency Control and Recovery in Database Systems. Addison-Wesley, Reading, Mass., 1987.
- 10 Birman, K., Joseph, T. Reliable communication in the presence of failures. ACM Trans. Comput. Syst. 5, 1 (Feb. 1987).
- 11 Borg, A., Blau, W., Graetsch, W., Herrmann, F., Oberle, W. Fault tolerance under Unix. ACM Trans. Comput. Syst. 7, 1 (Feb. 1989).
- 12 Carr, R. The Tandem global update protocol. Tandem Syst. Rev. 1, 2 (June 1985).
- 13 Chang, J.M., Maxemchuk, N. Reliable broadcast protocols. ACM Trans. Comput. Syst. 2, 3 (Aug. 1984).
- 14 Cheriton, D., Zwaenepoel, W. Distributed process groups in the V kernel. ACM Trans. Comput. Syst. 3, 2 (May 1985).
- 15 Clark, D. The structuring of systems using up-calls. 10th ACM Symposium on Operating System Principles (1985).
- 16 Comer, D., Peterson, L. Understanding naming in distributed systems. Distributed Comput. 3 (1989), 51-60.
- 17 Cooper, E. Replicated distributed programs. Ph.D. dissertation, UC Berkeley, 1985.
- 18 Cristian, F. A rigorous approach to fault-tolerant programming. IEEE Trans. Softw. Eng. SE-11, 1 (1985).
- 19 Cristian, F. Agreeing on who is present and who is absent in a synchronous distributed system. 18th International Conference on Fault-Tolerant Computing (Tokyo, June 1988).
- 20 Cristian, F. Exception handling. In Dependability of Resilient Computers, T. Anderson, Ed., Blackwell Scientific Publications, Oxford, 1989.
- 21 Cristian, F. Probabilistic clock synchronization. Distributed Computing 3 (1989), 146-158.
- 22 Cristian, F. Synchronous atomic broadcast for redundant broadcast channels. IBM Res. Rep. RJ 7203, Dec. 1989.
- 23 Cristian, F., Aghili, H., Strong, R., Dolev, D. Atomic broadcast: From simple diffusion to Byzantine agreement. 15th International Conference on Fault-Tolerant Computing (Ann Arbor, Mich., 1985).
- 24 Cristian, F., Dancey, R., Dehn, J. Fault-tolerance in the Advanced Automation System. 20th International Conference on Fault-Tolerant Computing (Newcastle upon Tyne, England, June 1990).
- 25 Dijkstra, E. Hierarchical ordering of sequential processes. Acta Informatica 1 (1971), 115-138.
- 26 Ezhilchelvan, P., Shrivastava, S. A characterization of faults in systems. Fifth Symposium on Reliability in Distributed Software and Database Systems (Los Angeles, Jan. 1986).
- 27 Garcia-Molina, H., Spauster, A. Message ordering in a multicast environment. Ninth International Conference on Distributed Computing Systems (Newport Beach, Calif., June 1989).
- 28 Gray, J. Notes on database operating systems. In Operating Systems: An Advanced Course, Vol. 60, Lecture Notes in Computer Science, Springer Verlag, 1978.
- 29 Gray, J. Why do computers stop and what can be done about it? Fifth Symposium on Reliability in Distributed Software and Database Systems (Los Angeles, Jan. 1986).
- 30 Harper, R., Lala, J., Deyst, J. Fault tolerant parallel processor architecture overview. 18th International Conference on Fault-Tolerant Computing (Tokyo, June 1988).
- 31 Hopkins, A., Smith, B., Lala, J. FTMP: A highly reliable fault-tolerant multiprocessor for aircraft. In Proceedings IEEE, Vol. 66, Oct. 1978.
- 32 IBM International Technical Support Centers. IMS/VS extended recovery facility (XRF). Tech. Ref., 1978.
- 33 Johnson, D., Zwaenepoel, W. Sender-based message logging. 17th International Conference on Fault-Tolerant Computing (Tokyo, June 1987).
- 34 Kaashoek, F., Tanenbaum, A. Fault tolerance using group communication. Fourth ACM SIGOPS European Workshop (Bologna, Sept. 1990).
- 35 Knight, J., Ammann, P. Issues influencing the use of N-version programming. In Proceedings of the IFIP Congress (San Francisco, Aug. 1989).
- 36 Koo, R., Toueg, S. Checkpointing and rollback recovery for distributed systems. IEEE Trans. Softw. Eng. SE-13, 1 (1987).
- 37 Kopetz, H., Grunsteidl, G., Reisinger, J. Fault-tolerant membership in a synchronous real-time system. IFIP Working Conference on Dependable Computing for Critical Applications (Santa Barbara, Aug. 1989).
- 38 Kronenberg, N., Levy, H., Strecker, W. VAXclusters: A closely coupled distributed system. ACM Trans. Comput. Syst. 4, 2 (1986).
- 39 Ladin, R., Liskov, B., Shrira, L. Lazy replication: A method for managing replicated data. Ninth Annual ACM Symposium on Principles of Distributed Computing (Aug. 1990).
- 40 Lamport, L. Using time instead of time-outs in fault-tolerant systems. ACM Trans. Prog. Lang. Syst. 6, 2 (1984).
- 41 Lamport, L. The part-time parliament. DEC SRC Rep. 49, Sept. 1989.
- 42 Lampson, B., Sturgis, H. Atomic transactions. In Distributed Systems: An Advanced Course. Lecture Notes in Computer Science, Vol. 105, Springer Verlag, 1981.
- 43 Laprie, J.C. Dependability: A unifying concept for reliable computing and fault tolerance. In Dependability of Resilient Computers, T. Anderson, Ed., Blackwell Scientific Publications, Oxford, 1989.
- 44 Laprie, J.C., Arlat, J., Beounes, C., Kanoun, K. Definition and analysis of hardware- and software-fault-tolerant architectures. IEEE Comput. (July 1990).
- 45 Le Lann, G. Critical issues in distributed real-time computing. In Proceedings of ESTEC Workshop on Communication Networks and Distributed Operating Systems within the Space Environment, European Space Agency Rep. WPP-10, Noordwijk, Oct. 24-26, 1989.
- 46 Luan, S., Gligor, V. A fault-tolerant protocol for atomic broadcast. 10th International Conference on Distributed Computing Systems (Paris, May 1990).
- 47 McCluskey, E. Fault-tolerant systems. Tech. Rep. CSL-199, Stanford Univ., 1982.
- 48 Melliar-Smith, M., Moser, L., Agrawala, V. Broadcast protocols for distributed systems. IEEE Trans. Parallel and Distributed Syst. 1, 1 (Jan. 1990).
- 49 Oki, B., Liskov, B. Viewstamped replication: A new primary copy method to support highly available distributed systems. Seventh ACM Symposium on Principles of Distributed Computing (Aug. 1988).
- 50 Palumbo, D., Butler, R. Measurement of SIFT operating system overhead. NASA Tech. Mem. 86322, 1985.
- 51 Parnas, D. Designing software for ease of extension and contraction. IEEE Trans. Softw. Eng. SE-5, 2 (Mar. 1979).
- 52 Peterson, W., Weldon, E. Error Correcting Codes. MIT Press, Cambridge, Mass., 1972.
- 53 Powell, D. La tolérance aux fautes dans les systèmes répartis: Les hypothèses d'erreur et leur importance. LAAS Res. Rep. 89-258, Sept. 1989.
- 54 Randell, B. System structure for software fault tolerance. IEEE Trans. Softw. Eng. SE-1, 2 (1975).
- 55 Saltzer, J., Reed, D., Clark, D. End-to-end arguments in system design. ACM Trans. Comput. Syst. 2, 4 (Nov. 1984).
- 56 Schmuck, F. The use of efficient broadcast protocols in asynchronous distributed systems. Ph.D. dissertation, TR 88-928, Cornell Univ., 1988.
- 57 Schneider, F. The state machine approach: A tutorial. TR 86-800, Cornell Univ., 1986.
- 58 Siewiorek, D. Fault tolerance in commercial computers. IEEE Comput. (July 1990).
- 59 Strom, R., Yemini, S. Optimistic recovery in distributed systems. ACM Trans. Comput. Syst. 3, 3 (1985).
- 60 Strong, R., Skeen, D., Cristian, F., Aghili, H. Handshake protocols. Seventh International Conference on Distributed Computing Systems (Berlin, Sept. 1987).
- 61 Tanenbaum, A. Computer Networks. Prentice Hall, Englewood Cliffs, N.J., 1981.
- 62 Taylor, D., Wilson, G. The Stratus system architecture. In Dependability of Resilient Computers, T. Anderson, Ed., Blackwell Scientific Publications, Oxford, 1989.
- 63 Trivedi, K. Probability and Statistics with Reliability, Queuing and Computer Science Applications. Prentice Hall, Englewood Cliffs, N.J., 1982.
- 64 Verissimo, P., Rodrigues, L., Baptista, M. AMp: A highly parallel atomic multicast protocol. In Proceedings ACM SIGCOMM '89 (Austin, Tex., Sept. 1989).
- 65 Wakerly, J. Error Detecting Codes, Self-Checking Circuits and Applications. Elsevier North Holland, Inc., N.Y., 1978.
- 66 Wensley, J., Lamport, L., Goldberg, J., Green, M., Levitt, K., Melliar-Smith, M., Shostak, R., Weinstock, C. SIFT: Design and analysis of a fault-tolerant computer for aircraft control. Proceedings IEEE, Vol. 66, Oct. 1978.
- 67 Wulf, W. Reliable hardware-software architecture. 1975 International Conference on Reliable Software, SIGPLAN 10, 6 (1975).