Abstract
To ensure atomicity of transactions in distributed systems so-called 2-phase commit (2PC) protocols have been proposed. The basic assumption of these protocols is that the processing nodes involved in transactions are “sane,” i.e., they only fail with omission failures, and nodes eventually recover from failures. Unfortunately, this assumption is not realistic for so-called Open Distributed Systems (ODSs), in which nodes may have totally different reliability characteristics. In ODSs, nodes can be classified into trusted nodes (e.g., a banking server) and nontrusted nodes (e.g., a home PC requesting a remote banking service). While trusted nodes are assumed to be sane, nontrusted nodes may fail permanently and even cause commission failures to occur.
In this paper, we propose a family of 2PC protocols that tolerate any number of omission failures at trusted nodes and any number of commission and omission failures at nontrusted nodes. The proposed protocols ensure that (at least) the trusted nodes participating in a transaction eventually terminate the transaction in a consistent manner. Unlike Byzantine commit protocols, our protocols do not incorporate mechanisms for achieving Byzantine agreement, which has advantages in terms of complexity: Our protocols have the same or only a slightly higher message complexity than traditional 2PC protocols.
- 1 CHAN, A., DAYAL, U., FOX, S., GOODMAN, N., Rms, D., AND SKEEN, D. Overview of an Ada compatible distributed database manager. ACM SIGMOD 13, 4 (1983), 228 237. Google ScholarDigital Library
- 2 DECNET General Descrtptton. Doc. AA-K179A-TK. Digital Equipment Corporation.Google Scholar
- 3 DOLEV, D., AND STRONG, H.R. Distributed commit with bounded waiting. In Proceedings of the Second IEEE Syrup. on Reliability in Distributed Software and Database Systems (1982); see also IBM Research Report RJ3417, 1982.Google Scholar
- 4 EPPINGER, J. L., MUMMERT, L. B., AND SPECTOR, A. Z. S. EDS.,. Camelot and Avalon: A D~stributed Transaction Facd~ty. Morgan Kaufmann, San Mateo, Calif., 1991. Google ScholarDigital Library
- 5 GARCIA-MOLINA, H., AND ABBOTT, R.K. Reliable distributed database management. Proceedings of the IEEE 75, 5 (1987), 601 620.Google ScholarCross Ref
- 6 GRAY, J. Notes on database operating systems. In Operating' Systems: An Advanced Course. Lecture Notes in Computer Science 60, Springer, 1978, 393 481. Google ScholarDigital Library
- 7 HAERDER, T., AND REUTER, A. Principles of transaction-oriented database recovery. ACM Comput. Surv. 15, 4 (1983), 287-317. Google ScholarDigital Library
- 8 HASKIN, R., MALACHI, Y., AND SAWDON, W. Recovery management in QuickSilver. ACM Trans. Comput. ~yst. 6, 1 (1988), 82-108. Google ScholarDigital Library
- 9 Information Processing Systems Open Systems Interconnection Basic Reference Model. International Standard ISO 7498, 1984.Google Scholar
- 10 Information Processing Systems Open Systems Interconnection--Definition of Common Application Service Elements--Commitment, Concurrency and Recovery. Draft Internationa} Standard ISO 9804/2, 1989.Google Scholar
- 11 Information Processing Systems--Open Systems Interconnection Distributed Transaction Processing-Part 1: Model, Part 2: Service Definition, Part 3: Protocol Specification Draft Proposal ISO 10026/1-3, 1989.Google Scholar
- 12 LAMPSON, B. Atomic transactions. Lecture Notes in Computer Science 105, Springer, 1981, 246 266 Google ScholarDigital Library
- 13 LAMPORT, L, SHOSTAK, R., AND PEASE, M The Byzantine generals problem. ACM Trans. Program. Lang. Syst., 4, 3 (1982), 382-401. Google ScholarDigital Library
- 14 LELANN, G. Error Recovem,. Lecture Notes in Computer Science 105, Springer, 1981, 371-376.Google ScholarCross Ref
- 15 LINDSAY, B., HAAS, L., MOHAN, C., WILMS, P., AND YOST, R. Computation and communication in R*: A distributed database manager,, ACM Trans. Comput. Syst. 2, 1 (1984), 24 38. Google ScholarDigital Library
- 16 MOHAN, C., AND LINDSAY, B. Efficient commit protocols for the tree process model of distributed transactions. In Proceedzngs of the ACM SIGACT-SIGOPS Symposzum on PrincL- ples of Dlstrzbuted Computing (Montreal, 1983), 76 80 Google ScholarDigital Library
- 17 MOHAN, C., LINDSAY, B., AND OBERMARCK, R. Transaction management in the R* distributed database management system. ACM Trans. Database Syst. 11, 4 (1986), 378 396. Google ScholarDigital Library
- 18 MOHAN, C. STRONG, H. R., AND FINKLESTE{N, S. Method for distributed transaction commit and recovery using Byzantine agreement within clusters of processors. In Procee&ngs of the 2rid ACM SIGACT/SIGOPS SyraposLum on Principles of Dzstributed Computing (Montreal, Canada, 1983), 89 103. Google ScholarDigital Library
- 19 NEEDHAM, R. M., AND SCHROEDER, M. D. Using encryption for authentication in large networks of computers Commun. ACM 21, 12 (1978), 210-216. Google ScholarDigital Library
- 20 RIVEST, R. L., SHAMm, A., AND ADLEMAN, L. A method for obtaimng digital signatures and pubhc-key cryptosystems. Commun. ACM 21, 12 (1978), 120 126. Google ScholarDigital Library
- 21 ROTHERMEL, K., AND PAPPE, S. Open commit protocols for the tree of processes model. In Proceedings of the lOth {nternatmnal Conference on Dzstrzbuted Computer Systems (Paris, 1990), 236 244.Google ScholarCross Ref
- 22 RosE, M T. The Open Booh A Practzcal Perspecm, e on OSI. Prentice-Hall, Englewood Cliffs, N.J., 1989. Google ScholarDigital Library
- 23 SCHLICHTING, R., AND SCHNEIDER, F. Fail-Stop processors' An approach to designing faulttolerant computing systems ACM Trons, Comput. Syst. 1, 3 (1983), 222-238. Google ScholarDigital Library
- 24 SKEEN, D., AND STONEBRAKER, M. A formal model of crash recovery in a distributed system In Procee&ngs of the 5th Internatzonal Workshop on Dlstrzbuted Data Management and Computer Netu,orks (1981), 129 142Google Scholar
- 25 Systems Network Architecture Concepts and Products. IBM Corporation, GC30-3072, 1985Google Scholar
- 26 Systems Network Architecture, Format and Protocol Reference Manual: Architecture Logic For LU Type 6.2 IBM Corporation, GC30-3269, 1985.Google Scholar
- 27 SPECTOR, h., THOMPSON, n., PAUSCH, R,, EPPINGER, J. L., DAVES, R., DUCHAMP, D., DANIELS, D. S., AND BLOCH, J.J. Camelot: A distributed transaction facility for Mach and the Internet--an interim report. Tech. Rep, CMU-CS-87-129, Dept. of Computer Science, Carnegie Mellon Univ., Pittsburgh, Pa, 1987.Google Scholar
- 28 TANENBAUM, A. Computer Networks Prentice-Hall, Englewood Cliffs, N.J, 1981. Google ScholarDigital Library
- 29 Introductmn to the Transactmn Momtorzng Fac~l~(~ (TMF) Tandem Computers Incorporated.Google Scholar
- 30 WALTER, B. A robust and efficmnt protocol for checking the availability of remote sites Comput. Netw. 6, 3 (1982), 173 188.Google Scholar
- 31 X/OPEN Portabihty GuLde. Elsevier, New York, 1987.Google Scholar
Index Terms
Open commit protocols tolerating commission failures
Recommendations
A Formal Model of Crash Recovery in a Distributed System
A formal model for atomic commit protocols for a distributed database system is introduced. The model is used to prove existence results about resilient protocols for site failures that do not partition the network and then for partitioned networks. For ...
Commit processing in distributed real-time database systems
RTSS '96: Proceedings of the 17th IEEE Real-Time Systems SymposiumWe investigate the performance implications of supporting transaction atomicity in a distributed real-time database system. Using a detailed simulation model of a firm-deadline distributed real-time database system, we profile the real-time performance ...
The Performance of Two Phase Commit Protocols in the Presence of Site Failures
The two phase commit is an important protocol in distributed database systems. Much of the existing literature on the protocol is restricted to discussing and analyzing the protocol (and its variants) in the absence of failures. Very little, especially in ...
Comments