Abstract
Many distributed systems replicate data for fault tolerance or availability. In such systems, a logical update on a data item results in a physical update on a number of copies. The synchronization and communication required to keep the copies of replicated data consistent introduce a delay when operations are performed. In this paper, we describe a technique that relaxes the usual degree of synchronization, permitting replicated data items to be updated concurrently with other operations, while at the same time ensuring that correctness is not violated. The additional concurrency thus obtained results in better response time when performing operations on replicated data. We also discuss how this technique performs in conjunction with a roll-back and a roll-forward failure recovery mechanism.
- 1 BERNSTEIN P. AND GOODMAN, N. Concurrency control in distributed database systems. ACM Coraput. Surv. 13, 2 (June 1981), 185-222. Google Scholar
- 2 BERNSTEIN P. AND GOODMAN, N. The failure and recovery problem for distributed databases. In Proceedings o{ the 2nd Symposium on Principles of Distributed Computing (Montreal, Canada, Aug.), ACM, New York 1983, 114-122. Google Scholar
- 3 BIRMAN, K. Replication and fault-tolerance in the ISIS system. In The 10th ACM Symposium on Operating Systems Principles (Orcas Island, Wash., Dec. 1-4) 1985. To be published. Google Scholar
- 4 BIRMAN, K., DIETRICH, W., EL ABBADI, A., JOSEPH, T. AND RAEUCHLE, R. An overview of the ISIS project. Tech. Rep. TR 84-642, Department of Computer Science, Cornell University, Ithaca, NY. Oct. 1984. Google Scholar
- 5 BIRMAN, K., JOSEPH. T., AND SKEEN, D. Reliable communication in an unreliable environment. Tech. Rep. TR 85-694. Department of Computer Science, Cornell University, Ithaca, NY. Aug. 1985. Google Scholar
- 6 BIRMAN, K., JOSEPH, T., RAEUCHLE, R., AND EL ABBADI, A. Implementing Fault-Tolerant Distributed Objects. IEEE Trans. Softw. Eng. 11, 6 (June 1985) 502-508.Google Scholar
- 7 BIRMAN, K., JOSEPH, T. AND RAEUCHLE, T. Extending resilient object types efficiently. The 2nd GI/NTG GMI Conference on Fault-Tolerant Computing Systems. (Bonn, West Germany), Sept. 1984. Springer-Verlag. Google Scholar
- 8 CHANG, J. AND MAXEMCHUK, N. Reliable broadcast protocols. ACM Trans. Comput. Syst. 2, 3 (Aug. 1984), 251-273. Google Scholar
- 9 FISHER, J. A., ELLIS, J. R., RUTTENBERG, J. AND NICOLAU, A. Parallel processing: A smart compiler for a dumb machine, Tech. Rep. Department of Computer Science, Yale University, June 1984.Google Scholar
- 10 GRAY, J. Notes on Database Operating Systems. Lecture Notes in Computer Science 60, Springer-Verlag, New York 1978. Google Scholar
- 11 JOSEPH, T. Low cost management of replicated data. Department of Computer Science, Cornell University, Ph.D. dissertation. Jan. 1986. Google Scholar
- 12 KOHLER, W. A survey of techniques for synchronization and recovery in decentralized computer systems. ACM Comput. Surv. 13, 2 (June 1981), 149-184. Google Scholar
- 13 LAMPORT, L. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7 (July 1978), 95-114. Google Scholar
- 14 PAPADIMITR|OU, C.H. The serializability of concurrent database updates. J. ACM 26, (Oct. 1979), 631-653. Google Scholar
- 15 SCHLICHTING, R. D. AND SCHNEIDER, F.B. Fail-Stop processors: An approach to designing fault-tolerant computing systems. ACM Trans. Comput. Syst. 1, 3 (Aug. 1983), 222-238. Google Scholar
- 16 SCHNEIDER, F., GRIES, D. AND SCHLICTING, R. "Fault-tolerant broadcast. Sc~ Comput. Program. 4, 1984, 1-15. Google Scholar
- 17 SKEEN, D. Determining the last process to fail. ACM Trans. Comput. Syst. 3, (Feb. 1985), 15-30. Google Scholar
Index Terms
- Low cost management of replicated data in fault-tolerant distributed systems
Comments