skip to main content
article
Free Access

Low cost management of replicated data in fault-tolerant distributed systems

Published:10 February 1986Publication History
Skip Abstract Section

Abstract

Many distributed systems replicate data for fault tolerance or availability. In such systems, a logical update on a data item results in a physical update on a number of copies. The synchronization and communication required to keep the copies of replicated data consistent introduce a delay when operations are performed. In this paper, we describe a technique that relaxes the usual degree of synchronization, permitting replicated data items to be updated concurrently with other operations, while at the same time ensuring that correctness is not violated. The additional concurrency thus obtained results in better response time when performing operations on replicated data. We also discuss how this technique performs in conjunction with a roll-back and a roll-forward failure recovery mechanism.

References

  1. 1 BERNSTEIN P. AND GOODMAN, N. Concurrency control in distributed database systems. ACM Coraput. Surv. 13, 2 (June 1981), 185-222. Google ScholarGoogle Scholar
  2. 2 BERNSTEIN P. AND GOODMAN, N. The failure and recovery problem for distributed databases. In Proceedings o{ the 2nd Symposium on Principles of Distributed Computing (Montreal, Canada, Aug.), ACM, New York 1983, 114-122. Google ScholarGoogle Scholar
  3. 3 BIRMAN, K. Replication and fault-tolerance in the ISIS system. In The 10th ACM Symposium on Operating Systems Principles (Orcas Island, Wash., Dec. 1-4) 1985. To be published. Google ScholarGoogle Scholar
  4. 4 BIRMAN, K., DIETRICH, W., EL ABBADI, A., JOSEPH, T. AND RAEUCHLE, R. An overview of the ISIS project. Tech. Rep. TR 84-642, Department of Computer Science, Cornell University, Ithaca, NY. Oct. 1984. Google ScholarGoogle Scholar
  5. 5 BIRMAN, K., JOSEPH. T., AND SKEEN, D. Reliable communication in an unreliable environment. Tech. Rep. TR 85-694. Department of Computer Science, Cornell University, Ithaca, NY. Aug. 1985. Google ScholarGoogle Scholar
  6. 6 BIRMAN, K., JOSEPH, T., RAEUCHLE, R., AND EL ABBADI, A. Implementing Fault-Tolerant Distributed Objects. IEEE Trans. Softw. Eng. 11, 6 (June 1985) 502-508.Google ScholarGoogle Scholar
  7. 7 BIRMAN, K., JOSEPH, T. AND RAEUCHLE, T. Extending resilient object types efficiently. The 2nd GI/NTG GMI Conference on Fault-Tolerant Computing Systems. (Bonn, West Germany), Sept. 1984. Springer-Verlag. Google ScholarGoogle Scholar
  8. 8 CHANG, J. AND MAXEMCHUK, N. Reliable broadcast protocols. ACM Trans. Comput. Syst. 2, 3 (Aug. 1984), 251-273. Google ScholarGoogle Scholar
  9. 9 FISHER, J. A., ELLIS, J. R., RUTTENBERG, J. AND NICOLAU, A. Parallel processing: A smart compiler for a dumb machine, Tech. Rep. Department of Computer Science, Yale University, June 1984.Google ScholarGoogle Scholar
  10. 10 GRAY, J. Notes on Database Operating Systems. Lecture Notes in Computer Science 60, Springer-Verlag, New York 1978. Google ScholarGoogle Scholar
  11. 11 JOSEPH, T. Low cost management of replicated data. Department of Computer Science, Cornell University, Ph.D. dissertation. Jan. 1986. Google ScholarGoogle Scholar
  12. 12 KOHLER, W. A survey of techniques for synchronization and recovery in decentralized computer systems. ACM Comput. Surv. 13, 2 (June 1981), 149-184. Google ScholarGoogle Scholar
  13. 13 LAMPORT, L. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7 (July 1978), 95-114. Google ScholarGoogle Scholar
  14. 14 PAPADIMITR|OU, C.H. The serializability of concurrent database updates. J. ACM 26, (Oct. 1979), 631-653. Google ScholarGoogle Scholar
  15. 15 SCHLICHTING, R. D. AND SCHNEIDER, F.B. Fail-Stop processors: An approach to designing fault-tolerant computing systems. ACM Trans. Comput. Syst. 1, 3 (Aug. 1983), 222-238. Google ScholarGoogle Scholar
  16. 16 SCHNEIDER, F., GRIES, D. AND SCHLICTING, R. "Fault-tolerant broadcast. Sc~ Comput. Program. 4, 1984, 1-15. Google ScholarGoogle Scholar
  17. 17 SKEEN, D. Determining the last process to fail. ACM Trans. Comput. Syst. 3, (Feb. 1985), 15-30. Google ScholarGoogle Scholar

Index Terms

  1. Low cost management of replicated data in fault-tolerant distributed systems

              Recommendations

              Reviews

              Wai Sum Lai

              The keywords “low cost” in the title of the paper notwithstanding, the authors have not clearly defined, or even mentioned, what the cost objectives are, be they processing or bandwidth overheads in terms of the number of messages required, etc. Rather, a more concrete contribution of the paper is the presentation of a technique to minimize the expected response time when performing operations on replicated data. The method relies on: (1) the piggybacking of operation ordering information, such as timestamp on the synchronization messages of an underlying concurrency control algorithm; and (2) the use of a so-called “atomic broadcast” procedure for the transmission of update instructions to the data sites, concurrently with ongoing operations. This is an interesting proposal. Intuitively, it does help to cut down the amount of delay required in a synchronous updating method. To contrast the difference in performance, it is perhaps best seen through the authors' own implementation experience: “A request . . . (obtains) its reply after about 0.6 seconds; additional updates delay the response by 0.1 seconds each. In a synchronous mode, . . . such a computation requires 1.5 seconds, with additional updates requiring 0.5 seconds each.”

              Access critical reviews of Computing literature here

              Become a reviewer for Computing Reviews.

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in

              Full Access

              • Published in

                cover image ACM Transactions on Computer Systems
                ACM Transactions on Computer Systems  Volume 4, Issue 1
                Feb. 1986
                106 pages
                ISSN:0734-2071
                EISSN:1557-7333
                DOI:10.1145/6306
                Issue’s Table of Contents

                Copyright © 1986 ACM

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 10 February 1986
                Published in tocs Volume 4, Issue 1

                Permissions

                Request permissions about this article.

                Request Permissions

                Check for updates

                Qualifiers

                • article

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader