article

Free Access

Low cost management of replicated data in fault-tolerant distributed systems

Authors:
Thomas A. Joseph

Cornell Univ., Ithaca, NY

Cornell Univ., Ithaca, NY
View Profile

,
Kenneth P. Birman

Cornell Univ., Ithaca, NY

Cornell Univ., Ithaca, NY
View Profile

Authors Info & Claims

ACM Transactions on Computer Systems Volume 4 Issue 1pp 54–70https://doi.org/10.1145/6306.6309

Published:10 February 1986Publication History

ACM Transactions on Computer Systems

Abstract

Many distributed systems replicate data for fault tolerance or availability. In such systems, a logical update on a data item results in a physical update on a number of copies. The synchronization and communication required to keep the copies of replicated data consistent introduce a delay when operations are performed. In this paper, we describe a technique that relaxes the usual degree of synchronization, permitting replicated data items to be updated concurrently with other operations, while at the same time ensuring that correctness is not violated. The additional concurrency thus obtained results in better response time when performing operations on replicated data. We also discuss how this technique performs in conjunction with a roll-back and a roll-forward failure recovery mechanism.

References

1 BERNSTEIN P. AND GOODMAN, N. Concurrency control in distributed database systems. ACM Coraput. Surv. 13, 2 (June 1981), 185-222. Google Scholar
2 BERNSTEIN P. AND GOODMAN, N. The failure and recovery problem for distributed databases. In Proceedings o{ the 2nd Symposium on Principles of Distributed Computing (Montreal, Canada, Aug.), ACM, New York 1983, 114-122. Google Scholar
3 BIRMAN, K. Replication and fault-tolerance in the ISIS system. In The 10th ACM Symposium on Operating Systems Principles (Orcas Island, Wash., Dec. 1-4) 1985. To be published. Google Scholar
4 BIRMAN, K., DIETRICH, W., EL ABBADI, A., JOSEPH, T. AND RAEUCHLE, R. An overview of the ISIS project. Tech. Rep. TR 84-642, Department of Computer Science, Cornell University, Ithaca, NY. Oct. 1984. Google Scholar
5 BIRMAN, K., JOSEPH. T., AND SKEEN, D. Reliable communication in an unreliable environment. Tech. Rep. TR 85-694. Department of Computer Science, Cornell University, Ithaca, NY. Aug. 1985. Google Scholar
6 BIRMAN, K., JOSEPH, T., RAEUCHLE, R., AND EL ABBADI, A. Implementing Fault-Tolerant Distributed Objects. IEEE Trans. Softw. Eng. 11, 6 (June 1985) 502-508.Google Scholar
7 BIRMAN, K., JOSEPH, T. AND RAEUCHLE, T. Extending resilient object types efficiently. The 2nd GI/NTG GMI Conference on Fault-Tolerant Computing Systems. (Bonn, West Germany), Sept. 1984. Springer-Verlag. Google Scholar
8 CHANG, J. AND MAXEMCHUK, N. Reliable broadcast protocols. ACM Trans. Comput. Syst. 2, 3 (Aug. 1984), 251-273. Google Scholar
9 FISHER, J. A., ELLIS, J. R., RUTTENBERG, J. AND NICOLAU, A. Parallel processing: A smart compiler for a dumb machine, Tech. Rep. Department of Computer Science, Yale University, June 1984.Google Scholar
10 GRAY, J. Notes on Database Operating Systems. Lecture Notes in Computer Science 60, Springer-Verlag, New York 1978. Google Scholar
11 JOSEPH, T. Low cost management of replicated data. Department of Computer Science, Cornell University, Ph.D. dissertation. Jan. 1986. Google Scholar
12 KOHLER, W. A survey of techniques for synchronization and recovery in decentralized computer systems. ACM Comput. Surv. 13, 2 (June 1981), 149-184. Google Scholar
13 LAMPORT, L. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7 (July 1978), 95-114. Google Scholar
14 PAPADIMITR|OU, C.H. The serializability of concurrent database updates. J. ACM 26, (Oct. 1979), 631-653. Google Scholar
15 SCHLICHTING, R. D. AND SCHNEIDER, F.B. Fail-Stop processors: An approach to designing fault-tolerant computing systems. ACM Trans. Comput. Syst. 1, 3 (Aug. 1983), 222-238. Google Scholar
16 SCHNEIDER, F., GRIES, D. AND SCHLICTING, R. "Fault-tolerant broadcast. Sc~ Comput. Program. 4, 1984, 1-15. Google Scholar
17 SKEEN, D. Determining the last process to fail. ACM Trans. Comput. Syst. 3, (Feb. 1985), 15-30. Google Scholar

Index Terms

Recommendations

Reviews

Reviewer: Wai Sum Lai

The keywords “low cost” in the title of the paper notwithstanding, the authors have not clearly defined, or even mentioned, what the cost objectives are, be they processing or bandwidth overheads in terms of the number of messages required, etc. Rather, a more concrete contribution of the paper is the presentation of a technique to minimize the expected response time when performing operations on replicated data. The method relies on: (1) the piggybacking of operation ordering information, such as timestamp on the synchronization messages of an underlying concurrency control algorithm; and (2) the use of a so-called “atomic broadcast” procedure for the transmission of update instructions to the data sites, concurrently with ongoing operations. This is an interesting proposal. Intuitively, it does help to cut down the amount of delay required in a synchronous updating method. To contrast the difference in performance, it is perhaps best seen through the authors' own implementation experience: “A request . . . (obtains) its reply after about 0.6 seconds; additional updates delay the response by 0.1 seconds each. In a synchronous mode, . . . such a computation requires 1.5 seconds, with additional updates requiring 0.5 seconds each.”

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Computer Systems Volume 4, Issue 1
Feb. 1986
106 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/6306
Issue’s Table of Contents

Copyright © 1986 ACM
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 10 February 1986
Published in tocs Volume 4, Issue 1

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 68
  Total Citations
  View Citations
- 849
  Total Downloads
- Downloads (Last 12 months)44
- Downloads (Last 6 weeks)13
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Low cost management of replicated data in fault-tolerant distributed systems

ACM Transactions on Computer Systems

Abstract

References

Cited By

Index Terms

Recommendations

Low Cost Management of Replicated Data in Fault-Tolerant Distributed Systems

Low cost management of replicated data

A highly fault-tolerant distributed database system with replicated data

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Low cost management of replicated data in fault-tolerant distributed systems

ACM Transactions on Computer Systems

Abstract

References

Cited By

Index Terms

Recommendations

Low Cost Management of Replicated Data in Fault-Tolerant Distributed Systems

Low cost management of replicated data

A highly fault-tolerant distributed database system with replicated data

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media