The fault span of crash failures

Authors:
George Varghese, Washington Univ., St. Louis, MO
Mahesh Jayaram, Washington Univ., St. Louis, MO

Journal of the ACM, Volume 47, Issue 2, pp. 244–293. https://doi.org/10.1145/333979.333982

Published: 01 March 2000

Abstract

A crashing network protocol is an asynchronous protocol whose memory does not survive crashes. We show that a crashing network protocol that works over unreliable links can be driven to arbitrary global states, where each node is in a state reached in some (possibly different) execution, and each link has an arbitrary mixture of packets sent in (possibly different) executions. Our theorem considerably generalizes an earlier result, due to Fekete et al., which states that there is no correct crashing Data Link Protocol. For example, we prove that there is no correct crashing protocol for token passing and for many other resource allocation protocols such as k-exclusion, and the drinking and dining philosophers problems. We further characterize the reachable states caused by crash failures using reliable non-FIFO and reliable FIFO links. We show that with reliable non-FIFO links any acyclic subset of nodes and links can be driven to arbitrary states. We show that with reliable FIFO links, only nodes can be driven to arbitrary states. Overall, we show a strict hierarchy in terms of the set of states reachable by crash failures in the three link models.

References

AFEK, Y., AWERBUCH, B., AND GAFNI, E. 1987. Applying static network protocols to dynamic networks. In Proceedings of the 28th IEEE Symposium on Foundations of Computer Science (Oct.). IEEE Computer Society Press, Los Alamitos, Calif., pp. 358-370.
AFEK, Y., AND BROWN, G. M. 1993. Self-stabilization over unreliable communication media. Distr. Comput. 7, 1, 27-34.
ATTIYA, H., DOLEV, S., AND WELCH, J. L. 1995. Connection management without retaining information. Inf. Comput. 123, 2 (Dec.), 155-171.
BARATZ, A., AND SEGALL, A. 1988. Reliable link initialization procedures. IEEE Trans. Commun. (Feb.), 144-153.
DIGITAL EQUIPMENT CORPORATION. 1983. Phase IV NSP Functional Specification. Digital Order Number AA-X439A-TK.
FEKETE, A., LYNCH, N. A., MANSOUR, Y., AND SPINELLI, J. 1993. The impossibility of implementing reliable communication in the face of crashes. J. ACM 40, 5 (Nov.).
FINN, S. C. 1979. Resynch procedures and a fail-safe network protocol. IEEE Trans. Commun. COM-27, 6 (June), 840-845.
JAYARAM, M. 1996. Fault span of crash failures. M.S. Thesis, Washington Univ., St. Louis, MO.
JAYARAM, M., AND VARGHESE, G. 1997. The complexity of crash failures. In Proceedings of the 16th Annual ACM Symposium on Principles of Distributed Computing (Santa Barbara, Calif., Aug. 21-24). ACM, New York, 179-188.
LYNCH, N. A., AND TUTTLE, M. R. 1989. An introduction to input/output automata. CWI Quarterly 2, 3, 219-246.
LYNCH, N. A. 1996. Distributed Algorithms. Morgan Kaufmann, San Francisco, Calif.
McQUILLAN, J. M., RICHER, I., AND ROSEN, E. C. 1980. The new routing algorithm for the ARPANET. IEEE Trans. Commun. COM-28, 5 (May), 711-719.
TANENBAUM, A. 1996. Computer Networks, 3rd ed. Prentice-Hall, Upper Saddle River, N.J.
WATSON, R. W. 1981. Timer based mechanisms in reliable transport protocol connection management. Comput. Netw. 5 (Feb.), 47-56.

Published in

Journal of the ACM, Volume 47, Issue 2
March 2000
192 pages
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/333979
Copyright © 2000 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Article Metrics
- 19 total citations
- 752 total downloads
- 31 downloads (last 12 months)
- 7 downloads (last 6 weeks)