skip to main content
10.1145/1281100.1281135acmconferencesArticle/Chapter ViewAbstractPublication PagespodcConference Proceedingsconference-collections
Article

On the weakest failure detector ever

Published: 12 August 2007 Publication History

Abstract

Many problems in distributed computing are impossible when no information about process failures is available. It is common to ask what information about failures is necessary and sufficient to circumvent some specific impossibility, e.g., consensus, atomic commit, mutual exclusion, etc. This paper asks what information about failures is needed to circumvent any impossibility and sufficient to circumvent some impossibility. In other words, what is the minimal yet non-trivial failure informatio.
We present an abstraction, denoted Υ, that provides very little failure information. In every run of the distributed system, Υ eventually informs the processes that some set of processes in the system cannot be the set of correct processes in that run. Although seemingly weak, for it might provide random information for an arbitrarily long period of time, and it only excludes one possibility of correct set among many, Υ still captures non-trivial failure information. We show that Υ is sufficient to circumvent the fundamental wait-free set-agreement impossibility. While doing so, we (a) disprove previous conjectures about the weakest failure detector to solve set-agreement and we (b) prove that solving set-agreement with registers is strictly weaker than solving n+1-process consensus using n-process consensus. We prove that Υ is, in a precise sense, minimal to circumvent any wait-free impossibility. Roughly, we show that Υ is the weakest eventually stable failure detect or to circumvent any wait-free impossibility.
Our results are generalized through an abstraction Υf that we introduce and prove necessary to solve any problem that cannot be solved in an f-resilient manner, and yet sufficient to solve f-resilient f-set-agreement.

References

[1]
Y. Afek, H. Attiya, D. Dolev, E. Gafni, M. Merritt, and N. Shavit. Atomic snapshots of shared memory. J. ACM, 40(4):873--890, 1993.
[2]
E. Borowsky and E. Gafni. Generalized FLP impossibility result for t-resilient asynchronous computations. In Proceedings of the 25th ACM Symposium on Theory of Computing, pages 91--100, May 1993.
[3]
T. D. Chandra, V. Hadzilacos, and S. Toueg. The weakest failure detector for solving consensus. J. ACM, 43(4):685--722, July 1996.
[4]
T. D. Chandra and S. Toueg. Unreliable failure detectors for reliable distributed systems. J. ACM, 43(2):225--267, Mar. 1996.
[5]
S. Chaudhuri. More choices allow more faults: Set consensus problems in totally asynchronous systems. Information and Computation, 105(1):132--158, 1993.
[6]
C. Delporte-Gallet, H. Fauconnier, R. Guerraoui, V. Hadzilacos, P. Kouznetsov, and S. Toueg. The weakest failure detectors to solve certain fundamental problems in distributed computing. In PODC, pages 338--346, 2004.
[7]
C. Delporte-Gallet, H. Fauconnier, R. Guerraoui, and P. Kouznetsov. Mutual exclusion in asynchronous systems with failure detectors. J. Parallel Distrib. Comput., 65(4):492--505, 2005.
[8]
M. J. Fischer, N. A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374--382, Apr. 1985.
[9]
R. Guerraoui and P. Kouznetsov. On failure detectors and type boosters. In Proceedings of the 17th International Symposium on Distributed Computing, pages 292--305. Springer-Verlag, 2003.
[10]
M. Herlihy and N. Shavit. The asynchronous computability theorem for t-resilient tasks. In Proceedings of the 25th ACM Symposium on Theory of Computing, pages 111--120, May 1993.
[11]
M. Herlihy and J. M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3):463--492, 1990.
[12]
P. Jayanti. Robust wait-free hierarchies. Journal of the ACM, 44(4):592--614, 1997.
[13]
G. Neiger. Failure detectors and the wait-free hierarchy. In 14th ACM Symposium on Principles of Distributed Computing, 1995.
[14]
M. Raynal and C. Travers. In search of the holy grail: Looking for the weakest failure detector for wait-free set agreement. In OPODIS, pages 3--19, 2006.
[15]
M. Saks and F. Zaharoglou. Wait-free k-set agreement is impossible: The topology of public knowledge. In Proceedings of the Twenty fifth ACM Symposium on Theory of Computing, pages 101--110, May 1993.
[16]
J. Yang, G. Neiger, and E. Gafni. Structured derivations of consensus algorithms for failure detectors. In Proceedings of the 17th ACM Symposium on Principles of Distributed Computing, pages 297--306, 1998.

Cited By

View all
  • (2014)The Generalized Loneliness Detector and Weak System Models for k-Set AgreementIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2013.7725:4(1078-1088)Online publication date: 1-Apr-2014
  • (2011)The failure detector abstractionACM Computing Surveys10.1145/1883612.188361643:2(1-40)Online publication date: 4-Feb-2011
  • (2009)Brief announcementProceedings of the 28th ACM symposium on Principles of distributed computing10.1145/1582716.1582770(290-291)Online publication date: 10-Aug-2009
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PODC '07: Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing
August 2007
424 pages
ISBN:9781595936165
DOI:10.1145/1281100
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 August 2007

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. failure detectors
  2. set-agreement
  3. wait-free impossibilities
  4. weakest failure detector ever

Qualifiers

  • Article

Conference

PODC07

Acceptance Rates

Overall Acceptance Rate 740 of 2,477 submissions, 30%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)9
  • Downloads (Last 6 weeks)0
Reflects downloads up to 18 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2014)The Generalized Loneliness Detector and Weak System Models for k-Set AgreementIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2013.7725:4(1078-1088)Online publication date: 1-Apr-2014
  • (2011)The failure detector abstractionACM Computing Surveys10.1145/1883612.188361643:2(1-40)Online publication date: 4-Feb-2011
  • (2009)Brief announcementProceedings of the 28th ACM symposium on Principles of distributed computing10.1145/1582716.1582770(290-291)Online publication date: 10-Aug-2009
  • (2009)The disagreement power of an adversaryProceedings of the 28th ACM symposium on Principles of distributed computing10.1145/1582716.1582769(288-289)Online publication date: 10-Aug-2009
  • (2009)The weakest failure detector for solving k-set agreementProceedings of the 28th ACM symposium on Principles of distributed computing10.1145/1582716.1582735(83-91)Online publication date: 10-Aug-2009
  • (2009)On the weakest failure detector everDistributed Computing10.1007/s00446-009-0079-321:5(353-366)Online publication date: 1-Feb-2009
  • (2009)Weak Synchrony Models and Failure Detectors for Message Passing (k-)Set AgreementProceedings of the 13th International Conference on Principles of Distributed Systems10.1007/978-3-642-10877-8_23(285-299)Online publication date: 3-Dec-2009
  • (2009)Enhanced Fault-Tolerance through Byzantine Failure DetectionProceedings of the 13th International Conference on Principles of Distributed Systems10.1007/978-3-642-10877-8_12(129-143)Online publication date: 3-Dec-2009
  • (2009)The Minimum Information about Failures for Solving Non-local Tasks in Message-Passing SystemsProceedings of the 13th International Conference on Principles of Distributed Systems10.1007/978-3-642-10877-8_11(115-128)Online publication date: 3-Dec-2009
  • (2008)Sharing is harder than agreeingProceedings of the twenty-seventh ACM symposium on Principles of distributed computing10.1145/1400751.1400764(85-94)Online publication date: 18-Aug-2008
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media