Loading…

On implementing omega in systems with weak reliability and synchrony assumptions

We study the feasibility and cost of implementing Ω —a fundamental failure detector at the core of many algorithms—in systems with weak reliability and synchrony assumptions. Intuitively, Ω allows processes to eventually elect a common leader. We first give an algorithm that implements Ω in a weak s...

Full description

Saved in:

Bibliographic Details
Published in:	Distributed computing 2008-10, Vol.21 (4), p.285-314
Main Authors:	Aguilera, Marcos K., Delporte-Gallet, Carole, Fauconnier, Hugues, Toueg, Sam
Format:	Article
Language:	English
Subjects:	Computer Communication Networks Computer Hardware Computer Science Computer Systems Organization and Communication Networks Software Engineering/Programming and Operating Systems Theory of Computation
Citations:	Items that this one cites Items that cite this one
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	We study the feasibility and cost of implementing Ω —a fundamental failure detector at the core of many algorithms—in systems with weak reliability and synchrony assumptions. Intuitively, Ω allows processes to eventually elect a common leader. We first give an algorithm that implements Ω in a weak system S where (a) except for some unknown timely process s , all processes may be arbitrarily slow or may crash, and (b) only the output links of s are eventually timely (all other links can be arbitrarily slow and lossy). Previously known algorithms for Ω worked only in systems that are strictly stronger than S in terms of reliability or synchrony assumptions.We next show that algorithms that implement Ω in system S are necessarily expensive in terms of communication complexity: all correct processes (except possibly one) must send messages forever; moreover, a quad-ratic number of links must carry messages forever. This result holds even for algorithms that tolerate at most one crash. Finally, we show that with a small additional assumption to system S —the existence of some unknown correct process whose links can be arbitrarily slow and lossy but fair—there is a communication-efficient algorithm for Ω such that eventually only one process (the elected leader) sends messages. Some recent experimental results indicate that two of the algorithms for Ω described in this paper can be used in dynamically-changing systems and work well in practice [Schiper, Toueg in Proceedings of the 38th International Conference on Dependable Systems and Networks, pp. 207–216 (2008)].
ISSN:	0178-2770 1432-0452
DOI:	10.1007/s00446-008-0068-y