Computing Reviews, the leading online review service for computing literature.

Search

A technique for constructing highly available services
Ladin R., Liskov B., Shrira L. Algorithmica3 (3):393-420,1988.Type:Article

Date Reviewed: Nov 1 1988

This research paper presents a general technique for implementing a highly available service in a distributed computer system. The technique works even when processors fail by stopping, the network partitions, or messages are lost, duplicated, or delivered out of order. Originally developed to detect orphans in the Argus system, the technique may be applied to any service in which updates enjoy idempotence and commutativity and in which queries can use old information. At the cost of additional delay, however, clients can use timestamps to order their updates or to obtain the most recent information. In essence, the service comprises a small number of replicas in the network, and a client communicates with just one replica. Every replica maintains a state and a timestamp of the state. To obtain the most recent state, the replicas communicate with each other via gossip messages either in background or when necessary to process a query. The service uses multipart timestamps, with one part for each replica. Upon receiving a query, a replica compares its timestamp to the timestamp in the query to determine whether it has the needed information. The exposition is superb. In particular, the authors present their algorithm through a sequence of increasingly more efficient versions, with a clear motivation for each new idea. The authors state their assumptions explicitly, include small examples, and provide forward references to later parts of the paper when appropriate. A lucid, rigorous proof of correctness appears in the appendix. Compared with voting, the new technique appears to be more efficient when crashes and partitions are rare, as operations require locking at only one replica. Voting handles more general services with operations that do not enjoy idempotence and commutativity, however. A definitive assessment of the performance of the new technique would require either extensive simulations or implementation on an actual distributed system.

Reviewer: M. C. Loui	Review #: CR112804

Fault-Tolerance (D.4.5 ... )

Distributed Databases (H.2.4 ... )

Distributed Databases (C.2.4 ... )

Distributed Systems (D.4.7 ... )

Would you recommend this review?

yes

Other reviews under "Fault-Tolerance":	Date

A theory of reliability in database systems Hadzilacos V. Journal of the ACM 35(1): 121-145, 1988. Type: Article	Oct 1 1988

Applications of Byzantine agreement in database systems Molina H., Pittelli F., Davidson S. ACM Transactions on Database Systems 11(1): 27-47, 1986. Type: Article	Nov 1 1986

Analytic models for the primary site approach to fault-tolerance Huang Y., Jalote P. Acta Informatica 26(6): 543-557, 1989. Type: Article	Oct 1 1990

more...

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud^®
Terms of Use | Privacy Policy