Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
A technique for constructing highly available services
Ladin R., Liskov B., Shrira L. Algorithmica3 (3):393-420,1988.Type:Article
Date Reviewed: Nov 1 1988

This research paper presents a general technique for implementing a highly available service in a distributed computer system. The technique works even when processors fail by stopping, the network partitions, or messages are lost, duplicated, or delivered out of order. Originally developed to detect orphans in the Argus system, the technique may be applied to any service in which updates enjoy idempotence and commutativity and in which queries can use old information. At the cost of additional delay, however, clients can use timestamps to order their updates or to obtain the most recent information.

In essence, the service comprises a small number of replicas in the network, and a client communicates with just one replica. Every replica maintains a state and a timestamp of the state. To obtain the most recent state, the replicas communicate with each other via gossip messages either in background or when necessary to process a query. The service uses multipart timestamps, with one part for each replica. Upon receiving a query, a replica compares its timestamp to the timestamp in the query to determine whether it has the needed information.

The exposition is superb. In particular, the authors present their algorithm through a sequence of increasingly more efficient versions, with a clear motivation for each new idea. The authors state their assumptions explicitly, include small examples, and provide forward references to later parts of the paper when appropriate. A lucid, rigorous proof of correctness appears in the appendix.

Compared with voting, the new technique appears to be more efficient when crashes and partitions are rare, as operations require locking at only one replica. Voting handles more general services with operations that do not enjoy idempotence and commutativity, however. A definitive assessment of the performance of the new technique would require either extensive simulations or implementation on an actual distributed system.

Reviewer:  M. C. Loui Review #: CR112804
Bookmark and Share
 
Fault-Tolerance (D.4.5 ... )
 
 
Distributed Databases (H.2.4 ... )
 
 
Distributed Databases (C.2.4 ... )
 
 
Distributed Systems (D.4.7 ... )
 
Would you recommend this review?
yes
no
Other reviews under "Fault-Tolerance": Date
A theory of reliability in database systems
Hadzilacos V. Journal of the ACM 35(1): 121-145, 1988. Type: Article
Oct 1 1988
Applications of Byzantine agreement in database systems
Molina H., Pittelli F., Davidson S. ACM Transactions on Database Systems 11(1): 27-47, 1986. Type: Article
Nov 1 1986
Analytic models for the primary site approach to fault-tolerance
Huang Y., Jalote P. Acta Informatica 26(6): 543-557, 1989. Type: Article
Oct 1 1990
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy