This research paper presents a general technique for implementing a highly available service in a distributed computer system. The technique works even when processors fail by stopping, the network partitions, or messages are lost, duplicated, or delivered out of order. Originally developed to detect orphans in the Argus system, the technique may be applied to any service in which updates enjoy idempotence and commutativity and in which queries can use old information. At the cost of additional delay, however, clients can use timestamps to order their updates or to obtain the most recent information.
In essence, the service comprises a small number of replicas in the network, and a client communicates with just one replica. Every replica maintains a state and a timestamp of the state. To obtain the most recent state, the replicas communicate with each other via gossip messages either in background or when necessary to process a query. The service uses multipart timestamps, with one part for each replica. Upon receiving a query, a replica compares its timestamp to the timestamp in the query to determine whether it has the needed information.
The exposition is superb. In particular, the authors present their algorithm through a sequence of increasingly more efficient versions, with a clear motivation for each new idea. The authors state their assumptions explicitly, include small examples, and provide forward references to later parts of the paper when appropriate. A lucid, rigorous proof of correctness appears in the appendix.
Compared with voting, the new technique appears to be more efficient when crashes and partitions are rare, as operations require locking at only one replica. Voting handles more general services with operations that do not enjoy idempotence and commutativity, however. A definitive assessment of the performance of the new technique would require either extensive simulations or implementation on an actual distributed system.