Computing Reviews
Analytic models for the primary site approach to fault-tolerance
Huang Y., Jalote P. Acta Informatica 26(6): 543-557, 1989. Type: Article
Date Reviewed: Oct 1 1990

The modeling method in this paper is intended for the analysis of a certain type of fault-tolerant service. The service is fault-tolerant through replication: one node is designated as the primary node and other nodes form a sequence of backup nodes. The primary node periodically checkpoints its state on the backups. If the primary fails, the next backup takes over as primary and updates the last checkpoint by re-executing the requests performed by the previous primary since that checkpoint. The checkpointing can be performed in three different ways: by broadcasting the state of the service, by sending it serially to all subsequent nodes, or by sending it serially to the surviving subsequent nodes. The original model assumes that a failed node will not be repaired; a refined model removes this restriction.
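The protocol summarized above can be illustrated with a toy in-memory simulation. This is a minimal sketch, not the authors' model. It rests on several assumptions. The primary forwards every request to the live backups, which keep it in a log until the next checkpoint; the review notes that the paper's model does not address how these requests are stored. Checkpoints reach all surviving backups atomically. Requests are deterministic. The names `Node`, `PrimarySiteService`, `handle`, `take_checkpoint`, and `fail_primary` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    alive: bool = True
    checkpoint: dict = field(default_factory=dict)  # last state received from the primary
    log: list = field(default_factory=list)  # requests executed since that checkpoint


def apply(state, request):
    # Hypothetical deterministic request: add `delta` to counter `key`.
    key, delta = request
    state[key] = state.get(key, 0) + delta


class PrimarySiteService:
    """Toy primary-site replication: the first live node is the primary,
    the remaining live nodes are the ordered sequence of backups."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.state = {}  # live state held by the current primary

    def live(self):
        return [n for n in self.nodes if n.alive]

    def primary(self):
        return self.live()[0]

    def handle(self, request):
        # Assumption: the primary executes the request and forwards it to every
        # live backup, which logs it for replay after a takeover.
        apply(self.state, request)
        for backup in self.live()[1:]:
            backup.log.append(request)

    def take_checkpoint(self):
        # Copy the primary's state to all surviving backups (modeled as an
        # atomic broadcast) and discard the now-redundant request logs.
        for backup in self.live()[1:]:
            backup.checkpoint = dict(self.state)
            backup.log.clear()

    def fail_primary(self):
        self.primary().alive = False
        if not self.live():
            raise RuntimeError("all replicas have failed")
        new_primary = self.primary()
        # Roll forward: start from the last checkpoint and re-execute the
        # requests the failed primary performed since that checkpoint.
        self.state = dict(new_primary.checkpoint)
        for request in new_primary.log:
            apply(self.state, request)
        new_primary.log.clear()


if __name__ == "__main__":
    svc = PrimarySiteService(["A", "B", "C"])
    svc.handle(("x", 5))
    svc.take_checkpoint()
    svc.handle(("x", 2))
    svc.handle(("y", 1))
    svc.fail_primary()
    print(svc.primary().name, svc.state)  # prints: B {'x': 7, 'y': 1}
```

Because the requests in this sketch are deterministic, replaying the logged requests on top of the last checkpoint reproduces exactly the state the failed primary held.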

The model is intended to answer three questions: What are the reliability and availability of the system? How often should checkpoints be taken? What should the degree of replication be? The model is based on elementary probability theory.
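To make the reliability, availability, and replication questions concrete, here is a deliberately simplified calculation that treats the replicated service as a parallel system: it stays up while at least one node is up. This is not the paper's model, which also accounts for checkpointing and for the different ways of propagating checkpoints. The sketch ignores checkpoint cost and failures during takeover, assumes independent exponentially distributed lifetimes and repair times, and does not address checkpoint frequency. The function names are hypothetical. Per-node failure rates allow backups whose expected lifetimes differ from the primary's.

```python
import math


def reliability(t, failure_rates):
    """Probability that at least one node is still up at time t.

    Assumes independent exponential lifetimes (one rate per node), no repair,
    and instantaneous, always-successful takeover by the next backup.
    """
    p_all_failed = 1.0
    for lam in failure_rates:
        p_all_failed *= 1.0 - math.exp(-lam * t)  # P(this node has failed by t)
    return 1.0 - p_all_failed


def steady_state_availability(failure_rates, repair_rates):
    """Long-run fraction of time at least one node is up.

    Assumes each node fails and is repaired independently, with exponential
    up times (rate lam) and repair times (rate mu), so that a node is down
    a fraction lam / (lam + mu) of the time.
    """
    p_all_down = 1.0
    for lam, mu in zip(failure_rates, repair_rates):
        p_all_down *= lam / (lam + mu)
    return 1.0 - p_all_down


def min_replicas(t, lam, target, max_n=50):
    """Smallest number of identical nodes whose reliability at time t meets target."""
    for n in range(1, max_n + 1):
        if reliability(t, [lam] * n) >= target:
            return n
    return None  # target not reachable within max_n nodes


if __name__ == "__main__":
    print(min_replicas(10.0, 0.1, 0.9))  # prints: 6
    print(steady_state_availability([1.0, 1.0], [9.0, 9.0]))  # about 0.99
```

Under these assumptions each added replica multiplies the probability that all nodes are down by that node's own failure (or unavailability) probability, so the gains diminish quickly. A model that charges for checkpointing, like the one the review describes, can therefore find an optimal degree of replication instead of "as many as possible".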

The derivations are straightforward and therefore easy to read and understand, but the results may be of limited practical use. It is not clear that real systems would reasonably be designed to fit this model (for example, why use backup nodes with shorter expected lifetimes than the primary node, and why update failed nodes?). Also, some characteristics of real systems that the model neglects may have nontrivial effects on the analysis, such as how the requests needed to update the checkpoints are stored, retrieved, and used.

The major contribution of the paper is that it shows how nontrivial results can be obtained rather easily by carefully analyzing the real system and using some elementary probabilistic techniques.

Reviewer: T. Alanko
Review #: CR114241
 
Fault-Tolerance (D.4.5 ...)
Modeling And Prediction (D.4.8 ...)
Queueing Theory (D.4.8 ...)
Reliability, Availability, And Serviceability (C.4 ...)
Distributed Systems (C.2.4)
 
Other reviews under "Fault-Tolerance":

A theory of reliability in database systems
Hadzilacos V. Journal of the ACM 35(1): 121-145, 1988. Type: Article
Date Reviewed: Oct 1 1988

A technique for constructing highly available services
Ladin R., Liskov B., Shrira L. Algorithmica 3(3): 393-420, 1988. Type: Article
Date Reviewed: Nov 1 1988

Applications of Byzantine agreement in database systems
Molina H., Pittelli F., Davidson S. ACM Transactions on Database Systems 11(1): 27-47, 1986. Type: Article
Date Reviewed: Nov 1 1986

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®