Computing Reviews

Analytic models for the primary site approach to fault-tolerance
Huang Y., Jalote P. Acta Informatica 26(6):543-557, 1989. Type: Article
Date Reviewed: 10/01/90

The modeling method in this paper is intended for the analysis of a certain type of fault-tolerant service. The service is fault-tolerant through replication: one node is designated as the primary node and other nodes form a sequence of backup nodes. The primary node periodically checkpoints its state on the backups. If the primary fails, the next backup takes over as primary and updates the last checkpoint by re-executing the requests performed by the previous primary since that checkpoint. The checkpointing can be performed in three different ways: by broadcasting the state of the service, by sending it serially to all subsequent nodes, or by sending it serially to the surviving subsequent nodes. The original model assumes that a failed node will not be repaired; a refined model removes this restriction.
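The takeover step described above can be illustrated with a minimal Python sketch. It is not the authors' model: the class and method names are hypothetical, the service state is reduced to a running total, only the broadcast checkpointing variant is shown, and the log of requests received since the last checkpoint is simply assumed to survive a primary failure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    checkpoint: int = 0   # last checkpointed service state (a running total here)
    alive: bool = True


@dataclass
class PrimarySiteService:
    nodes: list                    # nodes[0] starts as primary; the rest are backups in takeover order
    checkpoint_every: int = 3      # assumed policy: checkpoint after this many requests
    state: int = 0                 # state held by the current primary
    log: list = field(default_factory=list)  # requests since the last checkpoint (assumed to survive failures)

    def primary(self):
        # The first surviving node in the sequence acts as primary.
        node = next((n for n in self.nodes if n.alive), None)
        if node is None:
            raise RuntimeError("all replicas have failed")
        return node

    def handle(self, amount):
        # The primary executes the request and records it for possible replay.
        self.primary()             # raises if no replica is left
        self.state += amount
        self.log.append(amount)
        if len(self.log) >= self.checkpoint_every:
            self.checkpoint()

    def checkpoint(self):
        # Broadcast variant: every surviving node receives the current state.
        for n in self.nodes:
            if n.alive:
                n.checkpoint = self.state
        self.log.clear()

    def fail_primary(self):
        # The primary fails without repair; the next backup restarts from its
        # last checkpoint and re-executes the logged requests.
        self.primary().alive = False
        new_primary = self.primary()
        self.state = new_primary.checkpoint
        for amount in self.log:
            self.state += amount
        return len(self.log)       # number of requests re-executed


svc = PrimarySiteService([Node("a"), Node("b"), Node("c")], checkpoint_every=3)
for request in [1, 2, 3, 4, 5]:
    svc.handle(request)
replayed = svc.fail_primary()
print(svc.primary().name, svc.state, replayed)
```

In this sketch a longer checkpoint interval means fewer checkpoints but more requests to re-execute after a failure, which is the kind of trade-off behind the question of how often to checkpoint.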

The model is intended to answer three questions: What are the reliability and availability of the system? How often should a checkpoint be performed? What should the degree of replication be? The model is based on fundamental probability theory.

The derivation of the results is straightforward and hence easy to read and understand, but the results may not be usable. It is not clear how reasonable it is to design systems that fit this model (for example, why use backup nodes with shorter expected lifetimes than the primary node, and why update failed nodes?). Also, some neglected characteristics of real systems may have nontrivial effects on the analysis (such as how the requests needed to update the checkpoints are stored, retrieved, and used).

The major contribution of the paper is that it shows how nontrivial results can be obtained rather easily by carefully analyzing the real system and using some elementary probabilistic techniques.

Reviewer:  T. Alanko Review #: CR114241
