Computing Reviews
Fault Tolerance in Commercial Computers
Siewiorek D. Computer 23(7): 26-37, 1990. Type: Article
Date Reviewed: Jun 1 1991

Siewiorek is a well-known researcher from a prestigious university and the author of a good book on fault tolerance. I therefore expected this paper to contain some new insights into the rather difficult art of designing fault-tolerant computer systems. The paper is a disappointment, however. Several commercial systems are described superficially, in what seem to be almost direct quotations from their vendors’ literature. The author makes no attempt to analyze their design decisions or limitations in any depth. For example, he mentions that Stratus does away with checkpointing, which he sees as an advantage over systems such as Tandem that require process checkpointing. Nothing is said, however, of the fact that this comes at the expense of processing power: a system using process pairs needs to duplicate only critical processes, while the Stratus approach quadruplicates the execution of every process. Also, since the Stratus hardware configuration requires a continuous comparison of processor outputs, approaches such as N-version programming are precluded; software fault tolerance is not possible with this approach.
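The processing-power point can be made concrete with a rough count. The sketch below is illustrative only and is not taken from the paper: the function names and the workload figures are hypothetical, and it assumes that a backup process costs as much as a full execution, which probably overstates the cost of process pairs when the backup mostly absorbs checkpoints.

```python
def executions_with_process_pairs(total_processes: int, critical_processes: int) -> int:
    """Count process executions when only critical processes get a backup.

    Simplifying assumption: each backup is charged as one full execution.
    """
    if not 0 <= critical_processes <= total_processes:
        raise ValueError("critical_processes must be between 0 and total_processes")
    return total_processes + critical_processes


def executions_with_quadruplication(total_processes: int) -> int:
    """Count process executions when every process runs on four processors."""
    return 4 * total_processes


# Hypothetical workload: 100 processes, of which 20 are critical.
pairs = executions_with_process_pairs(100, 20)
quad = executions_with_quadruplication(100)
print(f"process pairs: {pairs} executions, quadruplication: {quad} executions")
```

Under these assumed numbers the process-pair scheme runs 120 executions against 400 for full quadruplication. The gap narrows as the share of critical processes grows, but even with every process paired the count is half that of quadruplication.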

The descriptions use unexplained vendor jargon, such as “firewall” for the DEC system of Figure 3 and “fallback” for the Teradata system. Some of the information is also dated: Tandem’s new system is quite different from the one described here, and Sequoia is said to “produce” a modularly expandable system while in fact it is now out of business.

I wonder how the usually strict IEEE Computer reviewers and editors selected this paper for a special issue on fault-tolerant systems for which many papers were submitted. Maybe they were so impressed by the names of the author and his institution that they were afraid to read the paper critically.

Reviewer:  E. B. Fernandez Review #: CR114719
Reliability, Availability, And Serviceability (C.4 ... )
Error-Checking (B.1.3 ... )
Control Structure Reliability, Testing, And Fault-Tolerance (B.1.3 )
Multiple Data Stream Architectures (Multiprocessors) (C.1.2 )
Other reviews under "Reliability, Availability, And Serviceability": Date
Implementing fault-tolerant services using the state machine approach: a tutorial
Schneider F. ACM Computing Surveys 22(4): 299-319, 1990. Type: Article
Jul 1 1992
Network reliability and algebraic structures
Shier D., Clarendon Press, New York, NY, 1991. Type: Book (9780198533863)
Sep 1 1992
On building systems that will fail
Corbató F. Communications of the ACM 34(9): 72-81, 1991. Type: Article
Sep 1 1992
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®