Computing Reviews
Recovery in the Calypso file system
Devarakonda M., Kish B., Mohindra A. ACM Transactions on Computer Systems 14(3): 287-310, 1996. Type: Article
Date Reviewed: Mar 1 1997

Calypso is a stateful, scalable distributed Unix file system that is implemented on IBM RISC clusters running AIX. This paper describes the recovery scheme used in Calypso, which guarantees data consistency of the file system across processor failures. The authors highlight Calypso recovery as nondisruptive and based on reconstructing the distributed state rather than on replicating the server state.

The state of a Calypso file system is defined in terms of a set of tokens, which represent client access to parts of files and client authorization to perform file and directory operations. The token state is distributed among the clients and is also maintained at the server. The recovery protocol reconstructs the token state through a series of remote procedure calls among the clients and the server. The protocol depends on software services external to Calypso for failure detection and group membership. It also depends on hardware-level redundancy, in the form of multiported disks, for data availability, and on the Journaled File System of AIX for server recovery.
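
The general idea of rebuilding server state from what clients hold, rather than from a replica, can be illustrated with a toy sketch. Everything in it is hypothetical and not taken from the paper: the names `Token`, `Client`, `Server`, `report_tokens`, and `recover` are invented for illustration, the remote procedure call is stood in for by an ordinary method call, and the list of live clients is assumed to come from an external group membership service.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Token:
    """Hypothetical token: a client's right to access one file in one mode."""
    file_id: str
    mode: str  # "read" or "write"


@dataclass
class Client:
    name: str
    tokens: set = field(default_factory=set)

    def report_tokens(self):
        # Stand-in for an RPC in which a client tells the server what it holds.
        return set(self.tokens)


class Server:
    def __init__(self):
        # Maps each token to the names of the clients holding it.
        self.token_table = {}

    def grant(self, client, token):
        self.token_table.setdefault(token, set()).add(client.name)
        client.tokens.add(token)

    def crash(self):
        # Simulate losing the server's in-memory token state.
        self.token_table = {}

    def recover(self, live_clients):
        # Rebuild the table only from clients that the (assumed external)
        # membership service reports as alive; tokens of failed clients
        # are simply not reinstated.
        rebuilt = {}
        for client in live_clients:
            for token in client.report_tokens():
                rebuilt.setdefault(token, set()).add(client.name)
        self.token_table = rebuilt


def demo():
    server = Server()
    a, b = Client("a"), Client("b")
    server.grant(a, Token("f1", "read"))
    server.grant(b, Token("f1", "read"))
    server.grant(b, Token("f2", "write"))
    before = dict(server.token_table)
    server.crash()
    server.recover([a, b])
    return before == server.token_table
```

In this sketch, `demo()` grants three tokens, discards the server's table, and rebuilds it from the two clients. Passing only a subset of clients to `recover` models the case where a client has also failed.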

In addition to a description of the recovery protocol, the authors compare the Calypso recovery scheme to that used in other distributed file systems, including Sprite and Spritely NFS. They also examine the bottlenecks of the recovery scheme by measuring its performance on two distributed file system benchmarks. They conclude that the recovery scheme provides relatively short recovery times with low overhead.

Reviewer:  S. K. Andrianoff Review #: CR120441 (9703-0198)
Fault-Tolerance (D.4.5 ... )
Backup/Recovery (E.5 ... )
Distributed File Systems (D.4.3 ... )
Distributed Systems (D.4.7 ... )
File Systems Management (D.4.3 )
Organization And Design (D.4.7 )
Other reviews under "Fault-Tolerance": Date
A theory of reliability in database systems
Hadzilacos V. Journal of the ACM 35(1): 121-145, 1988. Type: Article
Oct 1 1988
A technique for constructing highly available services
Ladin R., Liskov B., Shrira L. Algorithmica 3(3): 393-420, 1988. Type: Article
Nov 1 1988
Applications of Byzantine agreement in database systems
Molina H., Pittelli F., Davidson S. ACM Transactions on Database Systems 11(1): 27-47, 1986. Type: Article
Nov 1 1986