Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Performance of fault-tolerant data and compute intensive programs over a network of workstations
Smith J., Shrivastava S. Theoretical Computer Science196 (1-2):319-345,1998.Type:Article
Date Reviewed: Jan 1 1999

A fault-tolerant implementation for parallel execution of data- and computation-intensive programs that require persistent storage and fault-tolerant facilities is described. The well-known “bag of tasks” structuring technique is used together with remote procedure calls (RPCs) for parallel execution of tasks in a network of workstations. The bag of tasks balances load between an arbitrary collection of slave processes. On the other hand, a single bag of tasks is insufficient to express more general structures in cases where a task depends on more than one other task. In this case, an additional synchronization mechanism is required. Derivation of the upper and lower limits for the computation time required for the parallel processing of tasks is outlined. The estimates derived are demonstrated on three applications, namely, the publicly available ray tracing package rayshade; matrix multiplication; and Cholesky factorization.

The authors assess performance based on these three examples. Some of the most interesting conclusions reached in this section are, first, that increasing task size improves performance. Second, the difference between the fault-tolerant execution time and the non-fault-tolerant execution time is not very significant. Third, for sufficiently large granularity, the cost of fault tolerance is small, but fault tolerance provides significant improvement in the event of a failure. Finally, doubling the computing speed of the workstations would not affect the achievable execution rate, but would reduce the number of machines required.

With the increasing number of networks of workstations and the increasing need for data- and computation-intensive applications, the implementation outlined and the conclusions reached in the paper have a high likelihood of being used extensively in the future.

Reviewer:  Ismet Gungor Review #: CR121870 (9901-0026)
Bookmark and Share
 
Fault Tolerance (C.4 ... )
 
 
Parallelism And Concurrency (F.1.2 ... )
 
 
Workstations (C.5.3 ... )
 
 
Distributed Systems (C.2.4 )
 
Would you recommend this review?
yes
no
Other reviews under "Fault Tolerance": Date
System diagnosis with smallest risk of error
Diks K., Pelc A. Theoretical Computer Science 203(1): 163-173, 1998. Type: Article
Mar 1 1999
Coding approaches to fault tolerance in combinational and dynamic systems
Hadjicostis C., Kluwer Academic Publishers, Norwell, MA, 2001.  216, Type: Book (9780792376248)
Jul 2 2002
Model-based programming of fault-aware systems
Williams B., Ingham M., Chung S., Elliott P., Hofbaur M., Sullivan G. AI Magazine 24(4): 61-75, 2004. Type: Article
May 11 2004
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy