Computing Reviews
Do-all computing in distributed systems
Georgiou C., Shvartsman A., Springer-Verlag New York, Inc., Secaucus, NJ, 2007. 219 pp. Type: Book (9780387309187)
Date Reviewed: Jul 15 2008

In the so-called Do-All problem, p processors that are connected through a network are required to perform n independent tasks cooperatively and in the presence of adversity. This abstraction has become more significant in recent years since it can be used to model and study the strengths and limitations of network-based supercomputing. In their well-written monograph, Georgiou and Shvartsman unify this research effort by collecting, explaining, and rigorously verifying different solutions to the Do-All problem in various settings.
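To make the setting concrete, here is a minimal toy sketch (my own illustration, not an algorithm from the book; all names are hypothetical) of Do-All under crash failures, using the perfect-knowledge abstraction the authors employ for bounds in chapter 3: in each round, every live processor executes one unfinished task from a globally visible set, and an adversary may crash processors.

```python
import random

def do_all_perfect_knowledge(n_tasks, n_procs, crash_prob=0.1, seed=0):
    """Toy round-based Do-All under crash failures, assuming perfect
    knowledge: every live processor sees the global set of unfinished
    tasks. Crashes take effect at round boundaries in this model."""
    rng = random.Random(seed)
    remaining = set(range(n_tasks))      # unfinished tasks
    live = set(range(n_procs))           # processors that have not crashed
    work = rounds = 0
    while remaining and live:
        rounds += 1
        tasks = sorted(remaining)
        done = set()
        for i, _proc in enumerate(sorted(live)):
            # Balanced assignment: processors spread over the remaining
            # tasks; overlap (redundant work) appears once there are
            # fewer unfinished tasks than live processors.
            done.add(tasks[i % len(tasks)])
            work += 1                    # count every execution as work
        remaining -= done
        # Adversary: each live processor crashes independently.
        live = {p for p in live if rng.random() > crash_prob}
    return work, rounds, not remaining   # True iff all tasks completed
```

For example, do_all_perfect_knowledge(1000, 50, 0.05) reports the total work performed (including redundant executions), the number of rounds, and whether all tasks completed before every processor crashed.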

Until very recently, supercomputing was the exclusive domain of large corporations, universities, and research institutes. To name a few examples, solving a partial differential equation, processing images, or breaking a cryptographic scheme used to require access to a supercomputer. Partly due to the wide availability of inexpensive high-bandwidth Internet connections, it has now become possible to implement network-based supercomputers without a large investment in expensive hardware. Such Internet supercomputing costs far less than purchasing a supercomputer or building a cluster of powerful machines.

However, Internet supercomputing does not rely on the availability of high-speed Internet connections alone. To be feasible, it also requires that the computational problem be divisible into many independent tasks that can be performed by a large number of network-connected processors. This leads to two competing challenges: achieving the desired computational speed-up, and providing enough redundancy to reach a desired level of fault tolerance. Developing distributed algorithms that balance these seemingly opposite interests requires a careful abstraction of the underlying problem; that abstraction is the Do-All problem.
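For instance, a brute-force search (one of the workloads mentioned above) decomposes naturally into such tasks. The following minimal sketch (mine, with hypothetical names, not from the book) splits a search space into independent, order-insensitive ranges:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    """One independent unit of work: a half-open range [lo, hi) of the
    search space. Tasks share no state, so any processor may run any
    task, in any order, any number of times."""
    lo: int
    hi: int

def decompose(space_size, n_tasks):
    """Split a brute-force search over [0, space_size) into at most
    n_tasks independent ranges."""
    step = -(-space_size // n_tasks)  # ceiling division
    return [Task(lo, min(lo + step, space_size))
            for lo in range(0, space_size, step)]
```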

It is difficult to combine fault tolerance with efficiency: fault tolerance requires the introduction of redundancy, while efficiency requires the removal of redundancy. The authors carefully explain and verify advances made in solving the Do-All problem in distributed message-passing settings, under various models of adversity, such as crashes, asynchrony, message delays, network partitions, and malicious processor behavior.
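A back-of-the-envelope illustration of this tension (my example, not the book's analysis): statically assigning each task to r processors, each of which crashes independently with probability crash_prob, multiplies the scheduled work by r while shrinking the expected number of unfinished tasks geometrically.

```python
def replication_tradeoff(n_tasks, crash_prob, max_r=5):
    """Static r-fold replication: scheduled work grows linearly in r,
    while the chance that all r replicas of a task crash falls as
    crash_prob**r. Illustrative only; the algorithms in the book
    reassign tasks dynamically rather than fixing a schedule up front."""
    for r in range(1, max_r + 1):
        work = r * n_tasks  # scheduled executions (upper bound on work)
        expected_unfinished = n_tasks * crash_prob ** r
        print(f"r={r}: work={work}, "
              f"expected unfinished tasks={expected_unfinished:.3f}")
```

With 1,000 tasks and a crash probability of 0.2, replication_tradeoff(1000, 0.2) shows scheduled work climbing from 1,000 to 5,000 as the expected number of unfinished tasks drops from 200 to roughly 0.3.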

The monograph begins in chapter 1 with a definition of the Do-All problem and a discussion of several of its variants in different models of computation. Chapter 2 lays the groundwork for the remaining chapters by formally defining the adversarial models, the nature of the tasks to be performed, the Do-All problem, the Omni-Do problem (Do-All for partitionable networks), and the complexity measures used to evaluate the algorithms. Chapter 3 studies Do-All in distributed settings with processor crashes: upper and lower bounds are provided under the assumption of perfect knowledge, namely where an algorithm is aided by an omniscient oracle and processors communicate via reliable broadcasts. In chapter 4, processors communicate via point-to-point messages. In chapter 5, Do-All is solved in a model where processors not only crash, but also restart. Chapter 6 considers a model where processors may exhibit Byzantine behavior. In chapter 7, the model is modified to allow an adversary to introduce message delays and asynchrony. Chapter 8 considers partitionable networks and the Omni-Do problem, and chapter 9 covers Omni-Do under arbitrary reconfigurations. Chapter 10 presents a model for Do-All where processors start in singleton groups and are later allowed to rendezvous; the focus is on minimizing the redundant work performed before the rendezvous. Finally, chapter 11 surveys related models of cooperation.

The book is well written and provides an excellent overview of its subject. Its formal approach may be of particular interest to theoreticians, while the algorithms, together with the accompanying upper and lower bounds, will interest any practitioner in the field. I especially recommend this monograph to readers interested in implementing an Internet-based supercomputing scheme: its focus on the tradeoff between efficiency and fault tolerance gives such practitioners invaluable a priori information about the limits of both in various settings.

Reviewer: Burkhard Englert
Review #: CR135836 (0905-0415)
Categories: Distributed Systems (D.4.7); Intelligent Agents (I.2.11); General (F.2.0); General (D.4.0)
Other reviews under "Distributed Systems":
The design of the Saguaro distributed operating system. Andrews G., Schlichting R., Hayes R., Purdin T. IEEE Transactions on Software Engineering 13(1): 104-118, 1987. Type: Article (Sep 1 1987)
Modern operating systems. Tanenbaum A., Prentice-Hall, Inc., Upper Saddle River, NJ, 1992. Type: Book (9780135881873) (Dec 1 1992)
The drinking philosophers problem. Chandy K., Misra J. ACM Transactions on Programming Languages and Systems 6(4): 632-646, 1984. Type: Article (Jun 1 1985)
