Computing Reviews
Timely offloading of result-data in HPC centers
Monti H., Butt A., Vazhkudai S. Supercomputing (Proceedings of the 22nd Annual International Conference on Supercomputing, Island of Kos, Greece, Jun 7-12, 2008), 124-133. 2008. Type: Proceedings
Date Reviewed: Jul 29 2008

“High-performance computing (HPC) is facing an exponential growth in job output dataset sizes,” which will necessitate a “significant commitment of supercomputing center resources--most notably, precious scratch space--in handling data staging and offloading.” In addition, the purge policies that are usually employed aren’t well matched to user needs in this regard.

The authors observe that in HPC environments like TeraGrid, much of the work is of a collaborative nature, with jobs being submitted by users from geographically dispersed sites. In many instances, there is a requirement for the dispatch of result data to several locations. Therefore, the authors have devised a mechanism to exploit these characteristics, by performing a collaborative offload of job output data. Such data is pushed to an initial “level 1” set of intermediate nodes in chunks, with protection against node/link failure provided by Reed-Solomon 4:5 coding. A BitTorrent scatter-gather protocol is used to transfer chunks to intermediate nodes at a number of levels, until the data arrives at nodes near the submission site(s). Retrieval from these is accomplished using a pull mechanism.
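To make the chunk-level protection concrete, here is a minimal sketch of the idea, not the authors' implementation. It assumes that "4:5" means four data chunks plus one redundant chunk, and it swaps in plain XOR parity for Reed-Solomon coding, since XOR parity also tolerates the loss of any single chunk. The function names `encode` and `recover` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int = 4) -> list:
    """Split data into k equal-size chunks (zero-padded) plus one XOR parity chunk.

    Assumption: XOR parity stands in for the Reed-Solomon code named in the review.
    """
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def recover(chunks: list, length: int) -> bytes:
    """Rebuild the original data when at most one of the k+1 chunks is None."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("more than one chunk lost; single parity cannot recover")
    if missing:
        present = [c for c in chunks if c is not None]
        chunks = list(chunks)
        # XOR of all surviving chunks (data and parity) yields the missing one.
        chunks[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity chunk and strip the zero padding.
    return b"".join(chunks[:-1])[:length]


if __name__ == "__main__":
    data = b"job output dataset"
    pieces = encode(data)
    pieces[2] = None  # simulate a failed node or link
    print(recover(pieces, len(data)) == data)
```

A real Reed-Solomon code generalizes this to more than one redundant chunk; the single-parity case is shown only because it is short enough to verify by hand.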

Each intermediate node has a network weather service (NWS) component that is used in the selection of intermediate nodes and the paths thereto. A set of extensions was devised for the Portable Batch System (PBS) job-scheduler, allowing users to nominate in their submission script a destination site, allowable intermediate nodes and capacities, and a deadline time.
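The interplay between bandwidth measurements, node capacities, and the user's deadline can be illustrated with a small sketch. This is a hypothetical greedy planner, not the selection algorithm from the paper: the `Node` fields, the `plan_offload` function, and the assumption that transfers to different nodes proceed in parallel are all invented for illustration, and the bandwidth figures stand in for whatever forecasts an NWS-like component would supply.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    bandwidth_mb_s: float  # hypothetical forecast of throughput to this node
    capacity_mb: float     # space the user allows on this node


def plan_offload(nodes, output_mb, deadline_s):
    """Greedily assign output data to the fastest nodes first.

    Each node receives no more than its capacity and no more than it could
    accept before the deadline (assuming transfers run in parallel).
    Returns a list of (node name, megabytes) pairs, or None if the deadline
    cannot be met with the nominated nodes.
    """
    plan = []
    remaining = output_mb
    for node in sorted(nodes, key=lambda n: n.bandwidth_mb_s, reverse=True):
        if remaining <= 0:
            break
        share = min(node.capacity_mb, node.bandwidth_mb_s * deadline_s, remaining)
        if share > 0:
            plan.append((node.name, share))
            remaining -= share
    return plan if remaining <= 0 else None


if __name__ == "__main__":
    candidates = [
        Node("A", bandwidth_mb_s=10, capacity_mb=500),
        Node("B", bandwidth_mb_s=5, capacity_mb=1000),
        Node("C", bandwidth_mb_s=1, capacity_mb=1000),
    ]
    print(plan_offload(candidates, output_mb=800, deadline_s=60))
    print(plan_offload(candidates, output_mb=800, deadline_s=10))
```

With a 60-second deadline the two fastest nodes suffice; with a 10-second deadline no assignment fits, which is the kind of case where a scheduler would have to reject or renegotiate the request.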

Testing was done over 22 PlanetLab sites distributed across the US, and the results are presented in tabular and graphical formats. They are impressive, and I would encourage all who manage or use HPC facilities to take a look.

Reviewer:  G. K. Jenkins Review #: CR135882 (0909-0843)
Network Operating Systems (C.2.4 ... )
Storage Hierarchies (D.4.2 ... )
Other reviews under "Network Operating Systems": Date
Simulations of three adaptive, decentralized controlled, job scheduling algorithms
Stankovic J. (ed) Computer Networks and ISDN Systems 8(3): 199-217, 1984. Type: Article
Nov 1 1985
Models of the task assignment problem in distributed systems
Lucertini M. (ed), Springer-Verlag New York, Inc., New York, NY, 1984. Type: Book (9780387818160)
Jul 1 1985
Operating system design; vol. 2: internetworking with XINU
Comer D., Prentice-Hall, Inc., Upper Saddle River, NJ, 1987. Type: Book (9789780136374145)
Feb 1 1988

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®