“High-performance computing (HPC) is facing an exponential growth in job output dataset sizes,” which will necessitate a “significant commitment of supercomputing center resources--most notably, precious scratch space--in handling data staging and offloading.” In addition, the purge policies that are usually employed aren’t well matched to user needs in this regard.
The authors observe that in HPC environments such as TeraGrid, much of the work is collaborative, with jobs submitted by users from geographically dispersed sites; in many instances, result data must be delivered to several locations. The authors therefore devised a mechanism that exploits these characteristics by performing a collaborative offload of job output data. The data is pushed in chunks to an initial “level 1” set of intermediate nodes, with Reed-Solomon 4:5 coding providing protection against node and link failures. A BitTorrent-style scatter-gather protocol transfers chunks through successive levels of intermediate nodes until the data arrives at nodes near the submission site(s), from which it is retrieved using a pull mechanism.
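The 4:5 redundancy idea can be illustrated with a minimal sketch: the output is split into four data chunks plus one parity chunk, so any single lost chunk can be reconstructed. XOR parity is the one-check-symbol special case of Reed-Solomon; a real system would use a general RS codec, and the function names and chunk handling here are illustrative only.

```python
def encode(data: bytes, k: int = 4) -> list[bytes]:
    """Split data into k equal chunks (zero-padded) plus one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    chunks = [data[i * size:(i + 1) * size].ljust(size, b"\x00") for i in range(k)]
    parity = bytearray(size)
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return chunks + [bytes(parity)]

def recover(chunks: list[bytes], lost: int) -> bytes:
    """Reconstruct the chunk at index `lost` by XORing the surviving chunks."""
    size = len(chunks[(lost + 1) % len(chunks)])
    out = bytearray(size)
    for i, chunk in enumerate(chunks):
        if i != lost:
            for j, b in enumerate(chunk):
                out[j] ^= b
    return bytes(out)
```

With 16 bytes of input, `encode` yields four 4-byte data chunks and one parity chunk; if any one of the five is lost in transit, `recover` rebuilds it from the other four (the lost chunk's contents are ignored).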
Each intermediate node runs a Network Weather Service (NWS) component, which is used in selecting intermediate nodes and the paths to them. A set of extensions to the Portable Batch System (PBS) job scheduler allows users to nominate, in their submission script, a destination site, the allowable intermediate nodes and their capacities, and a deadline.
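The selection step can be sketched as follows, assuming (as NWS does) that bandwidth is forecast from a history of measurements; the moving-average predictor, node names, and function signatures below are illustrative, not the paper's actual algorithm.

```python
def forecast(history: list[float], window: int = 3) -> float:
    """Predict the next bandwidth value as a moving average over the last
    `window` measurements (one of several simple predictors NWS chooses among)."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def pick_nodes(measurements: dict[str, list[float]], n: int) -> list[str]:
    """Choose the n candidate intermediate nodes with the highest forecast
    bandwidth to the next hop, given per-node measurement histories."""
    return sorted(measurements,
                  key=lambda node: forecast(measurements[node]),
                  reverse=True)[:n]
```

For example, a node with steady 11 MB/s beats one with a single 20 MB/s spike followed by low readings, since the forecast averages recent history rather than trusting the latest sample.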
Testing was done over 22 PlanetLab sites distributed across the US, and the results, presented in tabular and graphical formats, are impressive. I would encourage all who manage or use HPC facilities to take a look.