There are many variations on the theme of content replication and distribution for improving performance when serving Web-based content. This paper focuses on the particular set of problems associated with selecting and distributing static content across a set of geographically dispersed systems, with constraints on both internally generated traffic and the storage available at each node. These constraints led the authors to a set of algorithms whose results support their claim of achieving performance comparable to full-replication schemes while consuming only half the storage.
The authors provide a well-thought-out presentation of the problem, along with a mathematical representation that forms a solid basis for the simulation and analysis in the remainder of the paper. The various algorithms are presented clearly, in pseudocode, and the text includes sufficient examples and discussion to walk the reader through the subtleties and performance characteristics of each.
The authors detail two primary experiments. The first deals with server nodes that share a geographically similar location, so proximity to users plays no role in the distribution algorithm. The second considers a geographically dispersed configuration of server nodes, which affects the performance of the overall system and thus requires a different approach to content distribution. Both scenarios are well documented, and the results are clearly stated and graphically depicted.
The paper will be valuable to those interested in combining improved response time with load balancing, but who must do so with limited storage and load capacity.