This paper introduces the track join algorithm, which greatly reduces the network traffic and total execution time of join queries between distributed database tables. Even fast hardware struggles with large datasets, so reducing communication in software is an attractive solution. For that solution to be complete, however, the load on the local central processing units (CPUs) must be kept down as well. Track join does this by balancing network cost against CPU cost: it first tracks where the data for each distinct join key initially resides, then optimizes a transfer schedule for that key, which in turn reduces the amount of data that has to be placed on other nodes over the network.
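To make the idea of per-key tracking and scheduling concrete, here is a minimal sketch in Python. It is not the schedule from the paper: it assumes a much simpler policy in which all tuples of a key are shipped to the single node that already holds the most bytes for that key. The function names (`track`, `schedule_key`, `plan`) and the `(node, key, size_bytes)` tuple layout are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def track(tuples):
    """Record, for each join key, how many bytes each node holds.

    tuples: iterable of (node, key, size_bytes). Hypothetical layout.
    Returns {key: {node: bytes}}.
    """
    locations = defaultdict(lambda: defaultdict(int))
    for node, key, size in tuples:
        locations[key][node] += size
    return locations

def schedule_key(per_node_bytes):
    """Simplified policy (an assumption, not the paper's optimizer):
    send everything to the node that already holds the most bytes.
    Ties go to the lowest node id. Returns (destination, bytes_sent).
    """
    total = sum(per_node_bytes.values())
    dest = max(sorted(per_node_bytes), key=lambda n: per_node_bytes[n])
    return dest, total - per_node_bytes[dest]

def plan(r_tuples, s_tuples):
    """Build a per-key transfer plan for keys present in both tables."""
    r_loc, s_loc = track(r_tuples), track(s_tuples)
    result = {}
    # Keys that appear in only one table produce no join output,
    # so nothing is transferred for them.
    for key in r_loc.keys() & s_loc.keys():
        merged = defaultdict(int)
        for loc in (r_loc[key], s_loc[key]):
            for node, size in loc.items():
                merged[node] += size
        result[key] = schedule_key(merged)
    return result

# Toy input: three nodes (0, 1, 2), tables R and S.
r = [(0, "a", 100), (1, "a", 10), (2, "b", 50)]
s = [(0, "a", 40), (1, "b", 5), (2, "b", 60), (1, "c", 7)]
transfer_plan = plan(r, s)
```

In this toy input, most of the bytes for key `a` already sit on node 0 and most of the bytes for key `b` on node 2, so only the small remainders cross the network; a hash join would instead send each key to a node chosen without regard to where its data lives.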
In effect, track join favors data locality: move the computation, not the data, and create locality deliberately where it does not already exist. The work proceeds in phases, and the algorithms are clearly outlined and explained in the paper. In addition, two theorems on the optimality of the approach are stated, and their proofs are well elaborated. Besides the basic illustrations that exemplify the track join algorithm, 17 figures present the analysis of the simulated overall system performance (network costs, CPU costs, query execution times, and so on).
The paper is very well organized and treats the analysis of its algorithms in adequate depth. It is recommended to distributed database designers and researchers, as well as to computer scientists working on optimization.