Input/output benchmarks should stress the I/O subsystem. Chen and Patterson claim that many existing I/O benchmarks do not. They therefore propose self-scaling: the benchmark observes its own performance and adjusts the load it generates into the range that stresses the system's I/O capacity rather than, for example, its CPU or memory. The five workload parameters they use are the number of unique data bytes read or written; the average size of a request; the fraction of requests that are reads; the fraction of requests that follow the previous one in sequence; and the number of processes running the benchmark.
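The five parameters amount to a point in a small workload space; a minimal sketch in Python, with illustrative field names and a focal point invented for the example (neither is taken from the paper):

```python
from dataclasses import dataclass

@dataclass
class Workload:
    """One point in the five-dimensional I/O workload space.
    Field names are illustrative, not the paper's identifiers."""
    unique_bytes: int       # number of unique data bytes read or written
    avg_request_size: int   # average size of an I/O request, in bytes
    read_fraction: float    # reads as a fraction of all I/O requests
    seq_fraction: float     # fraction of requests sequential to the previous one
    num_processes: int      # number of processes issuing I/O

# An example focal point: 64 MB touched, 8 KB requests, half reads,
# half sequential, one process.
focal = Workload(64 * 2**20, 8 * 2**10, 0.5, 0.5, 1)
```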
By varying one parameter at a time through this five-dimensional space and observing the shapes of the resulting performance curves, the authors develop a predictive methodology by which the performance of workloads they did not measure directly is projected.
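One way to read the methodology: if the per-parameter performance curves are roughly independent, performance at an arbitrary point can be estimated from the focal-point measurement scaled by each curve's ratio. A toy sketch under that separability assumption; every curve and number below is invented for illustration, not taken from the paper:

```python
# Toy separable prediction:
#   perf(p1..p5) ~= perf_focal * product_i curve_i(p_i) / curve_i(focal_i)
# All curves and constants here are invented for illustration.

def predict(perf_focal, curves, focal, point):
    """Estimate performance at `point` from the focal measurement and
    one-parameter curves measured by varying each parameter alone."""
    est = perf_focal
    for curve, f, p in zip(curves, focal, point):
        est *= curve(p) / curve(f)
    return est

# Invented one-parameter throughput curves (arbitrary units).
curves = [
    lambda unique_mb: 50.0 if unique_mb <= 16 else 8.0,  # cache cliff
    lambda req_kb: 2.0 * req_kb,                         # larger requests help
    lambda read_frac: 10.0 - 2.0 * read_frac,            # writes batch well
    lambda seq_frac: 5.0 + 5.0 * seq_frac,               # sequentiality helps
    lambda nproc: 10.0,                                  # flat in this toy
]
focal = (8, 8, 0.5, 0.5, 1)
perf_focal = 20.0  # invented focal-point measurement

# At the focal point itself the prediction reduces to the measurement.
print(predict(perf_focal, curves, focal, focal))  # 20.0
```

Predicting across the cache cliff (first curve) is exactly where the review below notes the authors split their predictions into two domains.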
This paper is worthwhile. Through examples, the authors illustrate the kind of system behavior that is usually obvious in retrospect, but not always beforehand. For example, when might a system perform better on writing than reading? The answer: when it batches many small writes into a few large ones.
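A back-of-the-envelope model makes the effect concrete: if each physical I/O pays a fixed positioning overhead plus a per-byte transfer cost, batching many small writes into a few large transfers amortizes the overhead that synchronous small reads must pay in full. The costs below are invented for illustration:

```python
# Invented costs: 10 ms positioning overhead per physical I/O,
# 0.01 ms per KB transferred.
OVERHEAD_MS = 10.0
PER_KB_MS = 0.01

def io_time_ms(num_ios, kb_per_io):
    """Total time for num_ios physical I/Os of kb_per_io kilobytes each."""
    return num_ios * (OVERHEAD_MS + kb_per_io * PER_KB_MS)

# Move 1 MB of data either way.
# Reads: 128 synchronous 8 KB requests, each paying the overhead.
read_ms = io_time_ms(128, 8)     # 1290.24 ms
# Writes: the cache batches them into two 512 KB physical writes.
write_ms = io_time_ms(2, 512)    # 30.24 ms

print(read_ms, write_ms)
```

Even with identical total bytes, the batched writes finish in a small fraction of the read time, which is why a write curve can sit above a read curve.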
Unsurprisingly, the authors’ claims for I/O performance prediction are more debatable than the benefits of I/O benchmark self-scaling. Chen and Patterson observe that there are transition regions in the performance curves, generally where the amount of data the benchmark touches grows past the size of the buffer cache. Although they allow themselves to predict performance separately on either side of such a transition, it is not clear that they grant competing predictive techniques the same latitude in their comparisons.