Predicting data cache misses is important in nonnumeric codes, where the compiler is unable to analyze data locality, because every latency hiding technique carries a performance cost. Performance is best when these techniques are applied only to those dynamic instances of a static memory reference that would otherwise have missed.
Summary profiling obtains the miss ratio of each static memory reference that is executed frequently enough to make a nontrivial contribution to execution time. When every such reference has a miss ratio close to either zero or 100 percent, the optimal case can be approximated closely: latency hiding techniques are applied to every dynamic instance of the references whose miss ratios are close to 100 percent, and to no instance of the others. For static references with intermediate miss ratios, the paper proposes and evaluates correlation profiling, which helps predict which dynamic instances of a static memory reference will hit and which will miss. Here, hits and misses are correlated with information such as recent control-flow paths, or whether recent memory references hit or missed in the cache, both globally and for the static memory reference in question. Analogous forms of profiling have been used for branch prediction.
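The difference between the two kinds of profile can be illustrated with a minimal Python sketch. It is not the authors' implementation: the trace, the path labels `p0`, `p1`, and `p2`, the 0.9 cutoff standing in for "close to 100 percent," and all function names are assumptions made for illustration only. Each trace record is a tuple of a static reference, a context (here, a label for the recent control-flow path), and whether that dynamic instance missed.

```python
from collections import defaultdict

HIGH = 0.9  # hypothetical cutoff for "close to 100 percent"


def miss_ratios(trace, key):
    """Return {key(record): miss ratio} for (ref, context, missed) records."""
    counts = defaultdict(lambda: [0, 0])  # key -> [misses, total]
    for record in trace:
        k = key(record)
        counts[k][0] += int(record[2])
        counts[k][1] += 1
    return {k: misses / total for k, (misses, total) in counts.items()}


def summary_profile(trace):
    """One miss ratio per static memory reference."""
    return miss_ratios(trace, key=lambda r: r[0])


def correlation_profile(trace):
    """One miss ratio per (static reference, context) pair."""
    return miss_ratios(trace, key=lambda r: (r[0], r[1]))


def references_to_hide(profile, high=HIGH):
    """Profile entries whose miss ratio is high enough to justify latency hiding."""
    return sorted(k for k, ratio in profile.items() if ratio >= high)


# Invented trace: "A" always misses, "B" always hits, and "C" misses
# only when it is reached along the (hypothetical) path "p1".
trace = (
    [("A", "p0", True)] * 4
    + [("B", "p0", False)] * 4
    + [("C", "p1", True)] * 3
    + [("C", "p2", False)] * 3
)

summary = summary_profile(trace)
correlated = correlation_profile(trace)

print(summary)
print(references_to_hide(summary))
print(references_to_hide(correlated))
```

In this invented trace, the summary profile gives reference `C` an intermediate miss ratio, so it offers no guidance for that reference. Splitting the same records by context separates the instances of `C` that miss from those that hit.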
Results of correlation profiling are presented for 21 nonnumeric applications. To help readers understand why correlation profiling succeeds, the authors study some applications in detail.