This paper presents the authors’ insights into how customized cache subsystems can improve the performance of applications employing embedded microprocessor cores.
Part 1 of this six-part paper introduces the subject and refers to previous work. Related work is covered next, followed by the authors’ own technical “exact algorithmic approach” to computing cache parameters. They then present their experimental results, and conclude with closing remarks and an outline of future work. An extensive set of references, mostly from the late 1990s, provides a source for further reading.
In the authors’ view, previous work, based on simulations and heuristics, is time-consuming and incomplete: the final designs, arrived at after several iterations, are only an attempt to discover an optimized instantiation. While these techniques do produce serviceable designs, the total design space is too large for such solutions to explore fully, as the authors point out. Hence, their proposed solution approaches the problem algorithmically. Since most embedded systems are relatively small, the algorithm is limited only by the capacity of the host computing engine and its memory size. The paper describes the approach in detail, dividing it into pre-processing, processing, and post-processing phases.
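The paper’s exact algorithm is not reproduced in this review, but the flavor of exhaustive design-space exploration it describes can be sketched briefly. The following Python fragment is an illustration only, not the authors’ method: the parameter ranges, the LRU set-associative cache model, and the miss-count objective are all assumptions made for the example.

```python
from itertools import product

def simulate_cache(trace, cache_bytes, line_bytes, ways):
    """Count misses for one cache configuration using LRU replacement.

    This simple trace-driven model (a hypothetical stand-in for the
    paper's evaluation) keeps each set as a list of tags, MRU last.
    """
    n_sets = cache_bytes // (line_bytes * ways)
    sets = [[] for _ in range(n_sets)]
    misses = 0
    for addr in trace:
        line = addr // line_bytes
        idx, tag = line % n_sets, line // n_sets
        s = sets[idx]
        if tag in s:
            s.remove(tag)        # hit: refresh to MRU position
        else:
            misses += 1
            if len(s) == ways:   # set full: evict the LRU line
                s.pop(0)
        s.append(tag)
    return misses

def explore(trace):
    """Exhaustively enumerate a (small, assumed) design space and
    return the configuration with the fewest misses."""
    best = None
    for cache_bytes, line_bytes, ways in product(
            [1024, 2048, 4096, 8192],  # total cache size (bytes)
            [16, 32, 64],              # line size (bytes)
            [1, 2, 4]):                # associativity
        misses = simulate_cache(trace, cache_bytes, line_bytes, ways)
        if best is None or misses < best[0]:
            best = (misses, cache_bytes, line_bytes, ways)
    return best
```

Because the candidate space here is only 4 × 3 × 3 = 36 configurations, brute-force enumeration is trivially feasible; this mirrors the authors’ observation that, for relatively small embedded systems, an exact search is bounded mainly by the host machine’s capacity.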
The authors validated their approach on 16 embedded system applications, using a 2.8 GHz Pentium 4 processor with 512 MB of memory. Several tables present results that support the authors’ contention of improved performance compared with existing alternative methods. The authors expect to extend their work to include more design parameters, multilevel caches, and bus architecture effects.
As embedded systems proliferate, and as users demand better performance, techniques such as the ones described here will take on increased significance.