Computing Reviews
Heterogeneously tagged caches for low-power embedded systems with virtual memory support
Zhou X., Petrov P. ACM Transactions on Design Automation of Electronic Systems 13(2): 1-24, 2008. Type: Article
Date Reviewed: Dec 5 2008

Zhou and Petrov propose a cache architecture that aims to provide fast data access and low power consumption.

The memory hierarchy of modern computer systems includes a small, fast memory called the cache, which is faster but smaller than main memory. Caches reduce the average latency of memory accesses and can be organized in multiple levels, where size increases and speed decreases with each level.
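
To make the latency benefit concrete, the standard average memory access time (AMAT) model can be computed for a two-level hierarchy; the sketch below uses made-up hit times and miss rates, which are not from the review or the paper.

    /* Hypothetical AMAT for a two-level cache hierarchy;
       all numbers are illustrative, not measured. */
    #include <stdio.h>

    int main(void) {
        double l1_hit_time  = 1.0;    /* cycles */
        double l1_miss_rate = 0.05;
        double l2_hit_time  = 10.0;   /* cycles */
        double l2_miss_rate = 0.20;
        double mem_latency  = 100.0;  /* cycles */

        /* AMAT = L1 hit time + L1 miss rate *
                  (L2 hit time + L2 miss rate * memory latency) */
        double amat = l1_hit_time
                    + l1_miss_rate * (l2_hit_time + l2_miss_rate * mem_latency);
        printf("AMAT = %.2f cycles\n", amat);  /* 1 + 0.05*(10 + 0.2*100) = 2.50 */
        return 0;
    }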

The processor accesses the cache in two steps: cache indexing and tag comparison. In cache indexing, the least significant bits of the memory address are used to select a cache set, where a set consists of one (for direct-mapped caches) or several (for set-associative caches) cache lines. Each cache line consists of data, a tag, and state bits, with the tag holding the high-order bits of the memory address. During tag comparison, the tags of all the lines in the selected set are compared against the corresponding bits of the memory address. If a match is found, a cache hit occurs and the data from the cache is used.
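
The two steps can be illustrated in software: the sketch below decomposes an address into offset, set index, and tag for a hypothetical 4-way set-associative cache with 64-byte lines and 128 sets. The geometry and names are assumptions for illustration, not taken from the paper.

    #include <stdint.h>
    #include <stdbool.h>

    #define LINE_BITS  6                    /* 64-byte lines */
    #define SET_BITS   7                    /* 128 sets      */
    #define NUM_SETS   (1u << SET_BITS)
    #define NUM_WAYS   4

    typedef struct {
        bool     valid;                     /* state bit                */
        uint32_t tag;                       /* high-order address bits  */
        uint8_t  data[1 << LINE_BITS];
    } cache_line_t;

    static cache_line_t cache[NUM_SETS][NUM_WAYS];

    /* Step 1: index with the least significant address bits;
       step 2: compare the stored tags against the address tag. */
    bool cache_lookup(uint32_t addr, cache_line_t **hit_line) {
        uint32_t set = (addr >> LINE_BITS) & (NUM_SETS - 1);
        uint32_t tag = addr >> (LINE_BITS + SET_BITS);
        for (int way = 0; way < NUM_WAYS; way++) {
            if (cache[set][way].valid && cache[set][way].tag == tag) {
                *hit_line = &cache[set][way];
                return true;                /* cache hit  */
            }
        }
        return false;                       /* cache miss */
    }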

In systems with virtual memory, the processor issues virtual addresses that are translated into physical addresses using a combination of software and hardware. The virtual address space is divided into virtual pages, and the physical space is divided into page frames. A special structure in memory, called the page table, maps the virtual page numbers of each process to page frame numbers, and a special cache called the translation lookaside buffer (TLB) caches page table entries: “TLB is usually implemented as a highly associative cache structure which consumes a significant amount of power.”
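
The translation path can be sketched as follows, assuming 4 KB pages, a small fully associative TLB, and a flat page table array (a simplification of real multi-level tables); all sizes and names are assumptions.

    #include <stdint.h>
    #include <stdbool.h>

    #define PAGE_BITS   12                 /* 4 KB pages */
    #define TLB_ENTRIES 16

    typedef struct {
        bool     valid;
        uint32_t vpn;                      /* virtual page number   */
        uint32_t pfn;                      /* page frame number     */
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];
    extern uint32_t page_table[];          /* assumed to be filled in by the OS */

    uint32_t translate(uint32_t vaddr) {
        uint32_t vpn    = vaddr >> PAGE_BITS;
        uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);

        /* In hardware, every entry is probed at once; that high
           associativity is what makes the TLB power hungry. */
        for (int i = 0; i < TLB_ENTRIES; i++)
            if (tlb[i].valid && tlb[i].vpn == vpn)
                return (tlb[i].pfn << PAGE_BITS) | offset;   /* TLB hit */

        /* TLB miss: consult the page table and refill one entry
           (entry 0 here; a real TLB uses a replacement policy). */
        uint32_t pfn = page_table[vpn];
        tlb[0] = (tlb_entry_t){ true, vpn, pfn };
        return (pfn << PAGE_BITS) | offset;
    }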

The memory address used for indexing and for tagging the cache can be either the virtual or the physical address. If both the index and the tag come from the physical address, the architecture is called a physical cache; otherwise, it is called a virtual cache. The most common kinds of virtual caches are those indexed and tagged with virtual address bits (V/V caches), and those indexed with virtual bits and tagged with physical bits (V/P caches).

Physical caches require that address translation be performed before cache indexing for each memory access. For this purpose, the TLB is accessed, which incurs both a performance penalty (because it inserts the TLB in the memory access path) and a power overhead (because of the power consumption of the TLB).

In contrast, V/V caches have the advantage that cache accessing does not require address translation (thus no TLB access), which results in fast access and low power consumption. However, V/V caches have the drawback of potential cache consistency problems. These problems can occur when the virtual-to-physical page mapping is changed by the operating system, or when multiple processes share some physical memory (that is, parts of the virtual address spaces of two processes are mapped to the same physical memory). The following kinds of cache consistency problems can occur: synonyms, aliases, homonyms, and cache coherence. (Cekleov and Dubois define these cache consistency problems in their paper [1].) In uniprocessor systems, cache coherence problems can occur when synonyms for shared writable data exist. Since information in instruction caches is not modified by processes, V/V caches can be safely used for instruction caches. The homonym problem is solved by extending the virtual tags with the process ID of the process that issues the virtual address.
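
The process-ID fix for homonyms amounts to recording an address space identifier alongside each virtual tag, so that identical virtual addresses issued by different tasks never match each other's cache lines. The sketch below assumes an 8-bit identifier; the field widths and names are hypothetical.

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        bool     valid;
        uint8_t  asid;      /* process ID recorded with the tag */
        uint32_t vtag;      /* virtual tag bits                 */
    } vcache_tag_t;

    /* A line hits only if both the virtual tag and the process ID
       of the issuing task match. */
    bool vtag_match(const vcache_tag_t *line, uint32_t vtag, uint8_t cur_asid) {
        return line->valid && line->vtag == vtag && line->asid == cur_asid;
    }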

In V/P caches, cache indexing proceeds in parallel with address translation, hiding some of the translation latency. Tag comparison occurs when both indexing and translation are complete. V/P caches consume more power and are slower than V/V caches, but are faster than physical caches. However, V/P caches have an advantage over V/V caches: cache consistency problems are easily avoided. Therefore, V/P caches can be safely used for data caches.
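
The overlap can be sketched as follows. In hardware, indexing and translation run concurrently; sequential C code can only show the data flow, so the parallelism is noted in comments. The cache geometry and the translate() routine from the earlier sketch are assumptions.

    #include <stdint.h>
    #include <stdbool.h>

    #define LINE_BITS 6
    #define SET_BITS  7
    #define NUM_SETS  (1u << SET_BITS)
    #define NUM_WAYS  4

    typedef struct { bool valid; uint32_t ptag; } vp_line_t;
    static vp_line_t vp_cache[NUM_SETS][NUM_WAYS];

    uint32_t translate(uint32_t vaddr);   /* TLB sketch above (assumed) */

    bool vp_lookup(uint32_t vaddr) {
        /* Indexing uses virtual bits, so it starts immediately... */
        uint32_t set = (vaddr >> LINE_BITS) & (NUM_SETS - 1);
        /* ...while translation proceeds in parallel in hardware
           (done sequentially here for illustration). */
        uint32_t ptag = translate(vaddr) >> (LINE_BITS + SET_BITS);
        /* Tag comparison waits until both results are available. */
        for (int way = 0; way < NUM_WAYS; way++)
            if (vp_cache[set][way].valid && vp_cache[set][way].ptag == ptag)
                return true;
        return false;
    }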

The cache architecture proposed by the authors tries to combine the low power consumption and fast access of V/V caches with the elimination of consistency problems provided by V/P caches. The authors introduce a hybrid tagging scheme that uses virtual tags for private data and physical tags for shared data, employing application-specific information in order to decide which kind of tag to use for a certain virtual page.

Shared pages are identified using a combination of source-code annotation, compiler support, and additional hardware. The source code of the application declares shared data using #pragma directives. A portion of the virtual address space is reserved for shared data, and the compiler maps data declared as shared to that reserved space. Finally, combinational logic (for example, a three-input AND gate if the reserved address space is identified by ones in the three most significant address bits) is used to detect shared pages.
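
Under that example placement (ones in the three most significant address bits), the detection logic reduces to a single mask-and-compare. The sketch below also shows how the result could select the tag kind for the hybrid scheme; the names are hypothetical, not from the paper.

    #include <stdint.h>
    #include <stdbool.h>

    #define SHARED_MASK 0xE0000000u        /* top three address bits */

    /* The software equivalent of the three-input AND gate: the page
       is shared when all three reserved-region bits are set. */
    static inline bool is_shared(uint32_t vaddr) {
        return (vaddr & SHARED_MASK) == SHARED_MASK;
    }

    /* Hybrid tagging: shared data gets a physical tag (consistency-safe),
       private data a virtual tag (no TLB access, lower power). */
    typedef enum { TAG_VIRTUAL, TAG_PHYSICAL } tag_kind_t;

    static inline tag_kind_t tag_kind_for(uint32_t vaddr) {
        return is_shared(vaddr) ? TAG_PHYSICAL : TAG_VIRTUAL;
    }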

The merits of the proposed technique are questionable. First, the requirement to annotate applications implies that existing applications need to be modified. Second, applying the technique requires compiler support and changes to the processor logic.

The authors' presentation is not very clear: some ideas are stated repeatedly with slightly different phrasing, and there are technical mistakes. For example, the authors incorrectly define cache aliasing as “a situation where the same virtual address from different tasks is mapped to different physical addresses.” In fact, this defines homonyms [1].

Reviewer: Gabriel Mateescu. Review #: CR136312 (0909-0839)
1) Cekleov, M.; Dubois, M. Virtual-address caches, Part 1: problems and solutions in uniprocessors. IEEE Micro 17, 5 (1997), 64–71.