Computing Reviews
Utilizing shared data in chip multiprocessors with the Nahalal architecture
Guz Z., Keidar I., Kolodny A., Weiser U. SPAA 2008 (Proceedings of the 20th Annual Symposium on Parallelism in Algorithms and Architectures, Munich, Germany, Jun 14-16, 2008), 1-10, 2008. Type: Proceedings
Date Reviewed: Dec 23 2008

In a multicore processor chip, the L2 cache may be organized as one private L2 cache per core or as a single shared L2 cache. The private approach uses smaller caches, and thus offers faster access than its shared counterpart. However, it requires cache coherence enforcement, and the aggregate L2 capacity may not be used efficiently because the same data can be duplicated in several private caches. Sharing the cache eliminates both the coherence enforcement overhead and the data duplication, but at the expense of higher access time.

As the authors point out, one reason for high shared-cache access time is wire delay, which is a function of the distance between the requesting core and the L2 cache bank or partition that holds the requested cache line. To reduce the shared-cache access time, they propose to divide the cache into shared and private partitions. The shared partition is placed in the center of the chip to minimize its distance from the cores, and each private partition is placed close to its corresponding core. Initially, a fetched cache line is placed in the private partition associated with the requesting core; it may gradually migrate to the shared partition, depending on how frequently it is accessed. If the migration forces a replacement in the shared partition, the migrating cache line is swapped with the shared cache line being replaced. As a result of this migration and replacement, a cache line may reside in any of the private partitions or in the shared partition, which is why the authors propose a search mechanism. The basic search algorithm checks the private partition of the requesting core first, then the shared partition, and finally performs a sequential search of the other private partitions.
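
The placement, migration, and lookup policy described above can be illustrated with a small toy model. This is only a sketch built from the description in this review, not the authors' implementation: the class name `NahalalL2Sketch`, the `promote_after` access-count threshold, the `shared_capacity` parameter, the choice of the oldest shared line as the replacement victim, and the unbounded private partitions are all assumptions introduced here for illustration.

```python
from collections import OrderedDict


class NahalalL2Sketch:
    """Toy model: one private L2 partition per core plus one shared partition.

    Hypothetical simplifications: private partitions are unbounded, and a
    line migrates to the shared partition after `promote_after` accesses.
    """

    def __init__(self, num_cores, shared_capacity, promote_after=2):
        self.private = [dict() for _ in range(num_cores)]  # addr -> access count
        self.shared = OrderedDict()  # addr -> access count, oldest entry first
        self.shared_capacity = shared_capacity
        self.promote_after = promote_after

    def search_order(self, core):
        """Own private partition, then shared, then the other private ones."""
        order = [("private", core), ("shared", None)]
        order += [("private", c) for c in range(len(self.private)) if c != core]
        return order

    def access(self, core, addr):
        """Return the partition where the line was found, or ("miss", core)."""
        for kind, idx in self.search_order(core):
            part = self.shared if kind == "shared" else self.private[idx]
            if addr in part:
                part[addr] += 1
                if kind == "private" and part[addr] >= self.promote_after:
                    self._migrate(idx, addr)
                return (kind, idx)
        # Miss: a fetched line starts in the requesting core's private partition.
        self.private[core][addr] = 1
        return ("miss", core)

    def _migrate(self, src_core, addr):
        """Move a line into the shared partition, swapping with a victim if full."""
        count = self.private[src_core].pop(addr)
        if len(self.shared) >= self.shared_capacity:
            # Assumed victim policy: evict the oldest shared line.
            victim, _ = self.shared.popitem(last=False)
            # Swap: the victim takes the migrating line's place; its count
            # restarts so it does not migrate straight back.
            self.private[src_core][victim] = 0
        self.shared[addr] = count
```

For example, with `NahalalL2Sketch(num_cores=4, shared_capacity=1)`, two accesses to the same address by core 0 first place the line in core 0's private partition and then migrate it to the shared partition, after which any other core finds it on the second step of its search. A real design would bound the private partitions and use a hardware-friendly migration policy; the sketch only shows the ordering and the swap.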

While the paper proposes a novel mechanism to solve an important problem, it does not provide enough detail on the simulation environment used in its studies. In addition, comparisons with a private L2 cache configuration, and with a configuration that pairs a small shared L2 cache with larger private L1 caches, would have made the advantage of the proposed mechanism clearer and more convincing. The authors also do not address or demonstrate the scalability of the proposed scheme.

Overall, the paper is well written and, for the most part, easy to follow.

Reviewer:  Farnaz Toussi Review #: CR136368 (1005-0481)
  Editor Recommended
 
 
Cache Memories (B.3.2 ... )
 
 
Multiple Data Stream Architectures (Multiprocessors) (C.1.2 )
 
Other reviews under "Cache Memories": Date
The effects of processor architecture on instruction memory traffic
Mitchell C., Flynn M. ACM Transactions on Computer Systems 8(3): 230-250, 1990. Type: Article
Oct 1 1991
Efficient sparse matrix factorization on high performance workstations--exploiting the memory hierarchy
Rothberg E., Gupta A. ACM Transactions on Mathematical Software 17(3): 313-334, 1991. Type: Article
Dec 1 1992
Cache behavior of combinator graph reduction
Philip J. J. (ed), Lee P. (ed), Siewiorek D. (ed) ACM Transactions on Programming Languages and Systems 14(2): 265-297, 1992. Type: Article
Feb 1 1993
