Computing Reviews
A cost-effective load-balancing policy for tile-based, massive multi-core packet processors
Musoll E. ACM Transactions on Embedded Computing Systems 9(3): 1-25, 2010. Type: Article
Date Reviewed: Jul 9 2010

The paper presents a tile-based, massive multi-core architecture with a cost-effective load-balancing policy. The design aims for higher processing throughput and for low power dissipation, the latter by switching off unused cores.

The load-balancing policy is based on dynamic virtual clustering (DVC), in which the tiles of the multi-core are numbered starting from the lower left corner. A hash function generates the flow number, which provides a uniform distribution among all clusters. Power consumption is examined under three different techniques: DVC, round-robin block (RRB), and the square (SQR) method. The paper shows that DVC has lower latency, while throughput is similar for all three techniques.
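As a rough illustration of the hashing step described above, the following Python sketch maps a packet flow to a cluster and numbers tiles from the lower left corner. It is not the paper's implementation: the 5-tuple flow key, the use of CRC32 as the hash, the row-by-row numbering order, and all function names are assumptions made for the example, since the review does not specify them.

```python
import zlib


def tile_index(row: int, col: int, width: int) -> int:
    """Number tiles row by row, with (row 0, col 0) as the lower-left tile.

    The row-major order is an assumption; the review only states that
    numbering starts at the lower left corner.
    """
    return row * width + col


def make_flow_key(src_ip: str, dst_ip: str, src_port: int,
                  dst_port: int, proto: int) -> bytes:
    """Build a hypothetical flow identifier from a packet's 5-tuple."""
    return f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()


def flow_to_cluster(flow_key: bytes, num_clusters: int) -> int:
    """Hash a flow key to a cluster number in [0, num_clusters).

    CRC32 stands in for whatever hash the paper uses (an assumption).
    Packets of the same flow always land on the same cluster.
    """
    return zlib.crc32(flow_key) % num_clusters


# Spread 1000 hypothetical flows over 16 clusters and count the load.
NUM_CLUSTERS = 16
load = [0] * NUM_CLUSTERS
for port in range(1000, 2000):
    key = make_flow_key("10.0.0.1", "10.0.0.2", port, 80, 6)
    load[flow_to_cluster(key, NUM_CLUSTERS)] += 1
```

Because the mapping depends only on the flow key, packets of one flow stay on one cluster, while a well-mixed hash spreads distinct flows across clusters roughly evenly.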

This architecture focuses mainly on thread-level parallelism: threads are scheduled efficiently on different cores, related threads are assigned to adjacent cores, and cores that are not used are switched off to save power. In this way, high throughput can be achieved. However, the paper fails to address data dependencies and branch hazards, where a single thread assigned to a particular core can cause considerable delay. Thread-level parallelism should be complemented with instruction-level parallelism in order to achieve maximum performance for the whole architecture. Another issue that the paper fails to address is the fault tolerance of cores in a multi-core architecture: How does a thread switch to another core? If a thread is scheduled on one particular core, does it execute to completion on that core? What happens if a thread needs to switch to another core during its execution? Some researchers have proposed nonblocking threads, whose nonblocking behavior makes them more suitable for massively parallel architectures.

Unfortunately, the author does not provide a comparison to the TERAFLUX project, a massively parallel tiled computer architecture that ties hardware and software closely enough to exploit fine-grained parallelism in a distributed network environment.

To achieve high overall execution performance together with power savings, one must aim both to increase processor throughput and to reduce data transfer latency in a multi-core architecture. The best possible performance requires that hardware and software be closely tied. Furthermore, for the architecture to achieve its overall goal, thread-level parallelism must be complemented with instruction-level parallelism.

Reviewer: J. Arul
Review #: CR138156 (1012-1246)
Distributed Architectures (C.1.4 ... )
 