The number of cores on a single chip continues to grow, and may even reach 100 in the future. As this trend continues, interconnection will play a key role in scaling up to teraFLOPS processors, as well as in reducing power and area consumption.
Earlier research on interconnection architecture focused on crossbars and packet buffers. In these methods, scholars tried to achieve power and area reduction mainly through decomposition and segmentation approaches. This particular study uses a tile-based, custom-made approach known as MoDe-X (short for modular decoupled crossbar), which combines decomposition and segmentation approaches.
The proposed MoDe-X router architecture differs in a few respects from the RoCo router, whose name refers to its row (east-west) and column (north-south) crossbars. The MoDe-X router reduces the complex wiring on the input side; however, it may require additional wiring to direct flits to the appropriate subcrossbar, and, unlike the RoCo router, it has no scheme for early ejection of packets destined for the local node.
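To make the appeal of decomposition concrete, here is a minimal sketch that counts crosspoints in a full crossbar versus a set of smaller subcrossbars. The port counts (a 5x5 monolithic crossbar split into two 2x2 subcrossbars) and the helper names are illustrative assumptions, not figures or code from the paper. The count also ignores the local port and the extra steering wiring mentioned above, so it should not be read as an estimate of the area savings the authors report.

```python
def crosspoints(inputs: int, outputs: int) -> int:
    """Crosspoints in a full crossbar: one per (input, output) pair."""
    return inputs * outputs


def decomposed_crosspoints(subcrossbars: list[tuple[int, int]]) -> int:
    """Total crosspoints across independent subcrossbars."""
    return sum(crosspoints(i, o) for i, o in subcrossbars)


# Assumed sizes, for illustration only (not taken from the paper).
monolithic = crosspoints(5, 5)
decomposed = decomposed_crosspoints([(2, 2), (2, 2)])
reduction = 1 - decomposed / monolithic

print(f"monolithic: {monolithic}, decomposed: {decomposed}, "
      f"crosspoint reduction: {reduction:.0%}")
```

Because the crosspoint count grows with the product of the port counts, splitting a crossbar shrinks it faster than linearly. The price is the additional logic needed to steer each flit to the right subcrossbar.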
The internal architecture of MoDe-X also differs from that of RoCo: a tri-state buffer placed in the middle of the link separates it into two segments. This naturally reduces the complexity around the buffer without affecting performance, which in turn reduces area and saves power. According to the study, the proposed architecture provides a 40 percent area savings and less overhead. In short, the main difference between MoDe-X and RoCo lies in the internal arrangement of their logic elements.
Finally, the authors present a simulation-based comparison of the MoDe-X and RoCo routers across various applications, including image processing, game physics, and matrix multiplication (sparse and dense). It is interesting to note that the cache hit rate for these applications varies from 100 percent to 27.80 percent. Does the cache hit rate influence power consumption? That is not addressed by the authors in this paper. Is there a direct relationship between the cache hit rate and the power consumption of the architecture? That could have been discussed in order to highlight the overall power reduction.
This research does not present a totally new type of router for multicore architectures; rather, it describes a slight variation in the logic elements of the router's hardware architecture, which leads to a reduction in the total amount of hardware required. As a result, less power is needed. The authors note that, while evaluating power optimization, they dynamically turn off inactive blocks. It would have been beneficial for readers if they had elaborated on the mechanisms they used to do this. How does turning off inactive blocks influence power savings in multicore architectures? It also would be interesting to see the outcome of this approach in a real (versus simulated) multicore environment.