This paper focuses on improving many-core architectures via software-programmable memory, also known as scratchpad memory (SPM):
An SPM contains an array of static random-access memory (SRAM) cells. A portion of the memory address space is dedicated to the SPM, and any address that falls within this dedicated range directly indexes into the SPM to access the corresponding data.
Thus, because software explicitly manages data placement in this dedicated area, coherency among multiple SPMs can be handled at the software level, "thereby eliminat[ing] the hardware area/power required for cache coherence," as well as the cost of cache accesses. In a many-core environment, performance can drop drastically due to coherency traffic and the long delays of accessing data held on other cores.
In a many-core, multi-threaded architecture, both on-chip and off-chip data accesses can be nonuniform, long-latency, and irregular. To overcome these difficulties, the paper proposes "a compile-time, coordinated data management framework called CDM, for many-core SPMs." For this paper, "the 16-core Epiphany SoC consists of an array of simple RISC processors (eCores) programmable in C connected together in a 2D-mesh NOC and supporting a single shared address space." The evaluation board pairs the Epiphany chip with a Xilinx Zynq system on chip (SoC); because the eCores rely on SPM rather than traditional cache memory, the design is more energy efficient. The eCores can access not only their own local memory but also the remote memory of other cores.
Several kernel applications from embedded, multithreaded benchmarks are used in the evaluation, including two benchmarks related to the decryption and encryption of data (AESD and AESE) and three long-term evolution (LTE) benchmarks (PHY_ACI, PHY_DEMAP, and PHY_MICF). The authors use a GREEDY approach as their baseline; SNAP-S allows only one copy of each data item, while SNAP-M uses a replication mechanism. As a result, "the SNAP-M approach provides an average speed-up of 1.84x and an energy reduction of 1.83x when compared to the GREEDY strategy," while SNAP-S "provides an average speed-up and energy reduction of 1.09x." Thus, both approaches improve performance and reduce energy usage, since the SPM-based design avoids cache-like memory, which consumes additional power on every access.
The authors exploit explicitly staging off-chip data into on-chip memory rather than relying on cache-like memory, and the SoC-based design reduces energy consumption. However, emerging memory technologies that promise lower power consumption and higher speed than DRAM and caches are on the rise; if such memory becomes mainstream, this work may lose relevance. The overhead of bringing off-chip data into on-chip memory must also be considered. Moreover, the SNAP-S speed-up over the GREEDY strategy is not significant; meaningful improvement appears only when data is replicated (SNAP-M). One would expect a significant improvement from SNAP-S, because it converts remote memory accesses into local memory accesses; however, that is not borne out by the experimental results.