This paper explores dynamic task scheduling and fault tolerance in multiprocessor systems-on-chip (MPSoCs). It describes a heterogeneous MPSoC in which a special fault-free node, called RTM, manages the other processing nodes. An application is started using a compile-time mapping. Execution traces for each node and communication edge are captured and analyzed to estimate the throughput and energy consumption of a mapping with one fewer node. If a fault occurs, the mapping that maximizes throughput and minimizes communication energy is selected.
I can see how this approach may work in an MPSoC with homogeneous nodes, but I do not understand how it will work in a heterogeneous system. The execution information for a node of type T1 is going to be very different from that of a node of type T2. It is unclear how the authors estimate throughput from a trace belonging to a node of type T1 when that node needs to be replaced by a node of type T2. Also, the use of traces in estimating throughput should have been discussed in more detail.
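To make this concern concrete, here is a minimal sketch with made-up numbers. Nothing in it comes from the paper: the task names, the execution times, and the per-task slowdown factors are all hypothetical, and I assume for simplicity that the tasks of a failed node run back to back on the replacement node.

```python
# Hypothetical per-task execution times (ms) traced on a node of type T1.
trace_t1 = {"fft": 4.0, "control": 2.0}

# Assumed slowdown of each task when it runs on a node of type T2.
# The factors differ per task, which is the point of the example.
slowdown_t2 = {"fft": 3.0, "control": 1.0}


def period(times):
    """Time for one iteration if the tasks run back to back on one node."""
    return sum(times.values())


# Naive estimate: reuse the T1 trace unchanged for the T2 node.
naive_estimate = period(trace_t1)

# What the T2 node would need under the assumed slowdown factors.
actual = period({task: trace_t1[task] * slowdown_t2[task] for task in trace_t1})

naive_throughput = 1000.0 / naive_estimate   # iterations per second
actual_throughput = 1000.0 / actual

print(f"naive: {naive_throughput:.1f} it/s, actual: {actual_throughput:.1f} it/s")
```

Under these assumed numbers, the estimate taken directly from the T1 trace is more than twice the throughput the T2 node would deliver. The authors should explain how their estimation accounts for this kind of node-type dependence.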
The authors have not justified the focus on minimizing communication energy instead of overall energy. Is communication energy a significant proportion of the overall energy? In my opinion, a configuration can minimize communication energy without being the optimal configuration as far as overall system energy is concerned.
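A small illustration of this point, using invented figures in arbitrary units (the two mappings and their energy values are hypothetical and not taken from the paper):

```python
# Hypothetical energy figures per candidate mapping (arbitrary units).
# "comm" is communication energy, "comp" is computation energy.
mappings = {
    "A": {"comm": 2.0, "comp": 9.0},
    "B": {"comm": 3.0, "comp": 6.0},
}


def total_energy(mapping):
    """Overall energy, assumed here to be communication plus computation."""
    return mapping["comm"] + mapping["comp"]


# Mapping chosen when only communication energy is minimized.
best_by_comm = min(mappings, key=lambda name: mappings[name]["comm"])

# Mapping chosen when overall energy is minimized.
best_by_total = min(mappings, key=lambda name: total_energy(mappings[name]))

print(best_by_comm, best_by_total)
```

With these assumed values, minimizing communication energy selects mapping A, while mapping B has the lower overall energy. Whether such cases arise in practice depends on how large communication energy is relative to computation energy, which is the figure I would like the authors to report.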