Application-specific very long instruction word (VLIW) processors are developed to reduce area and energy consumption. This interesting paper proposes converting some of the most frequent opcodes to short opcodes, using additional decoding logic and mapping tables in the front end of the VLIW functional units. Because the original long opcodes are replaced by shorter ones, a VLIW instruction requires fewer bits to encode the same number of operations. The idea is to reduce cache area and pipeline path without sacrificing performance. The VLIW instruction set itself does not need to change.
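To make the mechanism concrete, the following is a minimal sketch of frequency-based opcode remapping, not the paper's implementation. It assumes a profiled trace of opcode names, a fixed number of short slots, and a one-bit flag that tells the decoder whether a short or a long opcode follows. The function names, the flag, and the bit widths are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_short_opcode_table(trace, num_short):
    """Map the num_short most frequent opcodes to short codes 0..num_short-1.

    Hypothetical helper: the reviewed paper's actual selection policy
    is not described in this review.
    """
    counts = Counter(trace)
    # Rank by descending frequency; break ties by name so the result is deterministic.
    ranked = sorted(counts, key=lambda op: (-counts[op], op))
    return {op: code for code, op in enumerate(ranked[:num_short])}


def encoded_opcode_bits(trace, table, long_bits, short_bits):
    """Total opcode bits for the trace under the assumed encoding.

    Assumption: every opcode carries a 1-bit flag selecting the short
    or the long form; operand fields are ignored.
    """
    total = 0
    for op in trace:
        total += 1 + (short_bits if op in table else long_bits)
    return total


if __name__ == "__main__":
    # Illustrative trace only; not data from the paper.
    trace = ["add"] * 6 + ["ld"] * 3 + ["mul"]
    table = build_short_opcode_table(trace, num_short=2)
    baseline = len(trace) * 8  # every opcode in its assumed 8-bit long form
    compact = encoded_opcode_bits(trace, table, long_bits=8, short_bits=1)
    print(table, baseline, compact)
```

The sketch also shows where the extra cost arises: the table lookup and the flag test sit on the decode path of every operation, and that overhead is what the hardware-cost question raised in this review is about.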
The paper does not adequately explain, discuss, or evaluate the additional effort involved, and it leaves several questions unanswered: What new requirements does the scheme actually impose on compilers? If applications must be recompiled anyway, why not define a new instruction set containing only short opcodes? (In this case, the additional hardware would be unnecessary.) What is the cost of the additional hardware, in terms of time or number of cycles? Is it sensible to develop a VLIW processor in which any instruction can be issued to any functional unit? (Nowadays, generic functional units are not appropriate for those who want to save energy.)