A wave-pipelined 8-bit multiplier is presented. Its novelty lies in the use of a nonstatic variant of CMOS technology, NPCPL. The advantage of NPCPL over other technologies suitable for wave-pipelining, such as ECL and CML, is that it uses less area and power. (NPCPL does dissipate more static power than static CMOS, but static CMOS is not suitable for wave-pipelining.)
The advantage of wave-pipelining over regular (synchronous) pipelining is speed. Wave-pipelining is faster because it uses no registers between pipeline stages, so its speed is not limited by the global clock. Data items in different pipeline stages are prevented from overtaking each other by making sure that all path delays within each logic block are equal, which is, as the authors admit, very difficult.
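To make the role of delay equalization concrete, here is a minimal sketch (mine, not taken from the paper) of the usual first-order bounds on how closely successive inputs can be launched. It assumes the textbook simplification that a registered stage must wait for its slowest path to settle, while a wave-pipelined block only needs the launch interval to exceed the spread between its slowest and fastest paths. The function names, the lumped overhead term, and all delay values are hypothetical.

```python
def min_interval_registered(path_delays, overhead):
    """Registered stage: the next input must wait for the slowest path."""
    return max(path_delays) + overhead


def min_interval_wave(path_delays, overhead):
    """Wave-pipelined block: successive waves stay separated as long as the
    launch interval exceeds the spread between slowest and fastest paths."""
    return (max(path_delays) - min(path_delays)) + overhead


# Hypothetical path delays (ns) through one logic block.
unbalanced = [2.0, 5.0, 8.0]   # paths differ widely
balanced = [7.5, 8.0, 8.0]     # paths padded to be nearly equal
overhead = 0.5                 # hypothetical lumped margin (ns)

print(min_interval_registered(unbalanced, overhead))  # 8.5
print(min_interval_wave(unbalanced, overhead))        # 6.5
print(min_interval_wave(balanced, overhead))          # 1.0
```

Under these assumptions the benefit comes almost entirely from balancing: with unequal paths the wave-pipelined interval is only slightly better than the registered one, which is why the equalization problem dominates the design effort.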
Because of this design difficulty, I question the practicality of wave-pipelining (given the currently available techniques to equalize all delays within a logic block). There is a competing technique for fast pipelining that is also independent of the global clock: self-timed pipelines. In a self-timed pipeline, data items are prevented from overtaking each other by means of handshaking signals.
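The handshaking idea can be illustrated with a toy model (a sketch of my own, not any particular circuit or protocol from the literature). Each stage holds at most one item, and an item advances only when the next stage is empty, which plays the role of the acknowledge signal. The names `step`, `run`, and `sink_period` are hypothetical, and `None` is reserved to mean "empty stage", so it cannot be used as a data item.

```python
def step(stages, sink_ready):
    """Advance the toy pipeline by one round of handshakes.

    stages[i] is None when stage i is empty, otherwise the item it holds.
    Returns the item accepted by the sink this round, or None.
    """
    out = None
    if sink_ready:
        out = stages[-1]
        stages[-1] = None
    # Scan from the output end so each item moves at most one stage per round.
    for i in range(len(stages) - 2, -1, -1):
        if stages[i] is not None and stages[i + 1] is None:
            stages[i + 1] = stages[i]  # next stage was empty: hand over
            stages[i] = None
    return out


def run(items, n_stages, sink_period=1):
    """Push items through the pipeline; the sink accepts one item every
    sink_period rounds, so a slow sink makes the stages back up."""
    stages = [None] * n_stages
    pending = list(items)
    received = []
    rnd = 0
    while pending or any(s is not None for s in stages):
        out = step(stages, sink_ready=(rnd % sink_period == 0))
        if out is not None:
            received.append(out)
        if pending and stages[0] is None:
            stages[0] = pending.pop(0)
        rnd += 1
    return received


print(run(["a", "b", "c", "d"], n_stages=3))                 # fast sink
print(run(["a", "b", "c", "d"], n_stages=3, sink_period=3))  # slow sink
```

In this model the items leave in the order they entered whether or not the sink stalls, and no assumption about equal path delays is needed; the cost is the per-stage handshake logic that the authors object to.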
The authors claim that self-timed systems are “plagued by the extremely high overhead of the handshaking circuitry….” In fact, self-timed circuits are practical enough that designs far larger than an 8-bit multiplier are commercially available. For example, the SPARC64 processor contains a self-timed double-precision floating-point divider [1].