This paper presents algorithms for parallel implementation of multiplication by an iterative array of cells. Two of the algorithms have n-cell delays for n by n multiplication, compared to the 2n delay for the standard shift and add multiplier. Column compression techniques have been used to improve the speed of multipliers. For instance, a radix 4 multiplier is presented that gives an increase in speed by a factor of two; this, however, is 16 times as complex. The measure of complexity used is based on the number of ROM bits necessary to implement the function. The radix 4 multiplier presented illustrates column compression and the ROM bit measure of complexity.
The first of the new algorithms presented uses a triangular array of 5-counters. Each of the 5-counter cells takes a pair of bit products and three partial sums and produces three outputs: the count or sum of the inputs in binary. It is shown that the 5-counter multiplier has one-half the delay of the standard multiplier and is only three times as complex using the ROM bit count. The interconnections necessary are also not a severe problem as only one more connection is required per cell.
A second iterative multiplication algorithm using “generalized (6, 3, 4) counters” is presented. It is argued that this implementation has n cell delay (twice as fast as the standard multiplier) and is asymptotically no more complex. This asymptotic equivalence is only of theoretical interest as it is necessary to have a 50-bit multiplier before the complexity of the (6, 3, 4) multiplier is better than the 5-counter multiplier.
This paper is interesting and fairly well written. Readers interested in parallel implementations of arithmetic will find it quite useful.