The design and simulation of a computer consisting of 64k associative processing elements (APEs), each of which contains 32 one-bit arithmetic/associative logic units (A2LUs), are described. For 32-bit floating-point multiplication, a simulated rate of 3671 MFLOPS is reported, giving 57 kFLOPS per APE. By contrast, the massively parallel processor (MPP) reported 291 MFLOPS for 32-bit floating-point multiplication on a machine with 16k one-bit processing elements. The integration of 32 A2LUs into one APE is described in connection with associative and arithmetic operations. Carries propagate across the 32 A2LUs only one bit position at a time, so fast multiplication relies on carry-save techniques. As in hardware implementations of carry-save multipliers, this results in an operation time that is O(m) in the number m of bits in a number rather than O(m²).
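The carry-save idea can be illustrated with a minimal Python sketch. This is not the paper's algorithm or the A2LU primitive set: the function name `carry_save_multiply` and the use of Python integers to stand in for a row of one-bit units are assumptions made for illustration, and only unsigned integer multiplication (as would be needed for the mantissas) is shown. Each of the m accumulation steps applies a bitwise full adder to every bit position at once, so no carry ripples within a step; the deferred carries are resolved at the end, moving one bit position per step.

```python
def carry_save_multiply(a: int, b: int, m: int = 32) -> int:
    """Multiply two unsigned m-bit integers with carry-save accumulation.

    Illustrative sketch only; each bit position of the Python integers
    stands in for one hypothetical one-bit unit working in parallel.
    """
    mask = (1 << (2 * m)) - 1  # the product fits in 2m bits
    s = 0  # running sum word
    c = 0  # running carry word (carries saved, not yet propagated)

    for i in range(m):
        # Partial product: a shifted left by i if bit i of b is set.
        pp = (a << i) & mask if (b >> i) & 1 else 0
        # Bitwise full adder (3:2 compressor): s + c + pp is preserved,
        # but each carry moves only one position to the left.
        new_s = s ^ c ^ pp
        new_c = ((s & c) | (s & pp) | (c & pp)) << 1
        s, c = new_s, new_c & mask

    # Resolve the saved carries, one bit position per iteration.
    while c:
        s, c = s ^ c, ((s & c) << 1) & mask
    return s


if __name__ == "__main__":
    x, y = 12345, 6789
    print(carry_save_multiply(x, y) == x * y)
```

The accumulation loop runs m times and the final resolution loop runs at most 2m times, which is the sense in which the step count grows linearly in m when all bit positions operate in parallel.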
Section 3 describes the primitive operation set of the A2LU, and subsequent sections present algorithms for addition, subtraction, multiplication, matrix multiplication, and fast Fourier transform using this set. Associative operations are described only briefly because the emphasis of the paper is on adapting the “classical model” of an associative processor to support high-speed arithmetic.
The material is well organized and the writing is good, making the paper suitable as an introduction for those who have not considered arithmetic operations on machines consisting of large arrays of one-bit ALUs. Those who have dealt with machines such as the STARAN, DAP, MPP, or Connection Machine at the one-bit processor level will find the material familiar.