In this paper, Wang and Leeser describe a straightforward implementation of a QR decomposition (QRD) processor based on Givens rotations. This specialized processor is a two-dimensional (2D) triangular semi-systolic array.
The authors use their own well-tested and proven floating-point arithmetic units for the design of a “classic” square root cell. They provide some background on the algorithm and architectural details for the computing cells. A timing schedule for a particular implementation and some performance data are also shown.
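For readers unfamiliar with the method, the computation that a Givens-based QRD array performs can be sketched sequentially in software. This is only an illustrative sketch, not the authors' hardware design: the function names `givens` and `qr_givens` are hypothetical, and the code ignores all pipelining and timing. In the classic triangular array, the rotation parameters (which require the square root) are generated in the boundary cells and applied by the internal cells; here both steps are simply loops.

```python
import math

def givens(a, b):
    """Return (c, s) such that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0).

    Hypothetical helper: this is the step that needs a square root.
    """
    r = math.hypot(a, b)          # sqrt(a*a + b*b)
    if r == 0.0:
        return 1.0, 0.0           # nothing to rotate
    return a / r, b / r

def qr_givens(A):
    """Reduce a copy of A (a list of rows) to upper-triangular R using Givens rotations."""
    R = [[float(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    for j in range(min(m, n)):            # column being cleared below the diagonal
        for i in range(j + 1, m):         # row whose entry in column j is zeroed
            c, s = givens(R[j][j], R[i][j])
            for k in range(j, n):         # apply the rotation to rows j and i
                top, bot = R[j][k], R[i][k]
                R[j][k] = c * top + s * bot
                R[i][k] = -s * top + c * bot
    return R

R = qr_givens([[3.0, 1.0], [4.0, 2.0]])
```

Because each rotation is orthogonal, the resulting `R` satisfies `R^T R = A^T A`, which is a convenient check on any implementation.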
There are two major omissions in the paper. First, there is no comparison with other types of QRD in terms of precision, complexity, and speed. Second, the authors do not even mention how their design could be used for arbitrary-sized matrices, a problem also known as systolic scheduling. In short, the paper makes a better case for the authors’ arithmetic units than for the systolic design that uses them, even though that design is central to the paper.