Hardware that quickly computes A × B mod M is described. The basic algorithm that this hardware uses for such computation is the one presented first by P. L. Montgomery [1] and further developed by others. This computation finds application in the RSA public key encryption algorithm [2].
The computational method described in the paper is estimated to be approximately twice as fast as previously existing methods. The new circuit requires as many clock pulses as does the best existing type of circuit, but the clock pulses in the new circuit can be twice as fast because of reduced path lengths (measured in terms of the number of gate delays). The timing estimates in this paper are done at the logic level, without any regard to wire lengths. These timing estimates may be misleading because the circuit described in the paper involves “wires that are, on average, at least half the length of an edge of the chip.” The authors think that their timing estimates might still be valid because they equally disregard wire length when they estimate the delay of the older and the newer circuits. Since very long wires can easily cause circuit delay to double, however, it does not appear that the authors have obtained a good timing estimate. All we can conclude at this point is that the new circuit is worth evaluating further because it involves simpler hardware and fewer gate delays.