This paper demonstrates that new parallel input/output (I/O) systolic designs (based on an extended Stein’s algorithm) outperform Guo and Wang’s parallel I/O designs (based on an extended Euclid’s algorithm) [1] for modular division in GF(2m).
The paper develops three new division algorithms: a basic prototype, based on improving the well-known extended Stein algorithm; a modification of algorithm 1; and a further modification of algorithm 2. The authors develop high-throughput parallel I/O systolic array implementations of algorithm 2 (time complexity O(log log m), area complexity O(m)) and algorithm 3 (time complexity O(1), area complexity O(m)), and an area-efficient parallel I/O systolic array implementation of algorithm 3 (time complexity O(m log log m), area complexity O(m)). All three implementations outperform corresponding Euclid algorithm implementations, mostly by constant factors. For example, the authors’ area-efficient implementations use 22 percent fewer flip-flops, and have a critical path delay 25 percent lower than the best know Euclid implementations. Since the differences are mostly constant factors, the time-complexity and area-complexity orders are unchanged, with one exception: algorithm 3 reduces the Euclid high-throughput time complexity from 0(log log m) to order(1).
The paper also contains a comprehensive survey of division algorithms in GF(2m), and a Web address for detailed technical information related to the paper. The paper serves a rather narrow audience, since the algorithms and systolic designs are specialized for one application.