This paper describes the design of a two-gigabit-per-second (Gb/s) cryptography chip that implements the Advanced Encryption Standard (AES) algorithm and supports all standard operating modes: electronic codebook (ECB), counter (CTR), output feedback (OFB), cipher feedback (CFB), and cipher block chaining (CBC). The key length for this chip is limited to 128 bits.
The basic AES encryption/decryption (E/D) process consists of several rounds of four basic steps: SubBytes, ShiftRows, MixColumns, and AddRoundKey. The first two operations are simple to implement in hardware; the last two need efficient implementations. Because the chip performs both encryption and decryption, special attention must be paid to decryption, since the InvMixColumns operation is more complicated than MixColumns. Although parallel implementations and pipelining have been used in earlier designs, the latency they introduce makes them unsuitable for some modes: in feedback modes such as CBC, each block depends on the result of the previous one, so a deep pipeline cannot be kept full.
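To illustrate why InvMixColumns costs more than MixColumns, here is a minimal Python sketch of both operations on a single four-byte column. It follows the standard AES definition, not the chip's circuit. The helper names (`xtime`, `gf_mul`, `mix_single_column`) are chosen for this illustration only. The forward coefficients are 1, 2, and 3, while the inverse needs 9, 11, 13, and 14, so each inverse product takes more shift-and-XOR steps.

```python
def xtime(b):
    """Multiply by 2 in GF(2^8), reducing modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


# First row of each circulant matrix; row r is this row rotated right by r.
MIX = (2, 3, 1, 1)          # MixColumns
INV_MIX = (14, 11, 13, 9)   # InvMixColumns


def mix_single_column(col, coeffs):
    """Multiply one 4-byte column by the circulant matrix built from coeffs."""
    out = []
    for r in range(4):
        acc = 0
        for c in range(4):
            acc ^= gf_mul(col[c], coeffs[(c - r) % 4])
        out.append(acc)
    return out


column = [0xDB, 0x13, 0x53, 0x45]
mixed = mix_single_column(column, MIX)
restored = mix_single_column(mixed, INV_MIX)
```

In hardware, multiplying by 2 or 3 needs at most one `xtime` stage plus an XOR, whereas multiplying by 9, 11, 13, or 14 chains three `xtime` stages, which is one way to see why the decryption path tends to be longer.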
The authors suggest that the critical path of the E/D rounds be reduced to increase throughput. Section 2 addresses this issue, along with key generation, and presents three different ways to generate keys. Section 3 covers register transfer level (RTL) optimization, which is used to shorten the longest path, in order to obtain the maximum clock rate, and to reduce power consumption. This section describes how the round operation is reorganized by reordering specific operations, and by merging or skipping them where possible, to reduce the delay of the entire operation. The SubBytes function can be realized using look-up tables (LUTs) implemented in random access memory (RAM) or in random logic. It appears that the RAM option was used.
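As an illustration of the LUT approach to SubBytes, the following Python sketch builds the 256-entry AES S-box, and its inverse for decryption, from the standard definition (multiplicative inverse in GF(2^8) followed by an affine transform). The function names are hypothetical, and the sketch says nothing about how the table contents are stored on the chip itself.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8), computed as a^254; 0 maps to 0."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox():
    """Return the 256-entry SubBytes look-up table."""
    table = []
    for x in range(256):
        inv = gf_inv(x)
        # Affine transform from the AES specification.
        s = inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63
        table.append(s)
    return table


SBOX = build_sbox()

# The inverse table, used by InvSubBytes, is the permutation inverse.
INV_SBOX = [0] * 256
for x, s in enumerate(SBOX):
    INV_SBOX[s] = x
```

Once the tables exist, SubBytes is a single indexed read per byte (`SBOX[b]`), which is what makes a memory-based realization attractive compared with computing the field inverse in logic.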
Section 4 is the main contribution of the paper. It describes a fast and balanced implementation, called FASTCORE, that uses a 128-bit data block and supports all modes. Two independent key generators are used. The chip is designed to operate fully in parallel on 128-bit data, with two independent datapaths for simultaneous E/D. Sections 4.1 and 4.2 describe how, in the encryption datapath, some operations are exchanged and others are moved around. This reduces the E/D critical path by 27 percent, compared to the original, but requires 35 percent more active area for decryption. The design was implemented using a 0.25-micrometer complementary metal-oxide semiconductor (CMOS) process, with a core area of 3.56 square millimeters (mm²). All available modes were run at 166 megahertz (MHz), using a 2.5-volt power supply, with FASTCORE consuming 66 milliwatts (mW) of power. The measured throughput is 2.12 Gb/s for all modes.
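The reported figures can be cross-checked with a short calculation. The sketch below assumes that one 128-bit block is completed every 10 clock cycles, that is, one AES-128 round per cycle; this cycle count is an assumption made here for illustration and is not stated in the review. The variable names are likewise illustrative.

```python
BLOCK_BITS = 128         # data block width, from the review
CLOCK_HZ = 166e6         # 166 MHz clock, from the review
POWER_W = 66e-3          # 66 mW, from the review
CYCLES_PER_BLOCK = 10    # ASSUMPTION: one AES-128 round per clock cycle

# Throughput = bits per block * blocks per second.
throughput_bps = BLOCK_BITS * CLOCK_HZ / CYCLES_PER_BLOCK
throughput_gbps = throughput_bps / 1e9

# Energy per processed bit, in picojoules.
energy_pj_per_bit = POWER_W / throughput_bps * 1e12

print(f"throughput: {throughput_gbps:.2f} Gb/s")
print(f"energy: {energy_pj_per_bit:.1f} pJ/bit")
```

Under that assumption the result agrees with the 2.12 Gb/s quoted above, which suggests that the throughput figure follows directly from the clock rate and the block width.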
The paper is well written, and sufficient details are given. It is also well illustrated. This work will certainly be useful to designers working in this area.