Computing Reviews
FPGA-based hardware acceleration of lithographic aerial image simulation
Cong J., Zou Y. ACM Transactions on Reconfigurable Technology and Systems 2(3): 1-29, 2009. Type: Article
Date Reviewed: Mar 16 2010

In high-performance reconfigurable computing (HPRC), a reconfigurable device is used to accelerate parts of a compute-intensive application. HPRC is an emerging field, and its importance can be seen in the number of companies that have launched systems with a field-programmable gate array (FPGA) connected to the computational nodes, such as SGI’s reconfigurable application-specific computing (RASC), Nallatech’s front-side bus (FSB) modules, and XtremeData’s XD1000 system. These systems share a similar architecture, in which the microprocessor connects to the FPGA over a high-speed channel. The authors use an Opteron processor connected to an Altera Stratix II FPGA via HyperTransport links.

The paper presents an optical lithography simulation algorithm that is accelerated using reconfigurable hardware. “Optical lithography is the technology used for printing circuit patterns onto wafers. As the technology scales down and the feature size is even smaller than the wavelength of the light employed, significant light interference and diffraction may occur during the imaging process.” Therefore, it is necessary to simulate the imaging process prior to manufacturing, in order to ensure its correctness.

The method used to resolve the problem is based on decomposing the “system into many coherent systems with decreasing importance.” As the authors explain, “the image corresponding to each coherent system can be obtained via numerical image convolution, and the final image is the weighted sum of the image of each coherent system.”
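
The weighted sum described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `convolve2d` and `aerial_image` are hypothetical, the kernels and weights in any call are made-up inputs, and the sketch assumes (as is usual in coherent imaging, though the review does not spell it out) that the image of each coherent system is the squared magnitude of the convolution result.

```python
def convolve2d(mask, kernel):
    """Direct 2D convolution, same size as mask, zero padding.

    Assumes the kernel has odd dimensions so that it has a center cell.
    """
    rows, cols = len(mask), len(mask[0])
    krows, kcols = len(kernel), len(kernel[0])
    cr, cc = krows // 2, kcols // 2
    out = [[0j] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0j
            for i in range(krows):
                for j in range(kcols):
                    sy, sx = y + cr - i, x + cc - j
                    if 0 <= sy < rows and 0 <= sx < cols:
                        acc += mask[sy][sx] * kernel[i][j]
            out[y][x] = acc
    return out


def aerial_image(mask, kernels, weights):
    """Weighted sum of the images of the individual coherent systems.

    Assumption: each system's image is |mask convolved with kernel|^2.
    """
    rows, cols = len(mask), len(mask[0])
    image = [[0.0] * cols for _ in range(rows)]
    for kernel, weight in zip(kernels, weights):
        field = convolve2d(mask, kernel)
        for y in range(rows):
            for x in range(cols):
                image[y][x] += weight * abs(field[y][x]) ** 2
    return image
```

Because the coherent systems have decreasing importance, a caller could pass only the first few kernels and weights to trade accuracy for run time.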

In the frequency domain, the convolution is done by applying fast Fourier transforms to the data. Since the layout of very large-scale integration (VLSI) circuits is composed only of rectangles, the convolution values are precomputed and stored. Although this method is accurate enough to solve the problem, it is computationally demanding. The authors present a new hardware architecture to solve the problem, and then compare it with other existing architectures. Using C, they explore the problem and propose an optimized architecture. Next, a synthesis tool, AutoPilot, generates the final hardware implementation. The algorithm kernel is a loop that can be rearranged to exploit its intrinsic parallelism. The authors analyze the results from this rearranged loop to decide on a hardware/software partition and a communication pattern for the system.
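
One way to see why rectangle-only layouts make precomputation attractive is the following Python sketch. It is an assumption about how such a scheme could work, not a description of the paper's data structures: a prefix-sum table of a (real-valued, made-up) kernel is built once, after which the convolution of the kernel with any rectangle costs four table lookups. The names `build_prefix_table`, `kernel_sum`, and `field_at` are hypothetical.

```python
def build_prefix_table(kernel):
    """table[y][x] holds the sum of kernel[i][j] for all i < y and j < x."""
    rows, cols = len(kernel), len(kernel[0])
    table = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        for x in range(cols):
            table[y + 1][x + 1] = (kernel[y][x] + table[y][x + 1]
                                   + table[y + 1][x] - table[y][x])
    return table


def kernel_sum(table, x1, y1, x2, y2):
    """Sum of the kernel over the half-open box [x1, x2) x [y1, y2).

    Coordinates are clamped to the kernel extent, so parts of the box
    that fall outside the kernel contribute nothing.
    """
    max_y, max_x = len(table) - 1, len(table[0]) - 1
    x1, x2 = max(0, min(x1, max_x)), max(0, min(x2, max_x))
    y1, y2 = max(0, min(y1, max_y)), max(0, min(y2, max_y))
    return table[y2][x2] - table[y1][x2] - table[y2][x1] + table[y1][x1]


def field_at(table, center, rects, px, py):
    """Convolution value at pixel (px, py) for a layout made of rectangles.

    center is the (x, y) index of the kernel's center cell; each rectangle
    is (x1, y1, x2, y2), half-open in pixel coordinates.
    """
    cx, cy = center
    total = 0.0
    for rx1, ry1, rx2, ry2 in rects:
        total += kernel_sum(table,
                            px - rx2 + 1 + cx, py - ry2 + 1 + cy,
                            px - rx1 + 1 + cx, py - ry1 + 1 + cy)
    return total
```

In this sketch each pixel depends only on the shared table and the rectangle list, so the pixel and rectangle loops can be reordered or split across parallel units without changing the result, which is the kind of rearrangement the review refers to.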

The paper mainly discusses how to parallelize the hardware implementation and partition the memory, based on the data extracted from the high-level C implementation of the system. In Section 4.2, the authors describe how they rewrote the C code to implement specific architectural decisions. The section concludes that there is still a gap between the software C code and the C code suitable for hardware generation.

The paper ends with results from different experiments, and a critique of the Compute Unified Device Architecture (CUDA) version of the algorithm, running on a graphics processing unit (GPU). Unfortunately, the authors fail to explain the scalability advantages of FPGAs over GPUs. The authors conclude that, while a C-based tool is useful and reduces design time, it remains difficult to extract the algorithm’s parallelism and manage the system’s memory mapping. In summary, readers may find ideas in this paper for future research on HPRC machines.

Reviewer:  Javier Castillo Review #: CR137808 (1007-0688)
 
Algorithms Implemented In Hardware (B.7.1 ... )
 
 
Types And Design Styles (B.7.1 )
 
Other reviews under "Algorithms Implemented In Hardware": Date
The performance of multilective VLSI algorithms
Savage J. Journal of Computer and System Sciences 29(2): 243-273, 1984. Type: Article
Dec 1 1985
Proving systolic systems correct
Hennessy M. ACM Transactions on Programming Languages and Systems 8(3): 344-387, 1986. Type: Article
Jul 1 1987
Algorithms for iterative array multiplication
Nakamura S. IEEE Transactions on Computers 35(9): 713-719, 1986. Type: Article
Jul 1 1987

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®