ComputingReviews.com

Reducing the performance gap between soft scalar CPUs and custom hardware with TILT
Tili I., Ovtcharov K., Steffan J. ACM Transactions on Reconfigurable Technology and Systems 10(3): 1-23, 2017. Type: Article

Date Reviewed: 03/09/18

This paper is an extension of a seminal presentation of the thread- and instruction-level parallel template architecture (TILT). TILT is a software-programmable custom computing engine that exploits both thread- and instruction-level parallelism. It can be viewed as a many-threaded central processing unit (CPU) in which threads are independent and operands for the functional units (FUs) are read from and written to a scratchpad memory. The compiler plays an important role in populating the instruction memory, because it schedules instructions according to the latencies of preceding operations. TILT should not be considered a high-level synthesis framework for field-programmable gate arrays (FPGAs), but it does make programming FPGAs easier.
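To illustrate why independent threads help a latency-aware compiler, here is a minimal Python sketch (our own illustration, not the TILT compiler) that greedily schedules several copies of the same small dataflow graph onto a fixed set of pipelined FUs, issuing each operation once its producers' results are ready and an FU of the right type is free. The `Op` class, the `schedule_threads` function, and the latencies and FU counts are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    fu_type: str                                   # kind of FU that executes this op
    deps: list[str] = field(default_factory=list)  # ops whose results this op reads

def schedule_threads(ops, num_threads, fu_counts, latency):
    """Greedy resource-constrained ASAP scheduling of `num_threads` independent
    copies of one dataflow graph. `ops` must be in topological order.
    Each FU is assumed fully pipelined: it accepts one new op per cycle."""
    issue = {}  # (thread, op name) -> issue cycle
    ready = {}  # (thread, op name) -> cycle when the result is available
    busy = {}   # (cycle, fu_type) -> ops already issued to that FU type
    for op in ops:
        for t in range(num_threads):               # interleave threads op by op
            cycle = max((ready[(t, d)] for d in op.deps), default=0)
            while busy.get((cycle, op.fu_type), 0) >= fu_counts[op.fu_type]:
                cycle += 1                         # every FU of this type is taken
            busy[(cycle, op.fu_type)] = busy.get((cycle, op.fu_type), 0) + 1
            issue[(t, op.name)] = cycle
            ready[(t, op.name)] = cycle + latency[op.fu_type]
    makespan = max(ready.values())
    return issue, makespan

# Hypothetical graph: c = (x * y) + (x + y)
dfg = [Op("a", "mul"), Op("b", "add"), Op("c", "add", ["a", "b"])]
lat = {"mul": 3, "add": 2}
units = {"mul": 1, "add": 1}
_, one = schedule_threads(dfg, 1, units, lat)
_, two = schedule_threads(dfg, 2, units, lat)
print(one, two)  # 5 6
```

Two threads finish in six cycles versus five for one thread, because the second thread's operations fill FU slots that would otherwise sit idle while the first thread waits on operation latencies.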

Although this paper extends a previous publication, the authors have done a good job of presenting an overview of the base architecture and compiler methodology. It provides enough detail to understand how TILT functions, and it presents a thorough analysis of the design approach.

Seven benchmarks with varying design and programming requirements are evaluated and compared against a soft scalar CPU and custom designs. The authors present a detailed design-space exploration that focuses on compute density, functional unit utilization, area, and throughput. This analysis reveals both the flexibility and scalability of TILT and its limitations. It shows that custom designs achieve significantly higher throughput, but at the expense of significantly higher resource utilization and longer design time.
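Compute density ties the two sides of this trade-off together. Assuming it is defined, as is common, as throughput per unit area (the exact area unit used in the paper is not stated here), with D for density, T for throughput, and A for area, the ratio below shows that a custom design is denser than a TILT configuration only if its throughput advantage exceeds its area penalty.

```latex
D = \frac{T}{A}, \qquad
\frac{D_{\mathrm{custom}}}{D_{\mathrm{TILT}}}
  = \frac{T_{\mathrm{custom}} / T_{\mathrm{TILT}}}{A_{\mathrm{custom}} / A_{\mathrm{TILT}}}
```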

The design space for a given algorithm depends on its functional units and the dependencies between its operations. For each benchmark, an FU-mix calculation algorithm reduces the number of designs that must be simulated.
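The review does not describe how the FU-mix calculation works. As one plausible way to prune such a design space, the following Python sketch (an assumption, not the authors' algorithm) enumerates FU mixes for a benchmark's operation counts and keeps only Pareto-optimal ones, scoring each mix by a simple resource-bound lower limit on issue cycles per thread. `resource_bound`, `prune_fu_mixes`, and the operation counts are hypothetical.

```python
from itertools import product
from math import ceil

def resource_bound(op_counts, mix):
    """Lower bound on issue cycles per thread: the busiest FU type limits throughput."""
    return max(ceil(op_counts[t] / mix[t]) for t in op_counts)

def prune_fu_mixes(op_counts, max_per_type):
    """Enumerate every mix with 1..max_per_type units of each FU type and keep
    one representative per Pareto point: a mix is kept only if its bound is
    strictly lower than that of every mix using fewer or equally many FUs."""
    types = sorted(op_counts)
    candidates = []
    for counts in product(range(1, max_per_type + 1), repeat=len(types)):
        mix = dict(zip(types, counts))
        candidates.append((sum(counts), resource_bound(op_counts, mix), mix))
    candidates.sort(key=lambda c: (c[0], c[1]))    # cheapest, then fastest, first
    kept, best = [], float("inf")
    for total, bound, mix in candidates:
        if bound < best:
            kept.append(mix)
            best = bound
    return kept

# Hypothetical benchmark with 6 additions and 3 multiplications per thread
print(prune_fu_mixes({"add": 6, "mul": 3}, max_per_type=3))
# [{'add': 1, 'mul': 1}, {'add': 2, 'mul': 1}, {'add': 3, 'mul': 2}]
```

Only three of the nine candidate mixes survive; each of the other six uses at least as many FUs as a surviving mix without lowering the bound, so it need not be simulated.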

This framework’s performance is contingent upon the independence of threads, so programs with inter-thread communication may not perform well with it. TILT is also not resource-efficient for applications with frequent control flow instructions, because both the taken and not-taken paths of each branch are executed. As a result, more memory and control logic are required.
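The cost of executing both branch paths can be seen in the standard if-conversion (predication) transformation, sketched below in Python. This is a generic illustration that assumes TILT handles branches in a comparable way; the functions `branchy` and `if_converted` and their bodies are hypothetical.

```python
def branchy(x: int) -> int:
    """Reference version: a scalar CPU executes only the path the branch selects."""
    if x > 0:
        return x * 2 + 1
    return x - 3

def if_converted(x: int) -> int:
    """Branch-free version: both paths are always evaluated, and an arithmetic
    select (modeling a hardware multiplexer) keeps the correct result."""
    pred = int(x > 0)          # 1 if the branch would be taken, else 0
    taken = x * 2 + 1          # taken path: computed unconditionally
    not_taken = x - 3          # not-taken path: also computed unconditionally
    return pred * taken + (1 - pred) * not_taken

print([if_converted(x) for x in (-1, 0, 1, 2)])  # [-4, -3, 3, 5]
```

Beyond the comparison, the predicated version always performs three arithmetic operations and a select, whereas the branching version performs one or two; that extra work, and the storage for both paths' results, is the overhead described above.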

In summary, TILT can be described as a general-purpose yet configurable architecture that achieves decent performance while reducing, or even overcoming, the challenges of designing applications for FPGAs.

Reviewer: Krishna Nagar

Review #: CR145906 (1806-0309)

Reproduction in whole or in part without permission is prohibited. Copyright 2024 ComputingReviews.com™