Computing Reviews
A study of pointer-chasing performance on shared-memory processor-FPGA systems
Weisz G., Melber J., Wang Y., Fleming K., Nurvitadhi E., Hoe J. FPGA 2016 (Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, Feb 21-23, 2016): 264-273, 2016. Type: Proceedings
Date Reviewed: Jun 7 2017

Several vendors, including Intel and IBM, have announced devices that closely integrate processors and field-programmable gate arrays (FPGAs) through low-latency shared-memory interfaces. This paper examines the potential of such devices for accelerating applications that exhibit irregular memory access patterns. Irregular memory accesses, which are pervasive in data analytics and machine learning applications, arise when programs manipulate in-memory data structures such as linked lists, trees, and graphs using pointer operations. Pointer operations are highly sensitive to memory-access latency, so applications that rely on them are poorly matched to traditional FPGA add-on cards. This work evaluates tightly coupled processor-FPGA systems for pointer traversal operations. It proposes optimizations that improve application performance by using the processor and the FPGA fabric simultaneously.

To model applications with irregular memory accesses, the authors first consider the case of a simple linked list traversal. They construct an analytical model to determine the average traversal time per node. They explore several implementations of the pointer traversal engine: (1) traversal and data operations are implemented entirely in software or entirely in the FPGA fabric, (2) pointer traversals are performed in the processor while data for computation is streamed into the FPGA fabric, and (3) pointer traversals are performed in the processor while data for computation is fetched by the FPGA fabric. These strategies are evaluated on three shared-memory processor-FPGA systems: the Xilinx Zynq, the Intel QuickAssist QPI FPGA platform, and the Convey HC-2.
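To make the workload concrete, the following is a minimal software-only sketch of a linked list traversal with a per-node data operation. It is an illustration, not the authors' code: the names Node, build_list, and traverse, and the byte-sum "computation", are assumptions chosen for this example. The point it shows is the serial dependency: the address of the next node is not known until the current node has been read.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """One list element: a data payload plus a pointer to the next node."""
    payload: bytes
    next: Optional["Node"] = None


def build_list(num_nodes: int, payload_size: int) -> Optional[Node]:
    """Build a singly linked list of num_nodes nodes (hypothetical helper).

    Node i carries payload_size copies of the byte value i % 256.
    """
    head: Optional[Node] = None
    for i in reversed(range(num_nodes)):
        head = Node(payload=bytes([i % 256]) * payload_size, next=head)
    return head


def traverse(head: Optional[Node]) -> Tuple[int, int]:
    """Walk the list, returning (nodes visited, sum of all payload bytes).

    Each iteration must finish reading the current node before the next
    node's location is known, which is why access latency dominates.
    """
    count = 0
    checksum = 0
    node = head
    while node is not None:
        checksum += sum(node.payload)  # stand-in for the data operation
        count += 1
        node = node.next               # the pointer chase
    return count, checksum
```

In the hybrid strategies described above, the pointer chase and the data operation in this loop would be split between the processor and the FPGA fabric rather than executed together in software.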

The experiments report the average traversal time per node for varying linked list parameters, such as the number of nodes, the payload size, and the degree of concurrency during traversal. The paper identifies memory access latency as the dominant factor determining application performance. The experiments show that concurrent traversals and processor assistance greatly improve traversal performance. However, the paper does not use realistic examples to convince the reader that the data must be processed in the FPGA fabric, and the authors leave the time to process the data out of both their analytical model and their experiments.
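The interaction among latency, payload size, and concurrency can be illustrated with a toy first-order model. This is not the analytical model from the paper; it is a sketch under stated assumptions: each node costs one memory round trip plus the payload transfer time, independent traversals overlap perfectly, and the memory link can never deliver a payload faster than its bandwidth allows. The function name and all parameter values are hypothetical. Like the paper, it omits the time to process the data.

```python
def avg_time_per_node_ns(
    latency_ns: float,
    payload_bytes: float,
    bandwidth_bytes_per_ns: float,
    concurrency: int = 1,
) -> float:
    """Toy estimate of average traversal time per node, in nanoseconds.

    Assumptions (illustrative only, not taken from the paper):
      - one memory round trip of latency_ns per node,
      - payload transfer takes payload_bytes / bandwidth_bytes_per_ns,
      - `concurrency` independent lists are traversed with perfect overlap,
      - data-processing time is ignored.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    transfer_ns = payload_bytes / bandwidth_bytes_per_ns
    # Latency-bound regime: round trips of different lists overlap.
    latency_bound_ns = (latency_ns + transfer_ns) / concurrency
    # Bandwidth-bound regime: the link is saturated by payload traffic.
    bandwidth_bound_ns = transfer_ns
    return max(latency_bound_ns, bandwidth_bound_ns)
```

With an assumed 100 ns latency, 64-byte payloads, and 16 bytes/ns of bandwidth, the model gives 104 ns per node for a single traversal and 26 ns for four concurrent traversals; at very high concurrency it flattens at the 4 ns transfer time, where bandwidth rather than latency becomes the limit.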

This work is certainly interesting because it uses a relatively simple model (linked lists) to understand and optimize the behavior of applications with irregular memory accesses on coherent processor-FPGA systems. It is well written and easy to understand. It will help practitioners in high-performance computing gain useful insights into how to optimize workloads on coherent processor-FPGA systems.

Reviewer:  Deepak Unnikrishnan Review #: CR145333 (1708-0536)
Performance Analysis And Design Aids (B.3.3)
Single Data Stream Architectures (C.1.1)