Computing Reviews
Tiled QR decomposition and its optimization on CPU and GPU computing system
Kim D., Park K. ICPP 2013 (Proceedings of the 2013 42nd International Conference on Parallel Processing, Oct 1-4, 2013), 744-753. 2013. Type: Proceedings
Date Reviewed: May 20 2014

Single-node heterogeneous computing systems composed of multicore central processing units (CPUs) and accelerators such as graphics processing units (GPUs) are becoming the norm in high-performance computing (HPC) environments. Each computing device has strengths and weaknesses for a given application, and identifying which computations should be performed on which device is an open research area. Kim and Park present an algorithm that automatically distributes data and computation to an optimized number of devices in a single heterogeneous system. The application they target is tiled QR decomposition.

There are three primary contributions in their work. First, they break the QR decomposition of a single tile into multiple tasks, and distribute small or serial tasks to the CPU and large or highly parallel tasks to one or more GPUs. Second, they automatically optimize the number of devices utilized based on the properties of the devices in the system. Finally, they use a distribution guide array to record which tiles each active device operates on. The authors evaluate their algorithm on matrices of up to 4000 elements on a side with randomly generated values.
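The idea of a distribution guide array can be illustrated with a minimal sketch. The review does not describe the array's actual layout or how the authors fill it, so everything here is an assumption made for illustration: the function name build_guide_array, the choice to assign whole tile columns, and the relative device weights are all hypothetical and are not taken from the paper.

```python
def build_guide_array(num_tile_columns, device_weights):
    """Map each tile column to a device index.

    device_weights holds a relative throughput per device. These are
    hypothetical inputs; the paper derives its own device properties.
    """
    total = sum(device_weights)
    assigned = [0] * len(device_weights)  # columns given to each device so far
    guide = []
    for col in range(num_tile_columns):
        # Pick the device that is furthest behind its proportional share
        # of the first (col + 1) columns. Ties go to the lower index.
        best = max(
            range(len(device_weights)),
            key=lambda d: device_weights[d] * (col + 1) / total - assigned[d],
        )
        guide.append(best)
        assigned[best] += 1
    return guide


if __name__ == "__main__":
    # One slow device (index 0) and one device assumed to be 3x faster (index 1).
    guide = build_guide_array(8, [1.0, 3.0])
    print(guide)
    print("columns per device:", [guide.count(d) for d in range(2)])
```

In this sketch, each entry of the returned list names the device responsible for one tile column, so a faster device simply appears more often. A real implementation would also have to account for data transfer costs and the shrinking trailing matrix, which this illustration ignores.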

This paper is not for those looking for a new parallel algorithm for QR decomposition, as Kim and Park base their implementation on the Householder reflections method. What they really present is a load-balancing algorithm integrated into tiled QR decomposition. Some of the optimal computing configurations are not intuitive, which lends value to their automated technique. Although the paper is not always easy to read, it should be of value to those trying to optimize a parallel algorithm across a number of heterogeneous computing devices in a single system.

Reviewer: Chris Lupo
Review #: CR142298 (1408-0645)
 
Parallel Processors (C.1.2 ... )
Distributed Architectures (C.1.4 ... )
Graphics Processors (I.3.1 ... )
Parallel Algorithms (G.1.0 ... )
Parallel Processing (I.3.1 ... )
 
