Computing Reviews, the leading online review service for computing literature.

Tiled QR decomposition and its optimization on CPU and GPU computing system
Kim D., Park K. ICPP 2013 (Proceedings of the 2013 42nd International Conference on Parallel Processing, Oct 1-4, 2013) 744-753, 2013. Type: Proceedings

Date Reviewed: May 20 2014

Single-node heterogeneous computing systems composed of multicore central processing units (CPUs) and accelerators such as graphics processing units (GPUs) are becoming the norm in high-performance computing (HPC) environments. Each computing device has strengths and weaknesses for a given application, and deciding which computations should run on which device is an open research question. Kim and Park present an algorithm that automatically distributes data and computation across an optimized number of devices within a single heterogeneous system. The application they target is tiled QR decomposition.

Their work makes three primary contributions. First, they break the QR decomposition of a single tile into multiple tasks, assigning small or serial tasks to the CPU and large or highly parallel tasks to one or more GPUs. Second, they automatically optimize the number of devices used, based on the properties of the devices in the system. Finally, they use a distribution guide array to record which tiles each active device operates on. The authors evaluate their algorithm on randomly generated matrices of up to 4000 elements on a side.

This paper is not for those looking for a new parallel algorithm for QR decomposition, as Kim and Park base their implementation on the Householder reflections method. What they really present is a load-balancing algorithm integrated into tiled QR decomposition. Some of the optimal computing configurations are not intuitive, which lends value to their automated technique. While not always the easiest paper to read, Kim and Park's work should be of value to those trying to optimize a parallel algorithm across a number of heterogeneous computing devices in a single system.
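For readers who want the underlying numerical method in concrete form, here is a minimal sketch of a plain (untiled) QR decomposition by Householder reflections in pure Python. It illustrates only the textbook method the paper builds on, not Kim and Park's tiled CPU/GPU implementation; the function name `householder_qr` and the list-of-lists matrix representation are assumptions made for this example.

```python
import math


def householder_qr(a):
    """Factor A = Q R with Householder reflections (textbook, untiled sketch).

    `a` is a list of m rows of n floats, with m >= n. Returns (q, r) where
    q is m x m orthogonal and r is m x n upper triangular.
    """
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]  # working copy; becomes R
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]  # starts as I

    for k in range(min(m - 1, n)):
        # Sub-column of R at and below the diagonal in column k.
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column is already zero below the diagonal

        # Reflect x onto alpha * e1; alpha takes the sign opposite to x[0]
        # so that v = x - alpha * e1 avoids cancellation.
        alpha = -norm_x if x[0] >= 0.0 else norm_x
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)

        # R <- H R, where H = I - 2 v v^T / (v^T v) acts on rows k..m-1.
        for j in range(k, n):
            f = 2.0 * sum(v[i - k] * r[i][j] for i in range(k, m)) / vtv
            for i in range(k, m):
                r[i][j] -= f * v[i - k]
        # Remove round-off: column k is alpha * e1 by construction.
        r[k][k] = alpha
        for i in range(k + 1, m):
            r[i][k] = 0.0

        # Q <- Q H on columns k..m-1 (H is symmetric and orthogonal).
        for i in range(m):
            f = 2.0 * sum(q[i][j] * v[j - k] for j in range(k, m)) / vtv
            for j in range(k, m):
                q[i][j] -= f * v[j - k]

    return q, r


A = [[12.0, -51.0, 4.0], [6.0, 167.0, -68.0], [-4.0, 24.0, -41.0]]
Q, R = householder_qr(A)  # R is upper triangular; Q R reproduces A up to round-off
```

In a tiled formulation, the matrix is split into blocks and these reflections are computed and applied tile by tile, which exposes the independent tasks that the authors distribute between the CPU and GPUs; that scheduling layer is not shown in this sketch.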

Reviewer: Chris Lupo	Review #: CR142298 (1408-0645)

Parallel Processors (C.1.2 ... )

Distributed Architectures (C.1.4 ... )

Graphics Processors (I.3.1 ... )

Parallel Algorithms (G.1.0 ... )

Parallel Processing (I.3.1 ... )

Other reviews under "Parallel Processors":	Date

Spending your free time Gelernter D. (ed), Philbin J. BYTE 15(5): 213-ff, 1990. Type: Article	Apr 1 1992

Higher speed transputer communication using shared memory Boianov L., Knowles A. Microprocessors & Microsystems 15(2): 67-72, 1991. Type: Article	Jun 1 1992

On stability and performance of parallel processing systems Bambos N., Walrand J. (ed) Journal of the ACM 38(2): 429-452, 1991. Type: Article	Sep 1 1992

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®