Computing Reviews

An autotuning protocol to rapidly build autotuners
Liu J., Tan G., Luo Y., Li J., Mo Z., Sun N. ACM Transactions on Parallel Computing 5(2): 1-25, 2018. Type: Article
Date Reviewed: 03/07/19

Autotuning has become a valuable tool for the high-performance computing (HPC) community to achieve “performance portability,” that is, having a program run on a new architecture correctly and with the expected performance; however, building an autotuner is not easy, and currently each autotuner seems to be built from scratch. The authors have a track record in autotuning, and they have leveraged it to propose an autotuning framework that includes a knowledge base and provision for a learner (they show a decision tree, but state that others are feasible).

The paper is hard to read, as a mixture of conventions is used for reporting performance improvement, for example, “improves performance by 1.4-6.2 times over Baseline, 64%-120% over SDSL.” Is 1.4 times the same as 40 percent? Presumably it is, if “by 1.4 times” means the tuned code delivers 1.4 times the baseline performance, which amounts to a 40 percent gain.

They have two example domains: partial differential equation (PDE) stencils and sparse matrices. Data is quoted for both central processing unit (CPU) and graphics processing unit (GPU) applications, which shows the flexibility of the system. In the case of PDE stencils, it isn’t clear to me what is actually being tuned; the authors’ previous papers had to be studied to find out.

The improvements on some of the sparse matrix computations (Figure 18), measured against serious baselines such as the Intel Math Kernel Library (MKL) rather than toys, are quite impressive, though again it isn’t clear exactly what is being tuned; the tuning space seems to come from the authors’ prior work. This domain uses machine learning: 2055 matrices are used for training.
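To make the learner concrete: the following is a minimal sketch, not the authors’ code, of how a decision-tree learner might map sparse matrix features to a choice of kernel. The feature set, the kernel labels, and the synthetic training corpus are all assumptions for illustration (Python with scikit-learn).

import numpy as np
from scipy import sparse
from sklearn.tree import DecisionTreeClassifier

def features(A):
    # Cheap structural features of a sparse matrix in CSR form.
    A = A.tocsr()
    nnz_per_row = np.diff(A.indptr)
    return [A.shape[0],                # number of rows
            A.nnz / A.shape[0],        # mean nonzeros per row
            float(nnz_per_row.std()),  # row-length variability
            int(nnz_per_row.max())]    # longest row

# Hypothetical training corpus: random matrices, each labeled with the kernel
# variant an (imagined) offline benchmark found fastest. A stand-in rule
# replaces the real measurements here.
X, y = [], []
for seed in range(200):
    rng = np.random.default_rng(seed)
    n = int(rng.integers(100, 1000))
    A = sparse.random(n, n, density=float(rng.uniform(0.001, 0.05)),
                      format="csr", random_state=seed)
    f = features(A)
    y.append("ell_kernel" if f[2] < 1.0 else "csr_vector_kernel")
    X.append(f)

model = DecisionTreeClassifier(max_depth=4).fit(X, y)

# At tuning time, feature-extract the user's matrix and ask the tree.
A_new = sparse.random(500, 500, density=0.01, format="csr", random_state=7)
print("predicted kernel:", model.predict([features(A_new)])[0])

Presumably, in the paper the labels come from benchmarking each candidate implementation on each of the 2055 training matrices rather than from the stand-in rule used above.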

There is no description of where to get the code from, so this is another piece of irreproducible research spoiling the field for others.

Reviewer: J. H. Davenport
Review #: CR146461 (1905-0174)
