Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
A Hoare logic for GPU kernels
Kojima K., Igarashi A. ACM Transactions on Computational Logic18 (1):1-43,2017.Type:Article
Date Reviewed: Aug 8 2018

Graphics processing units (GPUs) are hardware accelerators, originally designed to facilitate the fast rendering of images. Later, they became popular for computing in performance-sensitive areas because their highly parallel structure is suitable for the simultaneous processing of big blocks of data. GPU kernels are parallel programs that run on GPUs. They serve as a template, describing the behavior on an arbitrary thread. Writing kernels is a difficult task because it requires planning, organization, and coordination of a large number of threads. This is an error-prone process; hence, kernel verification is important.

The authors follow a formal deduction approach to the problem of verification, which aims at proving that programs (in this case, GPU kernels) are correct. The considered model of execution is single-instruction, multiple-thread (SIMT). In this model, a parallel program runs on a GPU, whose threads are grouped into so-called warps. Each warp contains a fixed number of threads, which “execute the same instruction simultaneously.” This means that in SIMT models, “the execution of threads [inside] a ... warp never interleaves.” These are called lockstep executions. Conditional branching may create a situation in which some statements are executed only by some threads, for which the condition holds. These are called active threads and are represented by a mask: a set that indicates which threads are active.

The authors’ formalization uses a simplified model: all threads execute the same instruction simultaneously. This is called complete lockstep execution. As a formalism, a modified version of Hoare logic is proposed. The modification accommodates masks. The standard Hoare triple, precondition-program-postcondition, is extended by a mask and becomes a quadruple, precondition-mask-program-postcondition, which means: “if an initial state satisfies [the precondition], and the program is executed with a mask ... then after termination the state satisfies [the postcondition].”

Yet another simplified assumption concerns programs with loops. The authors consider a special class of such programs in which loops satisfy the monotonicity property, which requires that a thread should never “invalidate the loop ... condition of another thread through shared memory.” However, this is not a critical restriction because it is possible to transform a loop into a monotonic one without changing the operational semantics considered in the paper.

The main part of the paper formalizes complete lockstep execution; defines the operational semantics; introduces the assertion language and inference system for the extension of Hoare logic; and proves the soundness and relative completeness theorems. Soundness states that for programs with monotonic loops, if a Hoare quadruple is derivable in the inference system of the extension of Hoare logic, then the quadruple is valid. Relative completeness states, again for programs with monotonic loops, if a Hoare quadruple is valid then it is derivable.

The assumption about complete lockstep execution does not exactly correspond to GPU kernels in practice. It underapproximates the way in which actual SIMT execution works, by assuming that there is only one warp in which all threads are executing an instruction. The authors also go in the other direction and introduce an overapproximation, called interleaving semantics, which basically assumes that every warp consists of a single thread; in this way, it models interleaved execution of threads. It turns out that there is a special kind of GPU kernel, so-called race-free ones, for which the choice of execution semantics does not matter: for such programs, the complete lockstep and interleaving semantics are equivalent. This means that the proposed extension of Hoare logic can be used to reason about real-world GPU kernels, if they are race free. To appreciate the actual applicability of the latter result, one should know how common race-free GPU kernels are. According to the authors, “many GPU kernels are intended to be race free.” Note that the proposed framework does not prove race freedom, just assumes it.

This is a pretty long and technical paper, but informal discussions, especially in the introduction, help the reader to quickly get the main idea.

Reviewer:  Temur Kutsia Review #: CR146192 (1811-0580)
Bookmark and Share
 
Logics Of Programs (F.3.1 ... )
 
Would you recommend this review?
yes
no
Other reviews under "Logics Of Programs": Date
Weak logic theory
Holden M. Theoretical Computer Science 79(2): 295-321, 1991. Type: Article
Mar 1 1992
Completing the temporal picture
Manna Z., Pnueli A. (ed) Theoretical Computer Science 83(1): 97-130, 1991. Type: Article
Apr 1 1992
Partial correctness: the term-wise approach
Sokolowski S. Science of Computer Programming 4(2): 141-157, 1984. Type: Article
Mar 1 1985
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy