This paper describes a proposal for implementing “overlapped tiling in a general-purpose polyhedral compilation framework,” namely the polyhedral parallel code generator (PPCG) compiler. The technique improves on the performance of state-of-the-art solutions by allowing tighter overlapped tile shapes. It also minimizes the memory footprint of overlapped tiles.
After the general polyhedral compilation framework performs standard rectangular/parallelogram tiling, this technique expands the bounding faces of tiles by considering the constraints caused by inter-tile dependencies. These dependencies are then expressed by specific nodes in the schedule tree defining the execution order of the program. A minor adaptation of a classical code generation algorithm allows the generation of code for both central processing units (CPUs) and graphics processing units (GPUs).
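For readers unfamiliar with overlapped tiling, the basic idea can be illustrated with a minimal Python sketch. It is not taken from the paper or from PPCG: the two-stage three-point pipeline, the function names (`reference`, `overlapped_tiled`), and the tile size are all assumptions made for illustration. Each tile of the consumer stage recomputes the producer values it needs, including a one-element halo on each side, into a tile-local buffer, so tiles become independent at the cost of some redundant work.

```python
def reference(a):
    """Untiled two-stage pipeline: each stage is a 3-point sum."""
    n = len(a)
    tmp = [0] * n
    out = [0] * n
    for i in range(1, n - 1):
        tmp[i] = a[i - 1] + a[i] + a[i + 1]
    for i in range(2, n - 2):
        out[i] = tmp[i - 1] + tmp[i] + tmp[i + 1]
    return out


def overlapped_tiled(a, tile=4):
    """Same pipeline, with the consumer stage split into independent tiles.

    Hypothetical illustration: the producer range of each tile is expanded
    by one element on each side to cover the inter-tile dependencies.
    """
    n = len(a)
    out = [0] * n
    for lo in range(2, n - 2, tile):
        hi = min(lo + tile, n - 2)      # consumer tile is [lo, hi)
        base = lo - 1                   # expanded producer range is [lo-1, hi]
        # Tile-local buffer: halo values are recomputed by neighbouring tiles.
        local = [a[i - 1] + a[i] + a[i + 1] for i in range(base, hi + 1)]
        for i in range(lo, hi):
            j = i - base                # index into the local buffer
            out[i] = local[j - 1] + local[j] + local[j + 1]
    return out
```

Because no tile reads another tile's intermediate values, the tiles could run in parallel; the local buffer is sized by the tile plus its halo, which is the footprint that tighter tile shapes aim to reduce.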
The technique has been designed mainly for image processing pipelines, but it has also been tested on iterated stencils. The composition of overlapped tiling and additional transformations allows for the optimization of a wide range of applications, “including iterated stencil codes with more than two spatial dimensions and stencils of multiple statements.”
An experimental evaluation of the implementation of the technique shows performance improvements in the PolyMage image processing benchmarks and three representative time-iterated stencils.
The paper is very technical and written for researchers in the field. For readers familiar with the domain, it is clear, well structured, and detailed.