Derived from Sarkar’s thesis, this book provides an introduction to the issues of perceptual grouping and a detailed description of one implementation of a general grouping technique. The breakdown of chapters is typical of a thesis, with an introduction and review of past work followed by the detailed description of the thesis work, and finally by the results and conclusions.
The introductory chapter explains the what and why of perceptual grouping in the context of Gestalt psychology: the goal is to find orderly, rule-governed, nonrandom organizations based on proximity, connectedness, similarity, common region, good continuation, and symmetry.
The second chapter reviews earlier research on perceptual grouping in computer vision. The discussion is organized by dimensionality (two-dimensional, three-dimensional, two-dimensional plus time, or three-dimensional plus time) and by level (signal, primitive features, structures, or assembly), where grouping proceeds from lower to higher levels: signal data is grouped into primitive features, primitive features into structures, and structures into assemblies. Earlier work is reported in most of these categories, but none is reported where time is included at the higher (structural) levels. A table summarizes this breakdown of the research.
After a brief overview of the technique used in the book, the fourth chapter discusses preattentive grouping. This corresponds roughly to human preattentive vision, that is, grouping image points into simple features (such as segments, closed figures, strands, junctions, parallels, and intersections).
Attentive grouping combines the low-level features into more meaningful features. For this purpose the book introduces the perceptual inference network (PIN), an extension of Bayesian networks (which are explained in some detail in an appendix). The PIN expresses the relationship of lower-level features to higher-level features and encodes the contribution of each feature to the higher-level group. Grouping at this level is performed using both a hand-generated network and an automatically generated one; the latter is described in a separate chapter.
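For readers unfamiliar with Bayesian evidence combination, the general idea of lower-level features contributing to a higher-level grouping hypothesis can be sketched as follows. This is only an illustration of ordinary Bayesian updating under a conditional-independence assumption, not the book's PIN; the function name `posterior`, the "closed figure" hypothesis, and all probabilities are hypothetical.

```python
from math import prod


def posterior(prior, likelihoods):
    """Posterior P(H | evidence) for a binary grouping hypothesis H.

    likelihoods: list of (P(e | H), P(e | not H)) pairs, one per observed
    lower-level feature. The features are assumed to be conditionally
    independent given H (an assumption of this sketch, not of the book).
    """
    p_h = prior * prod(l_h for l_h, _ in likelihoods)
    p_not_h = (1.0 - prior) * prod(l_n for _, l_n in likelihoods)
    return p_h / (p_h + p_not_h)


# Hypothetical numbers: do these segments form a "closed figure"?
evidence = [
    (0.9, 0.2),  # endpoints lie close together
    (0.8, 0.4),  # segments continue smoothly into one another
]
p = posterior(prior=0.1, likelihoods=evidence)
print(round(p, 3))
```

Each feature that is more likely under the hypothesis than without it raises the belief in the higher-level group; a network of such relationships lets that belief propagate across several levels.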
After these two levels of grouping (preattentive and attentive) are applied, many ambiguous descriptions remain. The aim is to generate the best hierarchical description of the image in terms of salient features. Starting from the large set of redundant hypotheses, special-purpose routines resolve the ambiguous descriptions. The book presents results for each level of organization on several test images.
The book makes good use of tables and figures to explain the algorithm and describe the issues in perceptual organization. Figures are numbered sequentially throughout the book, without chapter prefixes, so it can be hard to find a figure quickly. The bibliography is arranged in the order in which the references appear rather than alphabetically, and there is no author index. The book has a small number of typos of the kind that a spelling checker would not detect.
Overall, the book provides an introduction to the area of perceptual grouping in computer vision and gives a detailed description of one approach for grouping at many levels. As the authors state, there is much more work to be done in this area, especially with grouping applied to three-dimensional features, time-varying data, and extraction of real-world object descriptions.