Recognizing individual faces across collections of several million images is useful for solving crimes. How should hundreds of millions of untagged face images be grouped into an unknown number of identities? Otto et al. present an effective algorithm for partitioning millions of images into a much smaller number of clusters.
The authors succinctly summarize the strengths and weaknesses of leading spectral, hierarchical, partitional, k-means, rank-order, and approximate rank-order clustering algorithms. These methods have used component-based, deep, gradient, and pixel-intensity features to group millions of face images into a much smaller number of subject identities. Even reasonably effective clustering algorithms, however, may still generate too many groups of face images from a very large dataset for manual follow-up investigation. Moreover, the polynomial runtimes of established clustering algorithms are unacceptable at this scale.
The authors propose an algorithm that identifies each image's closest related images by computing an approximate rank-order distance between pairs of top-ranked neighboring image samples. They evaluate large-scale clustering performance by combining the well-known Labeled Faces in the Wild (LFW) dataset with nearly 123 million unlabeled images and an augmented face image dataset. The contributions of this paper include a scalable approximate nearest neighbor algorithm that enhances clustering accuracy, including for images taken from videos; reliable experimental results obtained with well-known deep networks for large-scale supervised face recognition; and a reliable measure for assessing the quality of subsets of images from clusters. I call on artificial intelligence experts to offer insights into ways of automatically selecting superior clusters in huge unlabeled image datasets.
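For readers who want a concrete picture of the rank-order idea discussed above, the following minimal Python sketch may help. It is not the authors' implementation. It uses exact brute-force neighbor search as a stand-in for their approximate nearest neighbor search, and it uses a simplified shared-neighbor distance of my own choosing. The function names, the neighbor count `k`, and the merge `threshold` are all illustrative assumptions.

```python
def top_k_neighbors(points, k):
    """Exact brute-force k nearest neighbors (stand-in for approximate search)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def rank(a, b, neighbors, k):
    """1-based position of b in a's neighbor list; capped at k if b is absent."""
    lst = neighbors[a]
    return lst.index(b) + 1 if b in lst else k


def missing_count(a, b, neighbors, k):
    """Count a's neighbors, up to b's position, that are absent from b's own list."""
    r = rank(a, b, neighbors, k)
    own_b = set(neighbors[b]) | {b}
    return sum(1 for x in neighbors[a][:r] if x not in own_b)


def pair_distance(a, b, neighbors, k):
    """Symmetric, rank-normalized distance (a simplified assumption, see lead-in)."""
    total = missing_count(a, b, neighbors, k) + missing_count(b, a, neighbors, k)
    return total / min(rank(a, b, neighbors, k), rank(b, a, neighbors, k))


def cluster(points, k, threshold):
    """Transitively merge every point with its top-k neighbors below threshold."""
    neighbors = top_k_neighbors(points, k)
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(len(points)):
        for b in neighbors[a]:  # only top-k pairs are compared, never all pairs
            if pair_distance(a, b, neighbors, k) <= threshold:
                parent[find(a)] = find(b)
    return [find(i) for i in range(len(points))]


# Toy feature vectors: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = cluster(points, k=3, threshold=0.5)
```

Because each point is compared only with its `k` nearest neighbors, the number of distance evaluations grows with the number of points times `k` rather than with the square of the number of points. In this sketch the brute-force neighbor search is still quadratic, which is the step an approximate index would replace.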