As the number of artificial intelligence (AI) systems supporting key aspects of our lives, including health and security, grows exponentially, the demand for explainable AI has surged in recent years. In the field of computer vision, image representations are at the heart of most automatic recognition and detection systems. Yet their theoretical understanding remains superficial and their development largely empirical. Consequently, our ability to explain image-based decision systems is still limited.
This paper contributes to the area of explainable image representations by investigating, both formally and through numerical simulations, their equivariance and equivalence properties. The former describes how a representation changes, in a predictable way, when the input image is transformed; invariance, where the output does not change at all, is the special case of trivial equivariance. The latter refers to the similarity of representations captured by different models on the same data. While most recent works focus on how to achieve these properties, this paper instead aims to characterize and quantify them in a systematic manner, considering both handcrafted and learned representations.
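To make the two properties concrete, the following is a minimal NumPy sketch, not taken from the paper: a toy block-mean "representation" (a hypothetical stand-in for a real feature extractor) is tested for equivariance under 90-degree rotation, and a second model defined as a fixed permutation of the first is tested for equivalence by inverting that permutation.

```python
import numpy as np

def representation(img):
    # Toy "representation": mean intensity over a 4x4 grid of cells
    # (hypothetical stand-in for a real feature extractor)
    h, w = img.shape
    return img.reshape(4, h // 4, 4, w // 4).mean(axis=(1, 3))

rng = np.random.default_rng(0)
img = rng.random((32, 32))

# Equivariance: rotating the input by 90 degrees should rotate the
# representation by 90 degrees, since the cell grid maps onto itself.
feat_of_rotated = representation(np.rot90(img))
rotated_feat = np.rot90(representation(img))
equivariance_err = np.abs(feat_of_rotated - rotated_feat).max()

# Equivalence (sketch): model B equals model A up to a fixed
# permutation of cells, so undoing the permutation maps B onto A.
perm = rng.permutation(16)

def representation_b(img):
    return representation(img).ravel()[perm].reshape(4, 4)

recovered = representation_b(img).ravel()[np.argsort(perm)].reshape(4, 4)
equivalence_err = np.abs(recovered - representation(img)).max()

print(equivariance_err, equivalence_err)  # both errors are ~0
```

Real representations are only approximately equivariant or equivalent, so in practice one would fit the transformation or mapping (rather than assume it, as here) and report the residual error.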
A must-read for anyone working on image representation learning, and for practitioners interested in accelerating structured regression classifiers. It is also recommended reading for master's and PhD students beginning their study of convolutional neural networks.