Deep learning has become so dominant in machine learning, and in artificial intelligence (AI) as a whole, that its intrinsic lack of interpretability has become paradigmatic for the entire field of explainable AI (XAI). This is why progress in interpreting, explaining, and visualizing deep learning is of utmost importance. This book is a collection of self-contained chapters, compiled as the result of a Neural Information Processing Systems (NIPS) 2017 workshop and featuring some of the leading experts in XAI for deep learning. The volume is structured into six parts, each with its own preface and chapter summaries, which are very helpful.
The first part sets the scene, with a short introduction to AI and the facets of explanation: the recipient, the content (explanans), and the role. The characteristics of the explanandum and the intentions of the explanator are not considered; however, this is compensated for by the distinction between the audience (the recipient) and the beneficiary (the explanator) of the explanation. There are some oddities in this part: the historical perspective is not covered until chapter 3, which introduces the desiderata for XAI: the classical tradeoff between fidelity and understandability, plus three other aspects, namely sufficiency (enough to justify the decision), low construction overhead (relative to an AI system without explanation), and efficiency (not slowing down the AI system). This chapter also reminds us that many applications in machine learning and AI are not perception problems solved with deep learning.
The second part covers methods to interpret AI systems, starting with activation maximization: synthesizing the inputs that maximize the activation of a particular unit (output or internal). Running trained networks in the backward direction has evolved into very complex pipelines that generate realistic exemplars. This is also illustrated when plausible images have to be generated from short textual descriptions. This part also covers XAI by design, that is, how we can devise new algorithms that ensure or facilitate explainability. For instance, can we build networks that are invariant to rotation and translation?
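The core idea of activation maximization can be sketched in a few lines: treat the input, not the weights, as the variable to optimize, and ascend the gradient of one unit's activation. In the minimal sketch below, a hand-made scalar function stands in for a neuron in a trained network, and finite differences stand in for backpropagation; the unit function, learning rate, and step count are all illustrative assumptions.

```python
# Toy activation maximization: find the input that maximizes a unit's
# activation by gradient ascent on the INPUT, not on the weights.

def unit_activation(x):
    # Stand-in for a neuron in a trained network; peaks at x = 3.0.
    return -(x - 3.0) ** 2

def numeric_grad(f, x, eps=1e-5):
    # Finite differences stand in for backpropagation through the net.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def activation_maximization(f, x0=0.0, lr=0.1, steps=200):
    x = x0
    for _ in range(steps):
        x += lr * numeric_grad(f, x)  # ascend the activation surface
    return x

x_star = activation_maximization(unit_activation)
print(round(x_star, 3))  # converges toward the maximizer x = 3.0
```

In a real pipeline the same loop runs over image pixels with regularizers (or a generator network) added so that the synthesized input looks like a natural exemplar rather than adversarial noise.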
The final chapter of this part presents an important insight: the goal of interpretability from black-box models is not necessarily to mimic their behavior, but to extract characteristics of the model (“model attributes”) such as architecture, optimization, process, and so on. This is an important step forward, but there’s still much work to be done to make them more independent of the architecture or even the machine learning technique. More abstract characteristics--behavioral rather than structural--are needed.
The third part focuses on explaining particular decisions through local explanations. Many popular XAI systems (such as LIME) are based on perturbations within a neighborhood around the example to be explained. One chapter explores other kinds of perturbations, such as removing or modifying parts of an image, and adapts salience maps and attribution methods accordingly. Two chapters develop state-of-the-art attribution methods, such as layer-wise relevance propagation (LRP). Principles such as relevance conservation (analogous to energy conservation) could be adapted to many architectures; but is this always straightforward as new deep learning methods are introduced? This question is partially answered at the end of this part, where we see how LRP is adapted to long short-term memory (LSTM), a very different neural network architecture.
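The perturbation idea behind LIME-style local explanations can be sketched without any library: sample points near the instance of interest, query the black box on them, and fit a simple linear surrogate whose coefficients serve as the explanation. The black-box function, perturbation radius, and sample count below are toy assumptions, not LIME's actual implementation.

```python
import random

# Toy LIME-style local explanation: approximate a nonlinear black box
# around one instance with a linear model fit on perturbed neighbors.

def black_box(x):
    return x ** 3  # stands in for an opaque model's output

def local_linear_explanation(f, x0, radius=0.1, n=500, seed=0):
    rng = random.Random(seed)
    xs = [x0 + rng.uniform(-radius, radius) for _ in range(n)]
    ys = [f(x) for x in xs]
    # Closed-form 1-D least squares on the perturbed neighborhood:
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

slope, _ = local_linear_explanation(black_box, x0=2.0)
print(slope)  # close to the true local derivative 3 * 2**2 = 12
```

The slope is a faithful explanation only near x0; globally, the cubic black box behaves very differently, which is exactly the locality caveat these chapters discuss.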
The next part deals with evaluating interpretability and explanation, but it focuses on good properties of explanations and pays little attention to whether humans really consider the explanations adequate, trustworthy, and effective. The first interpretability measure, network dissection, is defined as how many unique concept detectors (units) there are in the network. Isn't this more about how well concepts are separated? In any case, the technique gives important insights about the network.
The other two chapters present axiomatic properties, such as conservation (very much like energy in electrical networks), continuity (under small perturbations), implementation invariance (though still architecture-dependent), and invariance to shifts (such as negative images) and other changes that preserve functionality. These are more about neural network reliability than about general properties of what makes an explanation good.
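The conservation axiom can be checked directly on a toy case. Below, a basic LRP-style z-rule (without the epsilon stabilizer used in practice) redistributes the output relevance of a single linear unit back to its inputs in proportion to each input's contribution; the weights and inputs are toy values chosen so the normalizer is nonzero.

```python
# Toy check of the conservation axiom: LRP's z-rule on a single linear
# unit splits the output relevance R across inputs in proportion to
# each input's contribution z_i = w_i * x_i, so the input relevances
# sum back to R (nothing is created or lost, like energy in a circuit).

def lrp_z_rule(x, w, relevance_out):
    z = [wi * xi for wi, xi in zip(w, x)]  # per-input contributions
    z_sum = sum(z)                          # the unit's pre-activation
    return [relevance_out * zi / z_sum for zi in z]

x = [1.0, 2.0, 0.5]
w = [0.3, 0.5, 0.2]
out = sum(wi * xi for wi, xi in zip(w, x))  # forward pass: 1.4
relevances = lrp_z_rule(x, w, relevance_out=out)
print(sum(relevances))  # conservation: matches the output up to float error
```

Applying this rule layer by layer is what makes the total relevance at the input equal the network's output score, which is precisely the property whose adaptation to new architectures (such as LSTM) the book examines.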
The great diversity of XAI methods is seen in the next part, which covers a wide range of applications such as quantum chemistry, drug discovery, and rainfall forecasting. Applications such as neural decoding (reconstructing what a person is seeing from brain scans) are spectacular and point to an important connection between XAI and neuroscience. The last, single-chapter part discusses some software tools. We expect to see XAI tools included in general machine learning and deep learning libraries in the near future.
Overall, despite the explicit focus on deep learning for images, this is a very valuable collection for anyone working on applications of deep learning who is looking for the key XAI techniques of the moment. Readers from other areas of AI, or those new to XAI, can get a glimpse of where cutting-edge research is heading.