Anomaly detection is used not only in cybersecurity and defense, but also in stock markets, finance, and business administration, as well as in medicine, astronomy, social networks, fraud detection, and anti-corruption efforts. This book presents the interesting topic of anomaly detection for a very broad audience.
The book is split into two parts, “Principles” and “Algorithms,” but in reality, principles, examples, techniques, and algorithmic descriptions are scattered throughout. The examples are motivating and drawn from realistic situations in different areas, not only security. The algorithmic descriptions are given at a very high level in pseudocode. I found both the examples and the algorithms very useful.
The different principles and techniques are very well organized into five approaches: (1) distance-based detection, with an informal presentation in Section 3 and a more scientific presentation in Section 6; (2) cluster-based detection, presented in Section 4; (3) time series detection, with an informal presentation in Section 5 and a more scientific presentation in Section 9; (4) rank-based detection, presented in Section 7; and (5) ensemble techniques combining various previous detection techniques, presented in Section 8.
The presentation is really useful: for each technique, some motivation is given, including real-life situations, a comprehensible formalization, and pros and cons, which gives readers an idea of how useful the technique will be in practice. Experimental results are given for some of the techniques, whereas performance descriptions are given for some others. Probably the most important contribution of the book is its citations and references for further reading, which may help casual readers better understand each technique and search for extra documentation.
The take-home message is that there is no “winning” technique and that finding anomalies in datasets is a domain-specific endeavor. Some techniques may produce false positives, flagging legitimate behaviors or actions that were not recognized as such, while others may produce false negatives because they are too cumbersome or inefficient to find anomalies in a particular domain. The book subtly makes the case that a data scientist must work closely with domain-specific experts (analysts, physicians, astronomers) to find anomalies.