Data science has recently emerged as a hot topic. Mirkin’s book explores the strengths of data analysis from the points of view of both data summarization and knowledge discovery. In addition to quantitative summarization, its core topics are correlation and visualization (graphical summary). Both quantitative and categorical data are considered within an encoder-decoder paradigm that yields interesting mathematical insights into the underlying concepts and techniques. The book has five chapters; however, rather than give a chapter-by-chapter description, this review will highlight the book’s salient features.
Two core chapters describe how to summarize data: chapter 2 describes several quantitative data summarization techniques, including principal component analysis (PCA) and PageRank; chapter 5 explains partitioning, separate cluster finding, and divisive clustering. Chapter 4 thoroughly covers k-means partitioning, along with a Pythagorean decomposition of the data variation. Issues such as clustering of categorical and mixed-scale data, similarity and network data, anomalous clusters, and the choice of the number of clusters are also discussed.
The book includes a lucid discussion of data-driven modeling, covering statistical and geometrical concepts and the relation between them, as well as consensus clustering, modularity clustering, and uniform partitioning.
This second edition covers several ranking issues, including Google PageRank, tied rankings median, semi-average, and one-cluster clustering. The intended audience includes undergraduate-level computer science (CS) students and data science practitioners. On the negative side, I would have loved to see a section on projection pursuit (parallel coordinates, Andrews plots, and so on), which is very much within the scope of the book.