Computing Reviews
Statistics for data scientists: an introduction to probability, statistics, and data analysis
Kaptein M., van den Heuvel E., Springer International Publishing, Cham, Switzerland, 2022. 348 pp.  Type: Book (978-3-030105-30-3)
Date Reviewed: Jul 7 2022

One would hope that professionals calling themselves data scientists would have extensive training in both statistical theory and practice. Yet current data analytics curricula, while naturally including at least one general statistics course, often neglect the integration of those two essential components. This can, and sometimes does, lead to simplistic exploration, analysis, and modeling of complex datasets.

Kaptein and van den Heuvel teach statistics and data science at the Eindhoven University of Technology and at Tilburg University in the Netherlands. They have developed a new course and textbook for data scientists that incorporate a more rigorous foundation in probability and statistics than is found in many other popular data science texts.

The authors assume some prerequisite coursework in mathematics and programming, and have taught their course to undergraduate students in computer science, economics, and even social sciences. Their text focuses on the use of modern applied statistical methods and includes coverage of sampling, an extremely important topic that often receives minimal attention. The book’s extensive examples and exercises use the R language and include numerous datasets for illustrating basic data concepts, sampling and estimation, probability, distributions, multivariate techniques, and Bayesian analysis. The authors’ website for the textbook (http://www.nth-iteration.com/statistics-for-data-scientist/) includes access to the sample datasets, R source code, and recorded whiteboard lectures.

Each chapter begins with a general introduction to the major topic and presents detailed analytical examples using R, paired with the relevant theoretical concepts and formulae. The text uses a good deal of mathematical notation, which may challenge readers without the prerequisite background. One important chapter covers multivariate exploration and analysis of datasets, along with the concepts and measures of dependency and association for different data types. The final chapter on Bayesian statistics presents a readable and comprehensive discussion of that approach to estimation and decision-making, although entire libraries have been written on that topic. The authors nicely summarize and illustrate the differences between Bayesian and frequentist probability methods, yet admit that there is much more to learn about them.

Having taught data analytics at the introductory graduate level, I welcome the authors’ textbook as an essential resource for training well-grounded entry-level data scientists. The first requirement stated in the Data Science Association’s Code of Conduct [1] is competence:

A data scientist shall provide competent data science professional services to a client. Competent data science professional services requires the knowledge, skill, thoroughness and preparation reasonably necessary for the services.

Training in both the theory and practice of data analytics is a requirement for such competence. The authors’ textbook definitely provides a valuable resource for such training.

Reviewer: Harry J. Foxwell
Review #: CR147468
[1] Code of Conduct. Data Science Association. https://www.datascienceassn.org/code-of-conduct.html (accessed 7/6/2022).
Probability and Statistics (G.3)
 