This book discusses the fundamental theory of data science, including its methods, validity, and scope. The author introduces the key concepts of data science as an inductive methodology in a logical sequence, to guide scientists and researchers who may use data science in their work. The author starts the book by defining the term “inductivism” and then arguing against it.
The introductory chapter defines data science as “the scientific practice of extracting knowledge from data.” It includes ten theses on data science, with the first thesis presenting data science as “the application of machine learning methods.” Unfortunately, the author fails to explain what machine learning methods are, which will hinder readers with less experience in data science and machine learning. The remaining nine theses introduce the key concepts of data science, for example, conventional statistics and causality.
The author further presents data science as an inductive framework: it should start with facts and rely on inductive inferences (inductivism). He addresses “the recent emergence of inductivist paradigms” and then presents arguments against inductivism, with hypothetico-deductivism discussed as a broad argument against it. He further discusses “the distinction between theoretical and phenomenological science,” where knowledge in phenomenological science is causal and aims to predict and manipulate. Chapter 3 presents machine learning as a successful case study of data science, covering many algorithms (for example, convolutional neural networks) and the epistemological questions that followed, such as how to interpret the hierarchy of layers in deep neural networks.
The history of variational induction is briefly introduced. The author points out that machine learning relies on variational induction, while “enumerative and eliminative induction may occasionally play a role” in machine learning. He defines these three classes of induction, explains the distinctions between them, and provides classic examples of each.
The book is divided into nine chapters and is well structured. Readers are taken on a journey through step-by-step methodologies for data-driven research. Each key concept of data science is concisely defined and accompanied by examples, along with guidance on when, why, and how to use it. The reader will gain a broad knowledge of the key concepts and evidence by reading chapters 5 to 9.
This book provides good support for computer scientists and data scientists, and I fully recommend it. It provides readers with both an epistemological perspective and a conceptual framework for data analysis.