Data science has existed for 40 years, but it is receiving increasing attention because large amounts of structured and unstructured data (big data) are now collected every day in industry, the healthcare sector, the environment, and so on. We produce data daily through our behavior: using a credit card at a store, being recorded by a camera while visiting a public garden, or driving. Modern information and communications technology (ICT) systems are able to collect, store, and process these data. Big data comprises raw data produced by various sources and collected by such systems, including the Internet of Things (IoT), in which different groups and classes of networked objects such as sensors and actuators collect and exchange data. The collected data contain information that must be analyzed to extract knowledge. This activity is called data science.
Extracting knowledge from a huge amount of collected raw data is arduous and requires following data analytics methodologies. However, ICT systems can be used to perform such knowledge extraction. Various data analytics tools are available, from commercial products like SAS, SPSS, and Statistica to free products like R. SAS and SPSS are the most widely used tools for data analytics, but the use of R for extracting knowledge from data is increasing.
A brief comparison of the previously cited tools shows that R has both advantages and limitations with regard to memory management, programming effort, and ease of learning. Learning R is harder than learning SAS, for example. However, R is open source and offers several functions that are not available in SAS.
This book shows how to do data science with R, covering data manipulation, extraction, and analysis. Overall, the book is well written and easy to read, and it guides the reader through the use of R. The author judiciously includes many use cases and examples, along with exercises at the end of each chapter.
The book contains 14 chapters, introducing R and covering R programming, testing, and optimization.
As the title states, the author treats the reader as a beginner in R who has a strong background in data science. Therefore, at the beginning of the book, the author guides the reader through the installation process and the basic notions of R. I appreciate how the book is structured: the initial chapters are less complex, and the complexity increases as the reader progresses through the material. The end-of-chapter exercises help readers test their knowledge, and this self-evaluation is a good way to make sure the reader understands the material before moving on. Unfortunately, the solutions to the exercises are not included in the book.
Given that SAS, SPSS, and other data science tools are extremely expensive, data scientists as well as researchers with a solid background in programming who are working on projects with small budgets can benefit from this book. The structure and the writing style are major strengths; researchers will quickly learn how to perform data analytics in R.
This book is an excellent teaching and learning tool for people who would like to quickly and easily learn R programming and data analysis. Therefore, it can be recommended for data science students and lecturers, as well as for researchers.
I recommend this book to anyone who plans to learn R. Previous object-oriented programming experience and a mathematical background are the minimum prerequisites.