Today, artificial intelligence (AI) benefits many scientific fields, including Earth observation (EO). This is not new: machine learning met remote sensing in the early 1970s. Since then, supervised classification has been used extensively for land cover and land use mapping, two fundamental tasks in EO. Still, tips and tricks are required to make proper use of these techniques. Remote sensing image classification in R, in fewer than 200 pages, aims to help the reader do so using R as the underlying language.
R is an open-source software environment for statistical computing and graphics. It is one of the main languages (along with Python) used in data science. However, it has received much less attention in terms of textbooks, especially those focused on a specific domain; Kamusoko’s book is thus the first in EO.
The book provides a hands-on approach to remotely sensed image classification, covering not only classification techniques but also some of the steps required before and after classification, for example, data preparation, feature extraction, dataset analysis, model tuning, and performance assessment. The book reads like a tutorial, with sample code provided throughout the chapters to offer the reader a practical perspective with R. Additional material for reproducing the presented experiments (that is, code and data) is available online.
The book is organized into five chapters. It starts with a brief introduction to the R environment and the dataset used in the labs: a pair of multispectral images acquired over Harare, Zimbabwe, with the Landsat 5 Thematic Mapper (TM) sensor. The reader can easily consider other geographical areas thanks to the public availability of Landsat EO data. This introductory chapter is followed by four technical ones. The chapter on preprocessing aims to map raw data into exploitable values, considering mostly radiometric correction but also spatial reprojection. The chapter on transformation consists of deriving features from the data, either vegetation indices such as the normalized difference vegetation index (NDVI) or texture measurements such as Haralick features.
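For readers unfamiliar with the index, the NDVI mentioned above is conventionally defined from the near-infrared and red surface reflectances, as sketched below. The symbols are introduced here for illustration only; the mapping to Landsat 5 TM bands (band 4 for near-infrared, band 3 for red) is general background and is not taken from the book's own code.

```latex
\mathrm{NDVI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{Red}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{Red}}},
\qquad -1 \le \mathrm{NDVI} \le 1 \quad \text{for non-negative reflectances}
```

Dense, healthy vegetation reflects strongly in the near-infrared and absorbs red light, so it yields values close to 1, whereas water and bare surfaces yield values near or below 0.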
The chapter on classification is the core of the book (70 pages) and deals with five popular approaches: k-nearest neighbors, decision trees, artificial neural networks, support vector machines, and random forests (RF). The author offers an interesting discussion on dataset assessment prior to classification; he comments extensively on the results he obtained with the different classifiers on a single-date image before repeating the whole procedure with RF only in a multi-date scenario. Finally, the last chapter hides the effect of feature selection behind a rather broad title (“Improving Image Classification”). To do so, the author first assesses the results obtained with the RF algorithm when the multi-date dataset is enriched with spectral and textural features, and then compares them with the results obtained when one feature selection technique is also applied. This last chapter is not particularly convincing since no practical gain is reported.
Most parts of the book include sample R code together with numerical or visual results, along with some personal discussion that should help the reader transfer the approach to another use case. When appropriate, the author relies on dedicated R packages, which aptly illustrates the added value of the R software environment. The step-by-step use of R scripts, while valuable, does not come with a systematic explanation of all parameters set in the function calls. The presentation is sometimes repetitive, especially when the author applies the same pipeline with different attributes (for example, methods or bands). The code remains a bit hard to follow for someone not familiar with R syntax.
Unsurprisingly, covering such a rich topic in 200 pages comes with some limitations. Many questions remain unanswered. For instance, the identification of classification challenges (for example, class imbalance, outliers, feature correlation) is very valuable, but no proper solution is given. More generally, the choices made by the author in terms of methods or parameters are not justified, while discussing the pros and cons of each method would have been very valuable for readers. Some vocabulary approximations will probably bother computer scientists and machine learning specialists. Furthermore, and this is not related to the length of the book, one may wonder why a book published in 2019 focuses on data from 1984, including a blurry reference map scanned from a black-and-white aerial photograph. The book also fails to position the proposed methods with regard to the rise of deep learning.
This short book remains the first one to address remote sensing image classification in R. Its interest will be limited to those who have both some basic knowledge of R and some background in remote sensing; these readers will find in the book some guidance in applying popular supervised classifiers with their favorite language.