Computing Reviews, the leading online review service for computing literature.

Data mining with R : learning with case studies
Torgo L., Chapman & Hall/CRC, Boca Raton, FL, 2010. 305 pp. Type: Book (978-1-439810-18-7)

Date Reviewed: Jul 25 2011

Data mining is a powerful approach to knowledge discovery in databases. Many data mining techniques exist, including statistics, neighborhoods and clustering, trees, neural networks, and rules. Programming languages and software packages help researchers analyze large collections of data and visualize their features. This book uses R to illustrate the most important data mining techniques through four case studies: algae bloom prediction (chapter 2), stock market return prediction (chapter 3), fraudulent transaction detection (chapter 4), and microarray sample classification (chapter 5). The first chapter introduces R (installation, objects, classes and methods, vectorization, factors, sequence and subset generation, matrices and arrays, and lists and data frames) and MySQL.

To demonstrate the power of R and MySQL for data mining, the author provides implementation details and clearly presents the following methods and techniques. Chapter 2 covers loading data from files, along with basic data analysis and visualization. Chapter 3 covers reading data from the Web and from MySQL databases; time series analysis and rare event prediction; the use of artificial neural networks, support vector machines, and multivariate adaptive regression splines; and financial modeling. Chapter 4 discusses outlier detection and ranking; clustering methods; various classification methods (semi-supervised learning, naive Bayes, and AdaBoost); and interaction with the Weka data mining and machine learning system. Finally, chapter 5 covers handling microarray data; feature selection and filtering (based on distribution properties, analysis of variance (ANOVA) between groups, random forests, and clustering); cytogenetic abnormality prediction; and cross-validation experiments. Overall, I appreciated the use of a distinct font for the R code and the experimental results.

The book also includes lists of figures and tables, indexes (of subject terms, data mining topics, and R functions), and a suitable bibliography. The code is available on the book’s Web site, and the author provides an R package called DMwR. No previous knowledge of R or MySQL is required; readers who need a deeper understanding of particular data mining methods can consult the cited references. As the author says, the presented case studies “should be taken as examples of possible paths in any data mining project and can be used as the basis for developing solutions for the reader’s own projects.” I recommend this book to students and researchers who are interested in data mining techniques using R and MySQL.

Reviewer: G. Albeanu	Review #: CR139276 (1203-0248)

Data Mining (H.2.8 ... )

Other reviews under "Data Mining":	Date

Feature selection and effective classifiers Deogun J. (ed), Choubey S., Raghavan V. (ed), Sever H. (ed) Journal of the American Society for Information Science 49(5): 423-434, 1998. Type: Article	May 1 1999

Rule induction with extension matrices Wu X. (ed) Journal of the American Society for Information Science 49(5): 435-454, 1998. Type: Article	Jul 1 1998

Predictive data mining Weiss S., Indurkhya N., Morgan Kaufmann Publishers Inc., San Francisco, CA, 1998. Type: Book (9781558604032)	Feb 1 1999

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®