Computing Reviews

Explaining mixture models through semantic pattern mining and banded matrix visualization
Adhikari P., Vavpetić A., Kralj J., Lavrać N., Hollmén J. Machine Learning 105(1): 3-39, 2016. Type: Article
Date Reviewed: 01/03/17

Data analysis is concerned with making data comprehensible and amenable to interpretation by domain specialists. The central contribution of this paper is a three-part approach to data analysis.

First, the data is clustered using mixture models, which combine several probability distributions. Mixture models are particularly suitable for analyzing heterogeneous data, including the data that originally motivated this work: DNA copy number amplification data, used to identify chromosomal regions implicated in the development of various cancers. Second, the clustered data is mined for semantic patterns. Background knowledge in the form of ontologies is used in this process, which produces rules expressed in terms that domain specialists can use to explain the results of clustering. Finally, the rules are used to create banded matrices that expose the structure of the data in a visually accessible form. The approach was applied to several public datasets (NY Daily, Tweets, and Cities) as well as to DNA copy number amplification data.
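The three steps can be illustrated with a minimal sketch on toy binary data. This is not the authors' implementation: the Bernoulli mixture fitted by EM, the farthest-point initialization, the hypothetical `column_term` annotations standing in for an ontology, and the sort-by-cluster reordering standing in for a proper banded-matrix algorithm are all assumptions chosen to keep the example short.

```python
import math

def init_prototypes(rows, k):
    """Pick k mutually distant rows as starting prototypes (assumed heuristic)."""
    centers = [rows[0]]
    while len(centers) < k:
        def gap(r):
            # Hamming distance from r to its nearest chosen prototype.
            return min(sum(a != b for a, b in zip(r, c)) for c in centers)
        centers.append(max(rows, key=gap))
    # Soften 0/1 prototypes into probabilities strictly inside (0, 1).
    return [[0.25 + 0.5 * v for v in c] for c in centers]

def fit_bernoulli_mixture(rows, k, iters=25):
    """Fit a k-component Bernoulli mixture with EM; return labels and parameters."""
    n, d = len(rows), len(rows[0])
    theta = init_prototypes(rows, k)
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each row.
        resp = []
        for r in rows:
            logp = [
                math.log(weights[j])
                + sum(math.log(theta[j][i] if r[i] else 1.0 - theta[j][i])
                      for i in range(d))
                for j in range(k)
            ]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: update mixing weights and (Laplace-smoothed) Bernoulli means.
        for j in range(k):
            nj = sum(resp[t][j] for t in range(n))
            weights[j] = (nj + 1e-9) / (n + k * 1e-9)
            theta[j] = [
                (sum(resp[t][j] * rows[t][i] for t in range(n)) + 1.0) / (nj + 2.0)
                for i in range(d)
            ]
    labels = [max(range(k), key=lambda j: resp[t][j]) for t in range(n)]
    return labels, theta

# Toy data: 12 rows, 9 columns, three hidden groups with interleaved columns.
rows = [[1 if i % 3 == t % 3 else 0 for i in range(9)] for t in range(12)]

# Step 1: cluster the rows with a mixture model.
labels, theta = fit_bernoulli_mixture(rows, k=3)

# Step 2 (stand-in for semantic pattern mining): describe each cluster by the
# hypothetical background terms attached to its high-probability columns.
terms = ["region_A", "region_B", "region_C"]
column_term = [terms[i % 3] for i in range(9)]
descriptions = [
    sorted({column_term[i] for i in range(9) if theta[j][i] > 0.5})
    for j in range(3)
]

# Step 3 (stand-in for banded-matrix construction): group rows by cluster and
# columns by the cluster in which they are most probable.
row_order = sorted(range(12), key=lambda t: labels[t])
col_order = sorted(range(9), key=lambda i: max(range(3), key=lambda j: theta[j][i]))
banded = [[rows[t][i] for i in col_order] for t in row_order]

for j, desc in enumerate(descriptions):
    print("cluster", j, "described by", desc)
for line in banded:
    print("".join("#" if v else "." for v in line))
```

On this toy input the reordered matrix shows its ones gathered along the diagonal, which is the kind of visual structure the review refers to; real data would need a dedicated banding algorithm and a genuine ontology.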

The results indicated that the method is highly versatile and provides an effective way to summarize and present large amounts of data in a form that is likely to lead to useful insight. Of particular interest is the use of visualization to help explain the clusters.

As well as presenting the central data analysis methodology, the paper includes a useful review of related literature, addressing mixture models, multi-resolution data analysis, semantic pattern mining, and data visualization using banded matrices.

The work as a whole is likely to interest anyone concerned with mining large datasets for useful information or with visualizing their inner structure.

Reviewer: Edel Sherratt
Review #: CR144984 (1703-0190)

Reproduction in whole or in part without permission is prohibited.   Copyright 2024 ComputingReviews.com™