Computing Reviews
Online estimation of discrete, continuous, and conditional joint densities using classifier chains
Geilke M., Karwath A., Frank E., Kramer S. Data Mining and Knowledge Discovery 32(3): 561-603, 2018. Type: Article
Date Reviewed: Dec 6 2018

In a data stream, the data never become available all at once, and estimates are needed while the stream is still arriving, so traditional data analysis methods are hard to apply. Traditional data mining algorithms operate on the full dataset, whereas a stream contains too much data to store in memory. A stream algorithm therefore has to work from the current example and the current estimate alone: it updates the estimate and keeps it available for subsequent inference steps on the data.

The paper proposes a new family of online density estimators called estimation of densities online (EDO). All of them model joint probability distributions using classifier chains: the chain captures the dependencies among features, while each classifier in the chain models the probability of one feature given the features that precede it in the chain. Ensembles of such classifier chains are also proposed, as well as weighted ensembles. Moreover, the algorithms developed for discrete data are extended to continuous variables and to mixed cases.
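The chain idea can be illustrated with a minimal sketch for discrete features. It is not the authors' implementation: the class name `OnlineChainDensity` and its methods are hypothetical, and plain Laplace-smoothed count tables stand in for the incremental classifiers that EDO uses at each chain position. What the sketch does share with the description above is the factorization of the joint density into one conditional per feature, updated one example at a time.

```python
from collections import defaultdict


class OnlineChainDensity:
    """Toy online joint density estimator over discrete features.

    Factorises p(x1, ..., xn) as the product of p(xi | x1, ..., x(i-1)).
    Each factor is a Laplace-smoothed count table, which is an assumption
    made for illustration; the paper uses classifiers in this role.
    """

    def __init__(self, domains):
        # domains[i] lists the values that feature i can take.
        self.domains = domains
        # counts[i][prefix][value]: how often `value` followed `prefix`.
        self.counts = [defaultdict(lambda: defaultdict(int)) for _ in domains]

    def update(self, x):
        """Absorb one example from the stream, then discard it."""
        for i, value in enumerate(x):
            self.counts[i][tuple(x[:i])][value] += 1

    def density(self, x):
        """Estimate the joint probability of x with the chain rule."""
        p = 1.0
        for i, value in enumerate(x):
            table = self.counts[i].get(tuple(x[:i]), {})
            total = sum(table.values())
            p *= (table.get(value, 0) + 1) / (total + len(self.domains[i]))
        return p


# Feed a small stream in which the two binary features tend to agree.
est = OnlineChainDensity([[0, 1], [0, 1]])
for example in [(0, 0), (0, 0), (1, 1), (1, 1)]:
    est.update(example)

print(est.density((0, 0)), est.density((0, 1)))
```

Because every conditional table is normalised on its own, the product still sums to one over all feature combinations, and memory grows with the number of distinct prefixes seen rather than with the length of the stream. A classifier in place of each table would generalise across prefixes instead of memorising them.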

An interesting part of the paper addresses inference. To this end, the authors introduce probabilistic condensed representations of data: density estimators together with the infrastructure to operate on the densities, for example, by drawing instances or incorporating evidence. With these representations, stream mining is performed on the online density estimates instead of the original data.
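The two operations named above, incorporating evidence and drawing instances, can be sketched on a tiny discrete density. This is only an illustration under strong assumptions: the density is stored as an explicit table (a `dict` from outcome tuples to probabilities), which is feasible only for very small domains, and the function names `condition` and `draw` are hypothetical rather than taken from the paper or from MiDEO.

```python
import random


def condition(joint, evidence):
    """Restrict a joint table to outcomes matching the evidence, then renormalise.

    joint:    dict mapping outcome tuples to probabilities
    evidence: dict mapping a feature index to its observed value
    """
    kept = {
        x: p
        for x, p in joint.items()
        if all(x[i] == v for i, v in evidence.items())
    }
    total = sum(kept.values())
    if total == 0:
        raise ValueError("evidence has zero probability under the estimate")
    return {x: p / total for x, p in kept.items()}


def draw(joint, n, rng=random):
    """Draw n instances from the density instead of from the original data."""
    outcomes = list(joint)
    weights = [joint[x] for x in outcomes]
    return rng.choices(outcomes, weights=weights, k=n)


# A hand-written estimate over two binary features (illustrative numbers).
joint = {(0, 0): 0.375, (0, 1): 0.125, (1, 0): 0.125, (1, 1): 0.375}

posterior = condition(joint, {0: 1})      # observe feature 0 == 1
samples = draw(posterior, 5, random.Random(0))
print(posterior)
print(samples)
```

Any downstream mining step that only needs samples or conditional probabilities can then run against the estimate, which is the point of working on a condensed representation rather than on the stream itself.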

The EDO methods are compared with existing online density estimators on both synthetic and real-world datasets (discrete, continuous, and mixed). Performance results and the consequences for inference are discussed. The presented algorithms have been implemented in the MiDEO framework, available from GitHub.

This very complete technical paper includes significant research. It presents formal definitions and demonstrations, as well as empirical evaluations over various datasets. The finding that “EDO performs better or equally well compared to [other methods],” together with the availability of the software, is a good reason for researchers in data mining to consider reading it.

Reviewer: G. Gini
Review #: CR146337 (1902-0044)
Data Mining (H.2.8 ... )
 
