Computing Reviews
Improving lazy decision tree for imbalanced classification by using skew-insensitive criteria
Su C., Cao J. Applied Intelligence 49(3): 1127-1145, 2019. Type: Article
Date Reviewed: Jul 18 2019

Decision trees are powerful graphic tools that represent decisions and their related outcomes as branches of a tree. They enable people to see both the overall picture and the local details at the same time. First developed in the 1960s in the field of operations research, they are now widely used in many fields, including machine learning. These machine learning decision trees are first trained to build paths by choosing outcomes based on selected inputs, and then left on their own to ingest large quantities of data and classify them according to the paths established during the training phase. They find many applications in the real world, such as fraud detection, network configuration, and medical diagnosis.
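
As a minimal illustration of this train-then-classify workflow (a sketch in Python using scikit-learn on a toy dataset, not taken from the paper under review):

# Fit a decision tree on labeled data, then classify unseen samples along the learned paths.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)                            # training phase: build the split paths
print("test accuracy:", clf.score(X_test, y_test))   # classification phase: follow the paths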

Lazy decision trees are decision trees for machine learning; they usually find shorter paths in less time than their traditional counterparts. Unfortunately, they are skew sensitive: with imbalanced data, they tend to lean toward the most represented classes. This is especially true in situations where there are many samples, many classes, and the number of samples varies wildly from class to class. To mitigate this problem, the common solution is to take fewer samples of highly occurring classes (undersampling) and more samples of rarely occurring ones (oversampling). This work presents trees that use precisely this technique.
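
A minimal sketch of these two resampling strategies, using scikit-learn's resample utility on made-up data (the toy class counts are hypothetical, and this is not the authors' preprocessing, which is described below):

import numpy as np
from sklearn.utils import resample

X = np.arange(12).reshape(-1, 1)
y = np.array([0] * 10 + [1] * 2)   # class 1 is the rare (minority) class
X_maj, X_min = X[y == 0], X[y == 1]

# Oversampling: draw minority samples with replacement up to the majority size.
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)

# Undersampling: keep only as many majority samples as there are minority samples.
X_maj_down = resample(X_maj, replace=False, n_samples=len(X_min), random_state=0)

print(len(X_min_up), len(X_maj_down))   # 10 and 2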

The paper first puts the authors’ work in perspective, presenting previous and ongoing studies on decision trees and the problems they address. It then builds the lazy decision trees themselves, using two skew-insensitive split criteria, the Hellinger distance and the Kullback–Leibler (K-L) divergence, which are long-established methods for quantifying the similarity between probability distributions. The authors build two lazy decision trees for each test situation, using one criterion for the first tree and the other for the second. In both trees, they preprocess data using the synthetic minority oversampling technique (SMOTE) to increase data diversity by generating pseudo samples.
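
For reference, both criteria have simple closed forms for discrete class distributions; the sketch below computes them with NumPy for two hypothetical branch distributions (it shows only the quantities themselves, not the authors' split-selection procedure):

import numpy as np

def hellinger(p, q):
    # H(P, Q) = (1 / sqrt(2)) * || sqrt(P) - sqrt(Q) ||_2
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_i p_i * log(p_i / q_i), skipping terms where p_i = 0
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.9, 0.1]   # hypothetical class distribution in one branch
q = [0.5, 0.5]   # hypothetical class distribution in the other branch
print(hellinger(p, q), kl_divergence(p, q))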

In the next section, both trees are tested on large datasets. The setup, methods, and evaluation criteria are common to all test cases; the actual experimental results are, of course, different. The final part presents detailed findings, which vary from case to case, but the authors’ overall conclusion is that their lazy trees outperform traditional ones, albeit at the cost of greater computational complexity.

Reviewer:  Andrea Paramithiotti Review #: CR146624 (1910-0369)
Decision Tables (D.2.2 ...)
Classifier Design And Evaluation (I.5.2 ...)
Other reviews under "Decision Tables": Date
A comparison of the decision table and tree
Subramanian G., Nosek J., Raghunathan S., Kanitkar S. Communications of the ACM 35(1): 89-94, 1992. Type: Article
Oct 1 1993
Spectral interpretation of decision diagrams
Stankovic R., Astola J., Springer-Verlag New York, Inc., Secaucus, NJ, 2003. 304 pp. Type: Book (9780387955452)
Nov 6 2003
Restructuring decision tables for elucidation of knowledge
Hewett R., Leuchner J. Data & Knowledge Engineering 46(3): 271-290, 2003. Type: Article
Jan 5 2004
