Computing Reviews
Feature selection with conditional mutual information maximin in text categorization
Wang G., Lochovsky F. Information and knowledge management (Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management, Washington, D.C., USA, Nov 8-13, 2004) 342-349, 2004. Type: Proceedings
Date Reviewed: Jan 26 2005

The main, and often computationally overwhelming, characteristic of text data is its extremely high dimensionality, which can be a severe obstacle for any classification algorithm. One of the most frequently used ways to reduce dimensionality is distributional clustering of words, in which each word cluster is subsequently treated as a single feature. Several attempts to address this problem, mostly based on information-theoretic approaches, have already been made.

The paper introduces a new feature selection method, conditional mutual information maximin (CMIM), which trades off a feature's redundancy against its individual discriminative power in order to improve the effectiveness of automatic text categorization. In CMIM, the feature set for each class C is built in two stages: the first stage selects the most informative feature set F from all available features, and the second stage extends F by adding the features that are optimal with respect to a max-min information criterion.
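The review does not give the exact formulas, so the following Python sketch is only one plausible reading of the two stages, not the authors' implementation. It assumes binary term-presence columns, a one-vs-rest indicator for class C, a first stage that keeps the k_initial terms with the highest mutual information I(C; f), and a second stage that repeatedly adds the candidate f maximizing the minimum of I(C; f | g) over the terms g already in F. All names (`select_features`, `k_initial`, `k_total`, and the helpers) are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(*columns: Sequence[int]) -> float:
    """Empirical joint entropy, in bits, of one or more aligned discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_info(y: Sequence[int], f: Sequence[int]) -> float:
    """I(C; f) = H(C) + H(f) - H(C, f)."""
    return entropy(y) + entropy(f) - entropy(y, f)


def cond_mutual_info(y: Sequence[int], f: Sequence[int], g: Sequence[int]) -> float:
    """I(C; f | g) = H(C, g) + H(f, g) - H(C, f, g) - H(g)."""
    return entropy(y, g) + entropy(f, g) - entropy(y, f, g) - entropy(g)


def select_features(X: List[Sequence[int]], y: Sequence[int],
                    k_initial: int, k_total: int) -> List[int]:
    """Hypothetical two-stage max-min selection for one class C.

    X is a list of binary term-presence columns (one per term, aligned with
    documents); y marks each document as in class C (1) or not (0).
    Returns column indices in the order they were selected.
    """
    if k_initial < 1:
        raise ValueError("stage two needs at least one feature in F")
    # Stage 1 (assumed): F = the k_initial terms with the highest I(C; f).
    # sorted() is stable, so ties keep their original column order.
    ranked = sorted(range(len(X)), key=lambda j: mutual_info(y, X[j]), reverse=True)
    selected = ranked[:k_initial]
    remaining = ranked[k_initial:]
    # Stage 2 (assumed max-min rule): add the candidate whose worst-case
    # conditional MI, given any term already selected, is largest. A term
    # that is redundant with some selected term has a small minimum.
    while len(selected) < k_total and remaining:
        best = max(remaining,
                   key=lambda j: min(cond_mutual_info(y, X[j], X[s]) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    y = [0, 1, 1, 1]      # toy class C = "term a OR term b" on four documents
    a = [0, 0, 1, 1]
    b = [0, 1, 0, 1]
    X = [a, list(a), b]   # column 1 duplicates column 0
    print(select_features(X, y, k_initial=1, k_total=2))  # [0, 2]
```

On this toy data all three columns have the same individual score (about 0.31 bits), so ranking by mutual information alone would keep the duplicate of a. The max-min step instead rejects it, because I(C; a | a) = 0 while I(C; b | a) = 0.5 bits.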

Several experiments evaluated CMIM on the WebKB and Newsgroups data sets, using Bayes and support vector machine (SVM) classifiers, and compared the CMIM feature set against the traditional information gain (IG) feature set, with encouraging results. However, in my opinion, a comparison with similar approaches proposed by other authors would help the reader place the proposed method in the context of existing text categorization methods.
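For contrast with CMIM, the IG baseline scores each term in isolation. The Python sketch below assumes binary term presence, the usual formulation in text categorization: IG(t) = H(C) - P(t) H(C | t) - P(not t) H(C | not t), which equals the mutual information between the class and the term-presence indicator. The function names and the toy labels are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Empirical entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(term_present: Sequence[int], labels: Sequence[Hashable]) -> float:
    """IG(t) = H(C) - P(t) H(C | t) - P(not t) H(C | not t) for a binary term."""
    n = len(labels)
    gain = entropy(labels)
    for value in (0, 1):
        # Class labels of the documents where the term is absent (0) or present (1).
        subset = [c for p, c in zip(term_present, labels) if p == value]
        if subset:
            gain -= len(subset) / n * entropy(subset)
    return gain


def top_k_by_ig(X: List[Sequence[int]], labels: Sequence[Hashable], k: int) -> List[int]:
    """Indices of the k terms with the highest IG, each scored independently."""
    scores = [information_gain(col, labels) for col in X]
    return sorted(range(len(X)), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    labels = ["course", "course", "faculty", "faculty"]  # toy documents
    X = [[1, 0, 1, 0],   # term spread evenly across classes: IG = 0
         [1, 1, 0, 0]]   # term present only in "course" pages: IG = 1 bit
    print(top_k_by_ig(X, labels, k=1))  # [1]
```

Because every term is scored independently, two near-duplicate terms receive the same high IG and are both kept, which is the redundancy that the max-min stage of CMIM is meant to penalize.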

Reviewer: L. State
Review #: CR130718 (0510-1175)
Reviewer Selected
Feature Evaluation And Selection (I.5.2 ...)
Induction (I.2.6 ...)
Text Processing (I.5.4 ...)
Learning (I.2.6)
Other reviews under "Feature Evaluation And Selection":

Labeled point pattern matching by Delaunay triangulation and maximal cliques
Ogawa H. Pattern Recognition 19(1): 35-40, 1986. Type: Article
Date Reviewed: Feb 1 1988

Features selection and ‘possibility theory’
Di Gesù V., Maccarone M. Pattern Recognition 19(1): 63-72, 1986. Type: Article
Date Reviewed: Dec 1 1987

An analytic-to-holistic approach for face recognition based on a single frontal view
Lam K., Yan H. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(7): 673-686, 1998. Type: Article
Date Reviewed: Oct 1 1998

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®