The main, and often computationally overwhelming, characteristic of text data is its extremely high dimensionality, which can be a severe obstacle for any classification algorithm. One of the most frequently used ways to reduce dimensionality is distributional clustering of words, with each word cluster subsequently treated as a single feature. Attempts to address the dimensionality problem, mostly based on information-theoretic approaches, have already been made.
The paper introduces a new feature selection method, conditional mutual information maximin (CMIM), which trades off redundancy against the individual discriminative power of each feature, in order to improve the effectiveness of automatic text categorization. According to the CMIM algorithm, the feature set for each class, C, results from a two-stage process: the first stage determines the most informative feature set, F, from the set of all available features, and the second stage extends F by adding the features that are optimal with respect to a max-min informational criterion.
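To make the max-min criterion concrete, the following is a minimal Python sketch of a CMIM-style greedy selection, assuming binary bag-of-words features and plug-in empirical estimators; the function names (mutual_info, cond_mutual_info, cmim_select) are illustrative and not taken from the paper.

```python
import numpy as np
from itertools import product

def mutual_info(x, y):
    """Empirical mutual information I(X; Y) between two discrete arrays."""
    mi = 0.0
    for xv, yv in product(np.unique(x), np.unique(y)):
        pxy = np.mean((x == xv) & (y == yv))
        px, py = np.mean(x == xv), np.mean(y == yv)
        if pxy > 0:
            mi += pxy * np.log2(pxy / (px * py))
    return mi

def cond_mutual_info(x, y, z):
    """Empirical conditional MI: I(X; Y | Z) = sum_z p(z) I(X; Y | Z=z)."""
    cmi = 0.0
    for zv in np.unique(z):
        mask = (z == zv)
        cmi += np.mean(mask) * mutual_info(x[mask], y[mask])
    return cmi

def cmim_select(X, c, k):
    """Greedy max-min feature selection for one class.

    X: (n_samples, n_features) binary feature matrix; c: class labels.
    Returns the indices of k selected features.
    """
    n_features = X.shape[1]
    assert k <= n_features
    # Stage 1: seed F with the single most informative feature for the class.
    scores = [mutual_info(X[:, j], c) for j in range(n_features)]
    selected = [int(np.argmax(scores))]
    # Stage 2: repeatedly add the candidate whose worst-case conditional
    # mutual information with the class, given any already-selected feature,
    # is largest (maximize the minimum, i.e., the max-min criterion).
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            score = min(cond_mutual_info(X[:, j], c, X[:, s])
                        for s in selected)
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

Minimizing over the already-selected features is what penalizes redundancy: a candidate that merely duplicates a selected feature has near-zero conditional mutual information given that feature, and so scores poorly.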
Several experimental evaluations of CMIM were performed on the WebKB and Newsgroups data sets, using Bayes and support vector machine (SVM) classifiers, and the quality of the CMIM feature set was compared against the traditional information gain (IG) feature set, with encouraging results. However, in my opinion, a comparative analysis with similar approaches by other authors would help the reader better place the proposed method in the context of already-known methodologies for text categorization.