The main, and often computationally overwhelming, characteristic of text data is its extremely high dimensionality, which can be a severe obstacle for any classification algorithm. One of the most frequently used ways to reduce dimensionality is distributional clustering of words, with each word cluster subsequently treated as a single feature. Attempts to address the dimensionality problem, mostly based on information-theoretic approaches, have already been made.
The paper introduces a new feature selection method, conditional mutual information maximin (CMIM), which trades off redundancy against the individual discriminative power of each feature, in order to improve the effectiveness of automatic text categorization. According to the CMIM algorithm, the feature set for each class, C, results from a two-stage process: the first stage determines the most informative feature set, F, from the set of all available features, and the second stage extends F by adding the features that are optimal with respect to a max-min informational criterion.
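To make the max-min criterion concrete, the following is a minimal Python sketch of a CMIM-style greedy selection, assuming binary bag-of-words features and plug-in empirical estimators; the function names (mutual_info, cond_mutual_info, cmim_select) are illustrative and not taken from the paper.

```python
import numpy as np
from itertools import product

def mutual_info(x, y):
    """Empirical mutual information I(X; Y) between two discrete arrays."""
    mi = 0.0
    for xv, yv in product(np.unique(x), np.unique(y)):
        pxy = np.mean((x == xv) & (y == yv))
        px, py = np.mean(x == xv), np.mean(y == yv)
        if pxy > 0:
            mi += pxy * np.log2(pxy / (px * py))
    return mi

def cond_mutual_info(x, y, z):
    """Empirical conditional MI: I(X; Y | Z) = sum_z p(z) I(X; Y | Z=z)."""
    cmi = 0.0
    for zv in np.unique(z):
        mask = (z == zv)
        cmi += np.mean(mask) * mutual_info(x[mask], y[mask])
    return cmi

def cmim_select(X, c, k):
    """Greedy max-min feature selection for one class.

    X: (n_samples, n_features) binary feature matrix; c: class labels.
    Returns the indices of k selected features.
    """
    n_features = X.shape[1]
    assert k <= n_features
    # Stage 1: seed F with the single most informative feature for the class.
    scores = [mutual_info(X[:, j], c) for j in range(n_features)]
    selected = [int(np.argmax(scores))]
    # Stage 2: repeatedly add the candidate whose worst-case conditional
    # mutual information with the class, given any already-selected feature,
    # is largest (maximize the minimum, i.e., the max-min criterion).
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            score = min(cond_mutual_info(X[:, j], c, X[:, s])
                        for s in selected)
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

Minimizing over the already-selected features is what penalizes redundancy: a candidate that merely duplicates a selected feature has near-zero conditional mutual information given that feature, and so scores poorly.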
Several experimental evaluations of CMIM were performed on the WebKB and Newsgroups data sets, using Bayes and support vector machine (SVM) classifiers, and the quality of the CMIM feature set was compared against the traditional information gain (IG) feature set, with encouraging results. However, in my opinion, a comparative analysis with similar approaches by other authors would help the reader better place the proposed method in the context of already-known methodologies for text categorization.