Computing Reviews
Improving text classification accuracy by training label cleaning
Esuli A., Sebastiani F. ACM Transactions on Information Systems 31(4): 1-28, 2013. Type: Article
Date Reviewed: Feb 12 2014

This paper describes a large-scale study on the use of training label cleaning (TLC) to improve text classification. The purpose of TLC is to identify potentially mislabeled instances in a training dataset and to flag them for closer inspection by human annotators. The underlying premise is that incorrect annotations can have a significant, adverse impact on the performance of classifiers. TLC differs from active learning, which flags potentially useful unlabeled instances for human annotation, in that TLC revisits instances that already carry a label.
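The general idea of flagging suspect labels can be illustrated with a minimal sketch. This is not the authors' method: it assumes a small multinomial Naive Bayes classifier and leave-one-out scoring, and the function names (`train_nb`, `log_scores`, `rank_suspects`) and the toy data are hypothetical. Each training document is scored by a model trained on all the other documents, and documents are ranked by how strongly that model prefers a competing label over the assigned one.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Fit a multinomial Naive Bayes model on (tokens, label) pairs."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def log_scores(model, tokens):
    """Return the unnormalised log-probability of each label for one document."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            # Laplace smoothing; the extra +1 reserves mass for unseen words.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab) + 1)
            )
        scores[label] = score
    return scores


def rank_suspects(docs):
    """Rank training documents from most to least likely mislabeled.

    The suspicion value is the margin by which the best competing label
    beats the assigned label; a positive margin means the held-out
    classifier disagrees with the annotation.
    """
    ranked = []
    for i, (tokens, label) in enumerate(docs):
        model = train_nb(docs[:i] + docs[i + 1:])  # leave document i out
        scores = log_scores(model, tokens)
        assigned = scores.get(label, float("-inf"))
        rival = max(
            (s for other, s in scores.items() if other != label),
            default=float("-inf"),
        )
        ranked.append((rival - assigned, i))
    ranked.sort(reverse=True)
    return ranked


# Hypothetical toy data; the last document is deliberately mislabeled.
training = [
    ("cheap pills buy now".split(), "spam"),
    ("buy cheap watches now".split(), "spam"),
    ("win cash prize now".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("project meeting notes attached".split(), "ham"),
    ("buy cheap pills today".split(), "ham"),
]

suspects = rank_suspects(training)
for margin, index in suspects:
    print(f"doc {index}: margin {margin:+.3f}")
```

Documents at the top of the ranking would be the first candidates to hand back to a human annotator. In practice the choice of classifier, the scoring function, and how many documents to inspect are all design decisions that this sketch leaves open.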

The paper makes use of several well-known datasets, and examines the impact that incorrect annotations can have on classifier performance. The authors also detail three main techniques for TLC, and evaluate how these can help identify instances of incorrect annotations, resulting in improvements to text classification performance.

This well-written paper was a joy to read. The experiments are extensive and sound. The authors share many useful insights into the importance of annotation integrity, and also present an illuminating discussion of the results they obtained.

Readers who want to find out more about TLC may be slightly disappointed, as the paper does not go into much depth on the actual techniques used. However, TLC is already well covered in existing literature [1,2], so this is not a big problem.

Some parts of the methodology and experiments could have been better structured for a more fluent read. For example, the section that uses support vector machines (SVMs) to refute doubts about the use of MP-Boost reads like an afterthought. The paper is nonetheless worth reading for the many observations and insights it contains.

Reviewer: Jun-Ping Ng
Review #: CR141997 (1405-0391)
1) Malik, H.; Bhardwaj, V. Automatic training data cleaning for text classification. In Proc. of ICDMW (Vancouver, BC), IEEE, 2011, 442–449.
2) Esuli, A.; Sebastiani, F. Advances in information retrieval theory (LNCS 5766). Springer, 2009.
Classifier Design And Evaluation (I.5.2 ...)
Information Filtering (H.3.3 ...)
Search Process (H.3.3 ...)
Text Analysis (I.2.7 ...)
Information Search And Retrieval (H.3.3)
Natural Language Processing (I.2.7)
