This paper describes a large-scale study on the use of training label cleaning (TLC) to improve text classification. The purpose of TLC is to identify potentially mislabeled instances in a training dataset and to flag them for closer inspection by human annotators. The underlying premise is that incorrect annotations can have a significant, adverse impact on classifier performance. TLC differs from active learning, in which potentially useful *unlabeled* instances are flagged for human annotation.
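To make the TLC idea concrete for readers unfamiliar with it: one common family of approaches flags a training instance when a classifier trained on the rest of the data disagrees with that instance's given label. The sketch below is purely illustrative and is not the method used in the paper; the toy nearest-centroid classifier, the function names, and the data are all invented for this example.

```python
def nearest_centroid_predict(point, data):
    # data: list of (feature, label) pairs; compute the mean feature value
    # per label and predict the label whose centroid is closest.
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}
    return min(centroids, key=lambda y: abs(point - centroids[y]))

def flag_suspect_labels(data):
    # Leave-one-out check: predict each instance's label from all the
    # others and flag disagreements for human inspection -- the core
    # premise of label cleaning as described above.
    flagged = []
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if nearest_centroid_predict(x, rest) != y:
            flagged.append(i)
    return flagged

# Toy dataset: class "a" clusters near 0, class "b" near 10;
# the instance at index 2 is deliberately mislabeled.
train = [(0.1, "a"), (0.3, "a"), (0.2, "b"), (9.8, "b"), (10.1, "b")]
print(flag_suspect_labels(train))  # -> [2]
```

Real TLC systems would of course use a stronger classifier and probabilistic disagreement scores rather than hard leave-one-out votes, but the flag-then-inspect loop is the same.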
The paper uses several well-known datasets to examine the impact that incorrect annotations can have on classifier performance. The authors also detail three main techniques for TLC and evaluate how these help identify incorrectly annotated instances, yielding improvements in text classification performance.
This well-written paper was a joy to read. The experiments are extensive and sound. The authors share many useful insights into the importance of annotation integrity, and also present an illuminating discussion of the results they obtained.
Readers who want to learn more about TLC itself may be slightly disappointed, as the paper does not go into much depth on the actual techniques used. However, TLC is already well covered in the existing literature [1,2], so this is a minor shortcoming.
Some parts of the methodology and experiments could be better structured for a more fluent read; for example, the section that uses support vector machines (SVMs) to address doubts about the use of MP-Boost reads like an afterthought. Nonetheless, the paper is worth reading for the many observations and insights it contains.