This well-written paper is a major contribution to the field of active learning with support vector machines (SVMs). Most machine learning methods are passive: the machine learns from a fixed set of labeled training data. In active learning, by contrast, the machine sequentially chooses which items from a training pool should be labeled next. This paper adapts the venerable version space idea to SVMs and introduces several query functions that, given the current version space, select the next pool item to be labeled so as to reduce the version space as quickly as possible.
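To make the querying idea concrete, here is a minimal sketch of the cheapest of the paper's heuristics, the simple margin: train an SVM on the currently labeled set, then query the unlabeled pool point closest to the decision boundary (the point that approximately halves the version space). The synthetic two-blob dataset and the loop structure are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy binary problem: two Gaussian blobs (an assumption for illustration).
X_pool = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y_pool = np.array([0] * 50 + [1] * 50)

labeled = [0, 50]            # seed with one example from each class
unlabeled = [i for i in range(100) if i not in labeled]

for _ in range(10):          # ten active-learning rounds
    clf = SVC(kernel="linear").fit(X_pool[labeled], y_pool[labeled])
    # Simple margin: (unsigned) distance of each unlabeled point
    # to the current separating hyperplane.
    dist = np.abs(clf.decision_function(X_pool[unlabeled]))
    query = unlabeled[int(np.argmin(dist))]
    labeled.append(query)    # the oracle reveals y_pool[query]
    unlabeled.remove(query)

print(len(labeled))          # 12 examples labeled after ten queries
```

The max-min margin and ratio margin heuristics refine this by tentatively labeling a candidate both ways and comparing the resulting margins, at the cost of retraining the SVM twice per candidate.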
Three rival query functions (simple margin, max-min margin, and ratio margin) are compared against random pool sampling and traditional passive training in two well-chosen text classification experiments. The thorough investigation on the large Reuters and Newsgroups corpora shows, once again, that active learning outperforms passive learning. Active learning can further improve the success of SVMs in applications such as Web searching, email filtering, and relevance feedback.