Computing Reviews
Accessor variety criteria for Chinese word extraction
Feng H., Chen K., Deng X., Zheng W. Computational Linguistics 30(1): 75-93, 2004. Type: Article
Date Reviewed: Jan 7 2005

The extraction of unknown words (such as compound nouns and the names of people and organizations) is crucial to many areas of Chinese language processing, including machine translation and information retrieval. Studies of unknown word extraction in Chinese are dominated by statistical methods and rule-based methods. Statistical methods extract unknown words by exploring the associative relationships among the characters in a string, while rule-based methods investigate the grammatical rules by which unknown words are constructed.

The main contribution of this paper is a criterion the authors call “accessor variety” for recognizing unknown words. The paper assumes that any meaningful and widely used Chinese character string can be regarded as a word. Such a string is likely to occur in many different language environments, which are flagged by the string's predecessors or successors; a string that appears with many distinct predecessors and successors therefore has high “accessor variety.” In addition, a set of rules called adhesive-judge rules is employed to filter out spurious extracted words whose accessors are not true words, but rather “adhesive characters,” such as tense markers.
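
The accessor variety idea described above can be illustrated with a small Python sketch. This is not the authors' implementation: the function name `accessor_variety`, the `max_len` cutoff, the boundary markers `<S>` and `<E>`, and the choice to score a string by the smaller of its distinct-predecessor and distinct-successor counts are all assumptions made for illustration, and the adhesive-judge filtering step is omitted. The three sentences are a made-up toy corpus.

```python
from collections import defaultdict


def accessor_variety(sentences, max_len=4):
    """Score every substring of up to max_len characters.

    Assumption: the score is min(distinct predecessors, distinct successors).
    Simplification: a sentence boundary counts as one marker, <S> or <E>.
    """
    left = defaultdict(set)   # substring -> set of preceding characters
    right = defaultdict(set)  # substring -> set of following characters
    for sent in sentences:
        n = len(sent)
        for i in range(n):
            for j in range(i + 1, min(i + max_len, n) + 1):
                s = sent[i:j]
                left[s].add(sent[i - 1] if i > 0 else "<S>")
                right[s].add(sent[j] if j < n else "<E>")
    return {s: min(len(left[s]), len(right[s])) for s in left}


# Hypothetical toy corpus, unsegmented.
corpus = ["我喜欢北京", "他去北京了", "北京很大"]
scores = accessor_variety(corpus)
print(scores["北京"])  # 3: three distinct contexts on each side
print(scores["京"])    # 1: always preceded by the same character
```

In this toy corpus the string 北京 occurs in three different contexts on both sides, while 京 alone is always preceded by 北, so only the former would pass a threshold greater than 1.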

As the paper claims, the accessor variety-based method distinguishes itself from previous work by its simplicity, while maintaining comparable performance. It also does not depend on first solving Chinese word segmentation, which is itself a difficult problem. The paper would be stronger, however, if it explained the apparent inconsistency in the partial recall figures reported in Tables 7, 8, and 9.

Readers interested in unknown word extraction in Chinese will find that this paper offers a new perspective on the topic.

Reviewer: Graeme Hirst
Review #: CR130623 (0506-0714)
Language Models (I.2.7 ... )
Language Acquisition (I.2.6 ... )
Text Analysis (I.2.7 ... )
Learning (I.2.6)
Natural Language Processing (I.2.7)
