Computing Reviews, the leading online review service for computing literature.


Accessor variety criteria for Chinese word extraction
Feng H., Chen K., Deng X., Zheng W. Computational Linguistics 30(1): 75-93, 2004. Type: Article

Date Reviewed: Jan 7 2005

The extraction of unknown words (such as compound nouns and the names of people and organizations) is crucial to many areas of Chinese language processing, including machine translation and information retrieval. Studies of unknown-word extraction in Chinese have generally used either statistical or rule-based methods. Statistical methods extract unknown words by measuring the associations among the characters in a string, while rule-based methods rely on grammatical rules for how unknown words are constructed.

The main contribution of this paper is a measure the authors call “accessor variety” for recognizing unknown words. The paper assumes that any meaningful and widely used Chinese character string can be regarded as a word. Such a string is likely to occur in many different contexts, which are signaled by the characters that precede or follow it (its accessors); hence, it has a high “accessor variety.” In addition, a set of so-called adhesive-judge rules is used to filter out spurious candidates whose accessors are not true words but rather “adhesive characters,” such as tense markers.

As the paper claims, the accessor-variety-based method is distinguished from previous work by its simplicity, while achieving comparable performance. It also does not depend on first solving Chinese word segmentation, which is itself a difficult problem. It would have been better, however, if the paper had explained an apparent inconsistency in the partial recall figures reported in Tables 7, 8, and 9. Those interested in unknown-word extraction in Chinese will find that this paper offers a new perspective on the topic.
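The accessor-variety idea summarized in this review can be made concrete with a short sketch. It rests on an assumption the review does not spell out: that a string's accessor variety is the smaller of two counts, the number of distinct characters that precede it and the number of distinct characters that follow it across a corpus. Sentence boundaries are treated as a single hypothetical `<s>` accessor. The function names `accessor_variety` and `extract_candidates`, the length limit `max_len`, and the `threshold` value are all illustrative, and the adhesive-judge rules are not modeled.

```python
from collections import defaultdict

# Hypothetical marker for a sentence start or end (assumption: every
# boundary counts as one shared accessor, a simplification).
BOUNDARY = "<s>"


def accessor_variety(sentences, max_len=4):
    """Map each substring of 2..max_len characters to its accessor variety,
    taken here (as an assumption) to be
    min(distinct left accessors, distinct right accessors)."""
    left = defaultdict(set)   # substring -> set of preceding characters
    right = defaultdict(set)  # substring -> set of following characters
    for sent in sentences:
        n = len(sent)
        for i in range(n):
            # j is the exclusive end index, so sent[i:j] has length 2..max_len.
            for j in range(i + 2, min(i + max_len, n) + 1):
                s = sent[i:j]
                left[s].add(sent[i - 1] if i > 0 else BOUNDARY)
                right[s].add(sent[j] if j < n else BOUNDARY)
    return {s: min(len(left[s]), len(right[s])) for s in left}


def extract_candidates(sentences, threshold=2, max_len=4):
    """Return substrings whose accessor variety reaches a hypothetical threshold."""
    av = accessor_variety(sentences, max_len)
    return sorted(s for s, v in av.items() if v >= threshold)


if __name__ == "__main__":
    corpus = ["我爱北京", "北京很大", "他去北京了", "北京人"]
    print(accessor_variety(corpus)["北京"])  # 3
    print(extract_candidates(corpus))        # ['北京']
```

On this toy corpus, 北京 is the only string seen with varied neighbors on both sides (three distinct left accessors, four distinct right ones), so it is the only candidate at threshold 2; on real text the threshold and length limit would need tuning, and some filter in the spirit of the adhesive-judge rules would be needed to discard candidates such as 北京了.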

Reviewer: Graeme Hirst	Review #: CR130623 (0506-0714)

Language Models (I.2.7 ... )

Language Acquisition (I.2.6 ... )

Text Analysis (I.2.7 ... )

Learning (I.2.6 )

Natural Language Processing (I.2.7 )


Other reviews under "Language Models":	Date

A framework for investigating language-mediated interaction with machines Zoeppritz M. International Journal of Man-Machine Studies 25(3): 295-315, 1986. Type: Article	Oct 1 1987

Prolog and natural-language analysis Pereira F., Shieber S., CSLI/Stanford, Stanford, CA, 1987. Type: Book (9780937073186)	Jun 1 1988

Competence and performance in the design of natural language systems Bara B., Guida G., Elsevier North-Holland, Inc., New York, NY, 1984. Type: Book (9780444875983)	Dec 1 1985


Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®