Computing Reviews
Automatic indexing of full texts
Jonák Z. Information Processing and Management: an International Journal 20(5-6): 619-627, 1984. Type: Article
Date Reviewed: Jul 1 1985

Techniques for automatic indexing of textual documents have been used for almost two decades by many commercially available systems (e.g., STAIRS, INQUIRE, MEAD DATA CENTRAL, KWIC). However, these systems depend on the uniformity of the English language and on the frequency of occurrence of specific word forms and word stems. In particular, they rely on the premise that the content of a document is directly related to the frequency of key words in its text. Automatic key word extraction has therefore become a science unto itself, with many algorithms used to separate the relevant key words from those that do not really describe the indexed document.
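
The frequency premise described above can be illustrated with a minimal Python sketch. It is not the algorithm of any of the systems named; the stop list, the crude suffix stripping, and the function names are assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical stop list; production systems use far larger ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "by", "with"}

def crude_stem(word):
    """Strip a few common English suffixes (illustrative, not a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def extract_keywords(text, top_n=3):
    """Return the top_n most frequent stems as (stem, count) pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    stems = [crude_stem(w) for w in words if w not in STOP_WORDS]
    return Counter(stems).most_common(top_n)
```

Stems that occur most often are treated as candidate index terms, which is exactly the dependence on word frequency that the paper under review questions.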

In the paper, Jonák states that automatic indexing is unsatisfactory for full texts because current methods cannot guarantee the required efficiency. Instead of full-text indexing, Jonák proposes the use of semantic equivalents to the units of text. A semantic equivalent is a set of elementary units of meaning, called semes, that can represent the units of text. Semes are more than just word stems. For instance, a bank, an agency, a library, and a government building are all instances of the seme “AO” (institution).
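
The idea of a semantic equivalent can be sketched as a lookup from text units to seme codes. This is a hypothetical illustration, not Jonák's coding scheme: only the code “AO” (institution) and its four example terms come from the review, and the lexicon and function names are assumed.

```python
# Hypothetical seme lexicon. Only "AO" (institution) and these four
# terms are taken from the review; a real lexicon would be far larger.
SEME_LEXICON = {
    "bank": "AO",
    "agency": "AO",
    "library": "AO",
    "government building": "AO",
}

def semantic_equivalent(units):
    """Return the set of seme codes for the text units found in the lexicon."""
    return {
        SEME_LEXICON[unit.lower()]
        for unit in units
        if unit.lower() in SEME_LEXICON
    }
```

Unlike stemming, which only merges word forms that share spelling, this mapping assigns unrelated surface forms such as "bank" and "library" to the same unit of meaning.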

Jonák’s experiments with semes compared the work of expert indexers against the coding of semes for textual documents. The aim was to obtain better results than automatic indexing could provide. However, the tables used to support the findings are not consistent with the stated results of the experiment. Although the text states that the seme approach did a good job of matching the results of expert indexers, the tables clearly show little similarity between the results of the two approaches. This could be because the document was originally written in Czechoslovakian and lost something in translation.

In short, the idea is interesting, but the results do not support the conclusions. I would recommend the paper only to someone interested in researching the original in its native language.

Reviewer:  R. J. Tufts Review #: CR109239
Indexing Methods (H.3.1 ... )
Linguistic Processing (H.3.1 ... )
