Computing Reviews
Computation of term/document discrimination values by use of the cover coefficient
Can F. (ed), Ozkarahan E. Journal of the American Society for Information Science 38(3): 171-183, 1987. Type: Article
Date Reviewed: Mar 1 1988

This paper reviews one method of calculating the term discrimination value (a measure of document separability). This method, based on the cosine coefficient, is compared with another method devised by the authors, based on the cover coefficient. The authors show by experimental simulation that one method for calculating the cover coefficient is considerably faster than the cosine method and gets substantially the same results. The authors further discuss how the cover coefficient can be used to identify document clusters and the “optimum” weighting of index terms.
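For readers unfamiliar with the cosine-based measure, the following is a minimal sketch of the classic centroid formulation of the term discrimination value: the density of the document space is taken as the average cosine similarity between each document and the collection centroid, and a term's value is the change in density when that term is removed. The function names and the toy data layout (one list of term weights per document) are illustrative assumptions, not taken from the paper, and the paper's exact variant may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def space_density(docs):
    """Average cosine similarity between each document and the centroid."""
    m = len(docs)
    n = len(docs[0])
    centroid = [sum(d[k] for d in docs) / m for k in range(n)]
    return sum(cosine(d, centroid) for d in docs) / m


def cosine_discrimination_values(docs):
    """Assumed convention: DV_k = density without term k minus density with it.

    A positive value means removing the term packs documents closer
    together, i.e. the term was helping to separate them.
    """
    base = space_density(docs)
    values = []
    for k in range(len(docs[0])):
        reduced = [d[:k] + d[k + 1:] for d in docs]
        values.append(space_density(reduced) - base)
    return values
```

On a toy collection such as `[[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]`, the first term (posted to every document) receives a negative value and the three document-specific terms receive equal positive values. Each evaluation recomputes the density of the whole space, which is the kind of cost the paper's faster alternative aims to reduce.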

The authors’ work is carefully and thoroughly documented and is important in the investigation of the theoretical and practical advancement of document retrieval methodologies. In fact, the utility of such methodologies should extend to any situation that involves the clustering of entities according to their similarities (as determined by assigned descriptors from a given set of descriptors).

Readers should be prepared to slog through some fairly hairy mathematics without benefit of examples (or even explanations of how the formulas reduce to simpler expressions in the binary weighting, i.e., strict Boolean, case).
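As an illustration of the kind of example that would help, here is a small sketch of the cover coefficient as it is commonly defined in the cover-coefficient literature: c_ij = alpha_i * sum_k(d_ik * beta_k * d_jk), where alpha_i is the reciprocal of document i's row sum and beta_k the reciprocal of term k's column sum. Whether this matches the paper's notation exactly is an assumption, as are the function names and the convention that a term's discrimination value is the drop in the estimated number of clusters (the sum of the diagonal entries c_ii) when the term is deleted.

```python
def cover_coefficients(docs):
    """Return the document-by-document cover coefficient matrix C.

    docs is a list of equal-length term-weight rows. Documents with no
    terms and terms with no postings contribute nothing.
    """
    m = len(docs)
    n = len(docs[0])
    row_sums = [sum(d) for d in docs]
    col_sums = [sum(d[k] for d in docs) for k in range(n)]
    alpha = [1.0 / s if s else 0.0 for s in row_sums]
    beta = [1.0 / s if s else 0.0 for s in col_sums]
    return [
        [alpha[i] * sum(docs[i][k] * beta[k] * docs[j][k] for k in range(n))
         for j in range(m)]
        for i in range(m)
    ]


def estimated_clusters(docs):
    """Sum of the diagonal entries c_ii (the documents' decoupling coefficients)."""
    c = cover_coefficients(docs)
    return sum(c[i][i] for i in range(len(docs)))


def cover_discrimination_values(docs):
    """Assumed convention: TDV_k = n_c with all terms minus n_c without term k."""
    base = estimated_clusters(docs)
    values = []
    for k in range(len(docs[0])):
        reduced = [d[:k] + d[k + 1:] for d in docs]
        values.append(base - estimated_clusters(reduced))
    return values
```

In the binary case every weight is 0 or 1, so c_ii reduces to the average, over the terms posted to document i, of the reciprocal of each term's posting count. For three documents sharing one common term and each holding one unique term, every c_ii is (1/2)(1/3 + 1) = 2/3, giving an estimated 2 clusters.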

What is most controversial is how significant these discrimination measures are for current and future document retrieval analysis. Modern retrieval systems involve much more than just selecting terms as simple units from a thesaurus; effective retrieval is accomplished by using qualifications such as specifying individual words and/or word stems and their proximity to each other or their appearance in particular document fields (e.g., title words versus descriptor or abstract words). Also, while several terms, or qualified terms, may achieve poor discrimination values because they are posted to a large fraction of all documents, their combination with each other, or with a previous, perhaps already narrow, search expression, may achieve just the kind of discrimination a searcher seeks. Thus, the supposition that there is a priori a best discrimination value for terms is highly suspect; one could argue that terms should express the actual facets of a document, whether or not they, as a set of individuated units, maximally separate documents in some multidimensional descriptor space.

Reviewer:  R. S. Marcus Review #: CR111992
Indexing Methods (H.3.1 ... )
 
 
Clustering (H.3.3 ... )
 
Other reviews under "Indexing Methods": Date
Automatic indexing of full texts
Jonák Z. Information Processing and Management: an International Journal 20(5-6): 619-627, 1984. Type: Article
Jul 1 1985
Evaluation of access methods to text documents in office systems
Rabitti F., Zizka J.  Research and development in information retrieval (, King’s College, Cambridge,401984. Type: Proceedings
Sep 1 1985
Retrieval test evaluation of a rule based automatic indexing (AIR/PHYS)
Fuhr N., Knorz G.  Research and development in information retrieval (, King’s College, Cambridge,4081984. Type: Proceedings
Aug 1 1985

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®