Computing Reviews, the leading online review service for computing literature.

Computation of term/document discrimination values by use of the cover coefficient
Can F. (ed), Ozkarahan E. Journal of the American Society for Information Science 38(3): 171-183, 1987. Type: Article

Date Reviewed: Mar 1 1988

This paper reviews one method of calculating the term discrimination value (a measure of document separability). This method, based on the cosine coefficient, is compared with another method devised by the authors, based on the cover coefficient. The authors show by experimental simulation that one method for calculating the cover coefficient is considerably faster than the cosine method and yields substantially the same results. The authors further discuss how the cover coefficient can be used to identify document clusters and the “optimum” weighting of index terms.

The authors’ work is carefully and thoroughly documented and is important to both the theoretical and the practical advancement of document retrieval methodologies. In fact, the utility of such methodologies should extend to any situation that involves clustering entities according to their similarities (as determined by descriptors assigned from a given set of descriptors). Readers should be prepared to slog through some fairly hairy mathematics without benefit of examples (or even explanations of how the formulas reduce to simpler expressions in the binary weighting, that is, strict Boolean, case).

What is most controversial is how significant these discrimination measures are for current and future document retrieval analysis. Modern retrieval systems involve much more than just selecting terms as simple units from a thesaurus; effective retrieval is accomplished by using qualifications such as specifying individual words and/or word stems and their proximity to each other or their appearance in particular document fields (e.g., title words versus descriptor or abstract words). Also, while several terms, or qualified terms, may achieve poor discrimination values individually because they are posted to a large fraction of all documents, their combination with each other, or with a previous, perhaps already narrow, search expression, may achieve just the kind of discrimination a searcher seeks. Thus, the supposition that there is a priori a best discrimination value for terms is highly suspect; one could argue that terms should express the actual facets of a document, whether or not they, as a set of individuated units, maximally separate documents in some multidimensional descriptor space.
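
Since the review notes that no worked example is given for the binary case, a small illustrative sketch may help show what a term discrimination value measures. It is not taken from the paper. It assumes the classic centroid-based formulation, in which the "density" of a collection is the average cosine similarity between each document and the collection centroid, and a term's discrimination value is the density without that term minus the density with it. The function names and the toy matrix are hypothetical, and the paper's own cosine-based and cover-coefficient-based computations may differ in detail.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def space_density(docs):
    """Average cosine similarity between each document and the centroid."""
    m = len(docs)
    n = len(docs[0])
    centroid = [sum(d[k] for d in docs) / m for k in range(n)]
    return sum(cosine(d, centroid) for d in docs) / m


def discrimination_values(docs):
    """Assumed centroid-based definition: density without term k minus density with it.

    A positive value means removing the term packs documents closer together,
    so the term helps to separate them; a negative value marks a poor discriminator.
    """
    base = space_density(docs)
    n = len(docs[0])
    values = []
    for k in range(n):
        without_k = [[w for j, w in enumerate(d) if j != k] for d in docs]
        values.append(space_density(without_k) - base)
    return values


# Toy binary (strict Boolean) document-by-term matrix: rows are documents,
# columns are terms. Term 0 is posted to every document.
docs = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
for term, value in enumerate(discrimination_values(docs)):
    print(f"term {term}: discrimination value {value:+.4f}")
```

In this toy collection the term posted to every document receives a negative value, while each term posted to a single document receives a positive one. This is the behavior the reviewer questions: a widely posted term scores poorly on its own even though it may discriminate well in combination with other terms.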

Reviewer: R. S. Marcus	Review #: CR111992

Indexing Methods (H.3.1 ... )

Clustering (H.3.3 ... )

Other reviews under "Indexing Methods":	Date

Automatic indexing of full texts Jonák Z. Information Processing and Management: an International Journal 20(5-6): 619-627, 1984. Type: Article	Jul 1 1985

Evaluation of access methods to text documents in office systems Rabitti F., Zizka J. Research and development in information retrieval (, King’s College, Cambridge,401984. Type: Proceedings	Sep 1 1985

Retrieval test evaluation of a rule based automatic indexing (AIR/PHYS) Fuhr N., Knorz G. Research and development in information retrieval (, King’s College, Cambridge,4081984. Type: Proceedings	Aug 1 1985

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®