Computing Reviews
On a model of distributed information retrieval systems based on thesauri
Mazur Z. Information Processing and Management: an International Journal 20(4): 499-505, 1984. Type: Article
Date Reviewed: Sep 1 1985

Each database in a document retrieval system usually has an associated thesaurus of terms that are used to index and retrieve its documents. Thesauri are generally hierarchical; that is, they form a “tree” of several levels of increasingly specific terms.
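The tree structure described above can be sketched as a mapping from each term to its narrower (more specific) terms. This is a minimal illustration only: the term names and the helper `narrower_terms` are hypothetical and are not taken from the review or from Mazur's paper.

```python
from typing import Dict, List

# Hypothetical thesaurus fragment: each term maps to its narrower
# (more specific) terms; terms with no entry are leaves of the tree.
THESAURUS: Dict[str, List[str]] = {
    "information science": ["information retrieval", "information processing"],
    "information retrieval": ["document retrieval", "query formulation"],
    "document retrieval": ["boolean retrieval"],
}


def narrower_terms(thesaurus: Dict[str, List[str]], term: str) -> List[str]:
    """Return every term below `term` in the tree (depth-first), excluding `term` itself."""
    result: List[str] = []
    for child in thesaurus.get(term, []):
        result.append(child)
        result.extend(narrower_terms(thesaurus, child))
    return result


print(narrower_terms(THESAURUS, "information retrieval"))
# ['document retrieval', 'boolean retrieval', 'query formulation']
```

Each level down the tree is one step more specific. A leaf term such as "boolean retrieval" has no narrower terms and yields an empty list.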

An important current problem in information retrieval is the need to devise retrieval techniques that work across a network of heterogeneous databases, each of which may have its own thesaurus. In this situation, any two thesauri will share some, but not all, of their terms and hierarchical relations.
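The partial overlap between two thesauri can be made concrete by treating each thesaurus as a set of (broader, narrower) relations. The sketch below rests on one simplifying assumption: every term appears in at least one relation. The two thesauri are invented for illustration.

```python
from typing import Set, Tuple

Relation = Tuple[str, str]  # (broader term, narrower term)


def terms_of(relations: Set[Relation]) -> Set[str]:
    """All terms that appear in a set of hierarchical relations.

    Assumes every term takes part in at least one relation; an isolated
    term would have to be stored separately.
    """
    return {t for pair in relations for t in pair}


# Two hypothetical thesauri attached to different databases.
thesaurus_a: Set[Relation] = {
    ("retrieval", "document retrieval"),
    ("retrieval", "fact retrieval"),
    ("document retrieval", "boolean retrieval"),
}
thesaurus_b: Set[Relation] = {
    ("retrieval", "document retrieval"),
    ("document retrieval", "ranked retrieval"),
    ("indexing", "automatic indexing"),
}

shared_terms = terms_of(thesaurus_a) & terms_of(thesaurus_b)
shared_relations = thesaurus_a & thesaurus_b
only_in_a = terms_of(thesaurus_a) - terms_of(thesaurus_b)

print(sorted(shared_terms))  # ['document retrieval', 'retrieval']
```

Here the two thesauri share two terms but only one hierarchical relation. This illustrates the "some, but not all" overlap the review describes.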

Mazur’s contribution is a mathematical model of this situation, with formal definitions of such entities as an individual (local) retrieval system and its thesaurus, document collection, and query and retrieval sets. Furthermore, the author formalizes a distributed system made up of a number of local systems and, for some simple cases, derives certain mathematical properties of the relationship between the local and distributed systems.
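The review does not reproduce Mazur's definitions, so the following is only a hypothetical sketch of the kinds of entities it lists. A local system pairs a thesaurus with an indexed document collection. A query is a set of thesaurus terms, and it retrieves the documents indexed with all of those terms (conjunctive semantics is an assumption). The distributed retrieval set is taken, again by assumption, to be the union of the local retrieval sets. The class `LocalSystem` and the functions `distributed_retrieve` and `common_vocabulary` are invented names, not the paper's notation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass
class LocalSystem:
    """Hypothetical local retrieval system: a thesaurus plus an indexed collection."""
    name: str
    thesaurus: FrozenSet[str]           # controlled-vocabulary terms
    index: Dict[str, FrozenSet[str]]    # document id -> its index terms

    def retrieve(self, query: FrozenSet[str]) -> Set[str]:
        """Local retrieval set: documents indexed with every query term (assumed AND semantics)."""
        return {doc for doc, terms in self.index.items() if query <= terms}


def distributed_retrieve(systems: List[LocalSystem], query: FrozenSet[str]) -> Set[str]:
    """Assumed distributed retrieval set: union of local sets, tagged with the system name."""
    return {f"{s.name}:{doc}" for s in systems for doc in s.retrieve(query)}


def common_vocabulary(systems: List[LocalSystem]) -> FrozenSet[str]:
    """Terms present in every local thesaurus (requires at least one system)."""
    return frozenset.intersection(*(s.thesaurus for s in systems))


sys_a = LocalSystem(
    "A",
    frozenset({"retrieval", "thesauri", "networks"}),
    {"a1": frozenset({"retrieval", "thesauri"}), "a2": frozenset({"networks"})},
)
sys_b = LocalSystem(
    "B",
    frozenset({"retrieval", "thesauri", "indexing"}),
    {"b1": frozenset({"retrieval", "thesauri", "indexing"}), "b2": frozenset({"indexing"})},
)

print(sorted(distributed_retrieve([sys_a, sys_b], frozenset({"retrieval", "thesauri"}))))
# ['A:a1', 'B:b1']
```

Under these assumptions, a query built only from `common_vocabulary` terms is one that every local system can interpret. A query that uses a term found in only one thesaurus (for example "networks") can only draw documents from that system.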

The author has made a nice start in formalizing this situation. Unfortunately, there are four major kinds of difficulties that need to be overcome before this kind of work can be truly useful. First, the mathematical descriptions and relationships have to be clearly associated with known and understandable features of retrieval systems. This is a question of good, interpretive exposition, and should be doable.

The second difficulty is that modern retrieval systems are quite complicated. Besides indexing and searching by “controlled-vocabulary” terms from a thesaurus, there is free-vocabulary indexing (any word from titles and abstracts), and there is searching by masking, truncation, proximity, field specification, and weighting operations. The third difficulty is that even though the same terms are superficially used in indexing and searching, they may have different meanings in different contexts.
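Two of the search operations named above, truncation and masking, can be illustrated on a free-vocabulary word index. This is a minimal sketch under assumed conventions. It uses Python's shell-style `fnmatch` wildcards as a stand-in syntax: `*` for right truncation and `?` for a single masked character. Real retrieval systems used their own operator symbols, and `fnmatch` also treats `[...]` specially. The index contents and the function name `search_pattern` are hypothetical.

```python
import fnmatch
from typing import Dict, Set

# Hypothetical free-vocabulary index: word -> documents containing it.
WORD_INDEX: Dict[str, Set[str]] = {
    "retrieval": {"d1", "d2"},
    "retrieving": {"d3"},
    "retrieve": {"d2"},
    "woman": {"d4"},
    "women": {"d5"},
}


def search_pattern(index: Dict[str, Set[str]], pattern: str) -> Set[str]:
    """Union of the documents for every indexed word that matches `pattern`.

    Shell-style wildcards stand in for search operators here:
    '*' = right truncation, '?' = one masked character.
    """
    hits: Set[str] = set()
    for word, docs in index.items():
        if fnmatch.fnmatchcase(word, pattern):
            hits |= docs
    return hits


print(sorted(search_pattern(WORD_INDEX, "retriev*")))  # ['d1', 'd2', 'd3']
print(sorted(search_pattern(WORD_INDEX, "wom?n")))     # ['d4', 'd5']
```

Even this small example shows why such operators complicate a formal model. A single query pattern expands into a set of index words that is not defined by any thesaurus.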

The fourth difficulty is that the utility (and, therefore, cost-effectiveness) of any system is bound up with retrieving relevant and useful documents. Relevance and usefulness, like the “meaning” of terms, are highly subjective and not easily captured with formalized, mathematical constructs. These last three difficulties pose formidable problems for the formalization and modeling of document retrieval systems in general, and of distributed systems in particular.

Reviewer: R. S. Marcus
Review #: CR108904

Retrieval Models (H.3.3 ... )
Thesauruses (H.3.1 ... )
Other reviews under "Retrieval Models":
Evaluation of an inference network-based retrieval model
Turtle H., Croft W. (ed) ACM Transactions on Information Systems 9(3): 187-222, 1991. Type: Article
May 1 1993
Information processing in linear vector space
Kunz M. Information Processing and Management: an International Journal 20(4): 519-525, 1984. Type: Article
Mar 1 1985
Users and experts in the document retrieval system model
Danilowicz C. International Journal of Man-Machine Studies 21(3): 245-252, 1984. Type: Article
May 1 1985

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®