Computing Reviews
Predicting information retrieval performance
Losee R., Morgan & Claypool Publishers, San Rafael, CA, 2019. 79 pp. Type: Book (978-1-681734-72-9)
Date Reviewed: Apr 10 2019

As any Internet user knows, searching is a major activity. Given the size of the total data content available, it is amazing how quickly various search engines are able to provide results. Search providers want to understand the performance of their systems in order to optimize the use of equipment and minimize power consumption, thus reducing the costs of doing business. End users want subsecond response times, and hence have an indirect wish for the same optimizations as the providers. Thus, how does one go about achieving good results?

In this short monograph, Losee seeks a scientific approach to developing predictive models of information retrieval performance using statistical methods. The models rest on definitions such as relevant versus non-relevant documents, either of which may contain the terms on which a search is based. A search produces a list of candidate documents, and their ordering matters: the requestor would see search performance as poor if the list started with several non-relevant results. Other measures include precision (the fraction of retrieved documents that are relevant), recall (the fraction of all available relevant documents that are retrieved), fallout (the fraction of non-relevant documents that are retrieved), and generality (the fraction of documents in the queried set that are relevant).
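To make the four measures concrete, here is a minimal sketch that computes them from the counts of a single query, using the standard definitions: precision is relevant retrieved over all retrieved, recall is relevant retrieved over all relevant, fallout is non-relevant retrieved over all non-relevant, and generality is all relevant over the whole collection. The names `RetrievalCounts` and `retrieval_measures` are hypothetical and chosen for illustration; they are not taken from the monograph, and the sketch says nothing about how Losee's predictive models are built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalCounts:
    """Outcome counts for one query (hypothetical container, for illustration)."""
    relevant_retrieved: int      # relevant documents the search returned
    nonrelevant_retrieved: int   # non-relevant documents the search returned
    relevant_missed: int         # relevant documents the search did not return
    nonrelevant_missed: int      # non-relevant documents the search did not return


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 for an empty denominator instead of raising ZeroDivisionError.
    return numerator / denominator if denominator else 0.0


def retrieval_measures(c: RetrievalCounts) -> dict:
    """Compute precision, recall, fallout, and generality as fractions in [0, 1]."""
    retrieved = c.relevant_retrieved + c.nonrelevant_retrieved
    relevant = c.relevant_retrieved + c.relevant_missed
    nonrelevant = c.nonrelevant_retrieved + c.nonrelevant_missed
    total = relevant + nonrelevant
    return {
        "precision": _ratio(c.relevant_retrieved, retrieved),
        "recall": _ratio(c.relevant_retrieved, relevant),
        "fallout": _ratio(c.nonrelevant_retrieved, nonrelevant),
        "generality": _ratio(relevant, total),
    }


if __name__ == "__main__":
    # Made-up example: 100 documents, 20 relevant, 10 retrieved (8 of them relevant).
    counts = RetrievalCounts(
        relevant_retrieved=8,
        nonrelevant_retrieved=2,
        relevant_missed=12,
        nonrelevant_missed=78,
    )
    for name, value in retrieval_measures(counts).items():
        print(f"{name}: {value:.3f}")
```

The counts in the usage example are invented for illustration. Returning 0.0 when a denominator is empty is a convenience choice in this sketch, not a convention stated in the review.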

From these definitions, one can develop statistical measures of the quality of a retrieval, including best and worst cases. Using the measures, one can predict the result of a query against a test set of documents, and then verify whether the prediction matches the observed result. This methodology can be extended to multiple terms, or to terms appearing in a document that are strongly related to the search term. The monograph concludes by extending the performance model to consider metadata and indexing.

The author notes that most information retrieval analyses focus on performance measured after retrieval, whereas his models are predictive, that is, applied before retrieval. His motivation is to develop rules or laws similar to physical laws, such as those from thermodynamics or mechanics: “Most sciences produce rules predicting outcomes consistent with system characteristics.” The difference with retrieval systems is that, unlike physical entities, each system differs from the others. In this regard, one would like to see predictions made with multiple retrieval systems in real-world applications, to test the models. If information retrieval can be described by laws analogous to physical laws, such testing should reveal them.

Reviewer:  G. R. Mayforth Review #: CR146524 (1906-0218)
Information Search And Retrieval (H.3.3)
Retrieval Models (H.3.3 ...)
Content Analysis And Indexing (H.3.1)
Other reviews under "Information Search And Retrieval": Date
Nested transactions in a combined IRS-DBMS architecture
Schek H. (ed)  Research and development in information retrieval (, King’s College, Cambridge,701984. Type: Proceedings
Nov 1 1985
An integrated fact/document information system for office automation
Ozkarahan E., Can F. (ed) Information Technology Research Development Applications 3(3): 142-156, 1984. Type: Article
Oct 1 1985
Access methods for text
Faloutsos C. ACM Computing Surveys 17(1): 49-74, 1985. Type: Article
Jan 1 1986
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®