ComputingReviews.com

Predicting information retrieval performance
Losee R., Morgan & Claypool Publishers, San Rafael, CA, 2019. 79 pp. Type: Book

Date Reviewed: 04/10/19

As any Internet user knows, searching is a major activity. Given the sheer volume of data available, it is amazing how quickly search engines return results. Search providers want to understand the performance of their systems so that they can optimize equipment use and minimize power consumption, thus reducing the cost of doing business. End users want subsecond response times, and so indirectly want the same optimizations as the providers. How, then, does one go about achieving good results?

In this short monograph, Losee seeks a scientific approach to developing predictive models of information retrieval performance using statistical methods. The models rest on basic definitions, such as the distinction between relevant and non-relevant documents, either of which may contain the terms on which a search is based. A search produces a list of candidate documents, and ordering matters: for example, the requester would judge search performance as poor if the list began with several non-relevant results. Standard measures include precision (the fraction of retrieved documents that are relevant), recall (the fraction of all relevant documents that are retrieved), fallout (the fraction of all non-relevant documents that are retrieved), and generality (the fraction of documents in the queried collection that are relevant).
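
To make these definitions concrete, here is a minimal Python sketch using the standard textbook definitions; it is not code from the book. Documents are represented by hypothetical string IDs, and `retrieval_measures` and `precision_at_k` are illustrative names. Precision at a cutoff k captures the point about ordering: a list that opens with non-relevant results scores poorly at small k.

```python
from __future__ import annotations


def retrieval_measures(retrieved: set[str], relevant: set[str],
                       collection: set[str]) -> dict[str, float]:
    """Return precision, recall, fallout, and generality as fractions in [0, 1]."""
    non_relevant = collection - relevant
    rel_ret = len(retrieved & relevant)        # relevant documents that were retrieved
    nonrel_ret = len(retrieved & non_relevant)  # non-relevant documents that were retrieved
    return {
        "precision": rel_ret / len(retrieved) if retrieved else 0.0,
        "recall": rel_ret / len(relevant) if relevant else 0.0,
        "fallout": nonrel_ret / len(non_relevant) if non_relevant else 0.0,
        "generality": len(relevant) / len(collection) if collection else 0.0,
    }


def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant (ordering-sensitive)."""
    top = ranking[:k]
    return sum(doc in relevant for doc in top) / k


if __name__ == "__main__":
    collection = {f"d{i}" for i in range(10)}
    relevant = {"d0", "d1", "d2", "d3"}
    ranking = ["d4", "d5", "d0", "d1", "d6"]  # two non-relevant documents come first
    print(retrieval_measures(set(ranking), relevant, collection))
    print(precision_at_k(ranking, relevant, 2), precision_at_k(ranking, relevant, 5))
```

With this toy data, the set-based measures are precision 0.4, recall 0.5, fallout 0.5, and generality 0.4, yet precision at k = 2 is 0.0 because the list opens with two non-relevant documents.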

From these definitions, one can develop statistical measures of retrieval quality, including best and worst cases. Using these measures, one can predict the result of a query against a test set of documents and then check whether the actual result matches the prediction. The methodology extends to queries with multiple terms, and to terms in a document that are strongly related to the search term. The monograph concludes by extending the performance model to incorporate metadata and indexing.
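
The review does not reproduce the book's formulas, so the following is only a hypothetical illustration of the predict-then-verify idea, not Losee's model. It assumes a single binary query term: documents containing the term are ranked ahead of those that do not, and documents within each group are tied and ordered arbitrarily. The quality measure chosen here is the average rank of relevant documents (lower is better). The counts `r_t`, `n_t`, `r_0`, and `n_0` (relevant and non-relevant documents with and without the term) and all function names are assumptions made for this sketch.

```python
import random
from statistics import mean


def avg_relevant_rank(ranking: list[bool]) -> float:
    """Mean 1-based position of relevant documents (True = relevant)."""
    return mean(i + 1 for i, rel in enumerate(ranking) if rel)


def group_sum(start: int, size: int, n_rel: int, case: str) -> float:
    """Sum of positions of n_rel relevant documents in a tied group
    occupying positions start+1 .. start+size."""
    if case == "best":      # relevant documents placed first within the group
        return sum(start + i for i in range(1, n_rel + 1))
    if case == "worst":     # relevant documents placed last within the group
        return sum(start + size - i for i in range(n_rel))
    if case == "expected":  # random tie order: each sits at the group midpoint on average
        return n_rel * (start + (size + 1) / 2)
    raise ValueError(f"unknown case: {case}")


def predicted_rank(r_t: int, n_t: int, r_0: int, n_0: int, case: str) -> float:
    """Predict the average relevant rank before any retrieval is run."""
    if r_t + r_0 == 0:
        raise ValueError("need at least one relevant document")
    g1, g2 = r_t + n_t, r_0 + n_0  # with-term group is ranked first
    total = group_sum(0, g1, r_t, case) + group_sum(g1, g2, r_0, case)
    return total / (r_t + r_0)


def simulate(r_t: int, n_t: int, r_0: int, n_0: int,
             trials: int = 20000, seed: int = 0) -> float:
    """'Verify' step: observe the measure over random tie orderings."""
    rng = random.Random(seed)
    g1 = [True] * r_t + [False] * n_t
    g2 = [True] * r_0 + [False] * n_0
    results = []
    for _ in range(trials):
        rng.shuffle(g1)
        rng.shuffle(g2)
        results.append(avg_relevant_rank(g1 + g2))
    return mean(results)


if __name__ == "__main__":
    counts = (3, 2, 1, 4)  # r_t, n_t, r_0, n_0
    for case in ("best", "worst", "expected"):
        print(case, predicted_rank(*counts, case))
    print("observed", simulate(*counts))
```

For these counts, the predicted best, worst, and expected values are 3.0, 5.5, and 4.25. The simulated average should land very close to 4.25, which is the kind of agreement between prediction and observation the review describes.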

The author notes that most information retrieval analyses measure performance after retrieval, whereas his models are predictive; that is, they are applied before retrieval. His motivation is to develop rules or laws similar to physical laws, such as those of thermodynamics or mechanics: “Most sciences produce rules predicting outcomes consistent with system characteristics.” The difficulty with retrieval systems is that, unlike physical entities, each system differs from every other. In this regard, one would like to see the models tested by making predictions for multiple retrieval systems in real-world applications. If information retrieval can be described by laws analogous to physical laws, such testing should reveal them.

Reviewer: G. R. Mayforth

Review #: CR146524 (1906-0218)

Reproduction in whole or in part without permission is prohibited. Copyright 2024 ComputingReviews.com™