This paper presents a modification to a mechanism from previous work for improving information retrieval performance via relevance feedback. Specifically, the authors consider pseudo-relevance feedback (PRF), which takes the top P documents retrieved in response to a query, assumes they are relevant, and adds K new terms drawn from those documents to the original query. The authors note that this is a form of unsupervised learning, and they also consider flexible PRF, in which the parameters P and K can be optimized for each search.
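The basic PRF loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: selecting expansion terms by raw frequency is an assumed simplification (real systems typically use weighted term-scoring formulas), and the function and parameter names are hypothetical.

```python
from collections import Counter

def prf_expand(query_terms, ranked_docs, p=10, k=20):
    """Pseudo-relevance feedback: assume the top-p retrieved documents
    are relevant, then add the k most frequent new terms found in them
    to the original query. Each document is a list of term strings."""
    counts = Counter()
    for doc in ranked_docs[:p]:          # top-p documents assumed relevant
        counts.update(doc)
    # Keep the k highest-frequency terms not already in the query.
    new_terms = [t for t, _ in counts.most_common()
                 if t not in query_terms][:k]
    return list(query_terms) + new_terms
```

In the flexible-PRF variant the authors discuss, `p` and `k` would be tuned per query rather than fixed globally.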
The main contribution of this paper is an extension of this methodology using selective sampling, which allows some of the top-ranked documents to be skipped and is related to document clustering. The authors also add the notion of memory resetting: taking a few documents, discarding the next few, taking the next few, and so on, with the decision at each rank based on the number of search terms in the document.
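A rough sketch of the selective-sampling idea follows. The review gives only the high-level behavior (skipping documents in the ranking based on how many search terms they contain), so the threshold rule, names, and parameters below are assumptions for illustration, not the authors' actual decision procedure.

```python
def selective_sample(ranked_docs, query_terms, p=10, min_terms=2):
    """Selective sampling (illustrative sketch): instead of taking the
    top-p documents contiguously, walk down the ranking and keep only
    documents containing at least `min_terms` distinct query terms,
    skipping the rest, until p documents have been collected."""
    query_set = set(query_terms)
    sample = []
    for doc in ranked_docs:
        if len(sample) == p:
            break
        # Count how many distinct query terms appear in this document.
        if sum(t in doc for t in query_set) >= min_terms:
            sample.append(doc)           # take this document
        # otherwise skip it and move to the next rank
    return sample
```

The sampled documents would then feed the term-selection step of PRF in place of the contiguous top-P set.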
The algorithm is tested on standard Japanese and Japanese/English test collections used in cross-language retrieval work. The authors use their previously presented bi-directional retriever/information distiller for Japanese and English (BRIDJE) system, and measure effectiveness via mean average precision for documents deemed highly relevant, relevant, or partially relevant. They are interested in whether improvement is achieved consistently across all topics. They find that PRF is an improvement, but that their algorithm does not always provide the best improvement. Analysis of some of the subtopics, however, suggests that the new algorithm is promising.
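For reference, mean average precision (the evaluation measure used above) can be computed as follows; this is the standard definition, not code from the paper, and the input representation (a binary relevance flag per rank) is an assumption.

```python
def average_precision(ranked_rel, total_relevant):
    """Average precision for one topic: the mean of precision@rank
    taken over the ranks where a relevant document appears,
    normalized by the total number of relevant documents."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(ranked_rel, start=1):
        if rel:
            hits += 1
            score += hits / rank
    return score / total_relevant if total_relevant else 0.0

def mean_average_precision(per_topic_rankings):
    """MAP: the mean of per-topic average precision values."""
    return sum(average_precision(r, sum(r))
               for r in per_topic_rankings) / len(per_topic_rankings)
```

The authors compute such scores separately under the highly-relevant, relevant, and partially-relevant judgment thresholds.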