In information retrieval, a term spectrum is a complex-valued entity whose magnitude and phase are respectively related to the corresponding term’s frequency and position. By using query terms’ spectral locality information, fast proximity calculations can be performed in text searching.
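As a rough illustration of this idea, the sketch below builds a term spectrum with a discrete Fourier transform. It is a minimal sketch under stated assumptions, not the method of the paper under review (which uses a wavelet transform): the helper names `term_signal` and `term_spectrum`, the choice of eight bins, and the sample positions are all hypothetical. It shows that two terms occurring equally often in different parts of a document have the same spectral magnitudes but different phases.

```python
import numpy as np

def term_signal(positions, doc_length, num_bins=8):
    """Count a term's occurrences in each of num_bins equal-width document segments.

    Hypothetical helper: the binning scheme is an assumption made for illustration.
    """
    signal = np.zeros(num_bins)
    for pos in positions:
        signal[pos * num_bins // doc_length] += 1
    return signal

def term_spectrum(signal):
    """Complex-valued spectrum of a term signal (Fourier transform, for illustration)."""
    return np.fft.fft(signal)

# Two terms that each occur three times, one early and one late in an 80-word document.
early = term_spectrum(term_signal([0, 5, 10], doc_length=80))
late = term_spectrum(term_signal([60, 65, 70], doc_length=80))

# Magnitudes depend on how often the term occurs, not on where it occurs.
same_magnitudes = np.allclose(np.abs(early), np.abs(late))

# Phases carry the positional information, so they differ between the two terms.
phase_early = np.angle(early[1])
phase_late = np.angle(late[1])
```

Comparing the phases of several query terms in this way is what would allow a proximity judgment without scanning the positions themselves.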
In this paper, the authors propose a new spectral-based information retrieval method that uses a wavelet transform. Their aim is to combine the benefits of proximity searching with the speed of the vector-space model, and they show that the vector-space model is a special case of their approach. They report a four percent improvement in retrieval precision over the vector-space model. Retrieval effectiveness is proportional to index size, which in turn depends on the precision with which the stored information is quantized; the index can be 20 to 160 percent larger than that of the vector-space model. A typical query execution time is reported as 0.02 seconds for the vector-space model and 0.1 seconds for the proposed method, about five times slower.
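One way to see how the vector-space model could arise as a special case is sketched below with an unnormalized Haar wavelet decomposition. This is an assumption made for illustration: the review does not say which wavelet or normalization the authors use, and `haar_transform` and the sample signals are hypothetical. In this sketch the first coefficient is simply the term's total count in the document, which is the quantity the vector-space model works with, while the remaining coefficients record where the occurrences fall.

```python
import numpy as np

def haar_transform(signal):
    """Unnormalized Haar decomposition of a signal whose length is a power of two.

    Returns [total sum, then difference coefficients from coarse to fine].
    Hypothetical helper; the wavelet and scaling are assumptions for illustration.
    """
    approx = np.asarray(signal, dtype=float)
    details = []
    while len(approx) > 1:
        pairs = approx.reshape(-1, 2)
        details.append(pairs[:, 0] - pairs[:, 1])  # local differences (position information)
        approx = pairs[:, 0] + pairs[:, 1]         # local sums (count information)
    return np.concatenate([approx] + details[::-1])

# Per-segment counts of one term in two documents: same frequency, different positions.
early_doc = [2, 1, 0, 0, 0, 0, 0, 0]
late_doc = [0, 0, 0, 0, 0, 0, 2, 1]

early_coeffs = haar_transform(early_doc)
late_coeffs = haar_transform(late_doc)

# Keeping only the first coefficient discards position and leaves the term count.
term_count_early = early_coeffs[0]
term_count_late = late_coeffs[0]
```

Storing the remaining coefficients at higher or lower precision is the kind of choice that would trade index size against retrieval effectiveness.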
The paper is not easy to read for a person with a typical computer science (CS) background, since it requires knowledge of signal processing; a CS reader with an electrical engineering background will be better able to follow it. Cross-fertilization among different fields of knowledge is important, and the authors present an attractive idea. However, they provide no statistical evidence for the significance of the effectiveness improvements, and much more work needs to be done to show the effectiveness and efficiency of this approach.