This paper presents an original approach to search engine query recommendation based on query log analysis. The proposed analysis consists of three steps: query log parsing, query sequence identification, and query similarity computation. The originality of the approach lies in combining query sequence analysis with query similarity computation. This combination improves the precision of the recommendations over traditional query log analysis. The improvement was measured on real data using both subjective and objective evaluations. The subjective evaluation collected editors’ feedback on query results. The objective evaluation compared the recommendations produced against a set of actual query sessions.
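To make the three steps concrete, the following is a minimal Python sketch of such a pipeline. It is not the authors' implementation. The tab-separated log format, the 30-minute session cutoff, the Jaccard term-overlap similarity, and the names parse_log, identify_sequences, jaccard, and recommend are all assumptions chosen for illustration.

```python
from collections import defaultdict

SESSION_GAP = 1800  # seconds; hypothetical cutoff between two sessions


def parse_log(lines):
    """Step 1: parse 'user<TAB>timestamp<TAB>query' lines (assumed format)."""
    records = []
    for line in lines:
        user, ts, query = line.rstrip("\n").split("\t", 2)
        records.append((user, int(ts), query.strip().lower()))
    return records


def identify_sequences(records):
    """Step 2: group each user's queries into time-ordered sessions."""
    by_user = defaultdict(list)
    for user, ts, query in records:
        by_user[user].append((ts, query))
    sessions = []
    for events in by_user.values():
        events.sort()
        current = [events[0][1]]
        for (prev_ts, _), (ts, query) in zip(events, events[1:]):
            if ts - prev_ts > SESSION_GAP:
                # A long pause closes the current session.
                sessions.append(current)
                current = []
            current.append(query)
        sessions.append(current)
    return sessions


def jaccard(a, b):
    """Step 3: term-overlap similarity (an assumed stand-in measure)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def recommend(query, sessions, min_sim=0.3):
    """Combine both signals: score each query that directly follows a
    query similar to `query` within a session, weighted by similarity."""
    scores = defaultdict(float)
    for session in sessions:
        for earlier, later in zip(session, session[1:]):
            sim = jaccard(query, earlier)
            if sim >= min_sim and later != query:
                scores[later] += sim
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda q: (-scores[q], q))


log = [
    "u1\t0\tcheap flights",
    "u1\t60\tcheap flights paris",
    "u2\t0\tcheap flights",
    "u2\t30\tairline tickets",
    "u1\t10000\tweather paris",
]
sessions = identify_sequences(parse_log(log))
print(recommend("cheap flights", sessions))
```

In this sketch, a candidate is recommended only if it followed a sufficiently similar query in some past session, which is one simple way of combining sequence information with similarity. The paper's similarity graphs and damping factors are not modeled here.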
Many enhancements could be made to the approach. First, the damping factors used in the similarity graphs could rely more on existing measures from the literature, such as association measures, scalar measures, and metric measures [1]. Second, the assumption that previously used queries are more relevant to future users should be tested and quantified. Finally, the results should be compared to those obtained using existing approaches. These enhancements could be combined with those already mentioned in the paper's future work section.
In general, this paper will interest search engine researchers and developers. A general background in information retrieval is needed to understand the similarity computation, and some background in Web usage mining is needed to understand the query log analysis process.