This paper presents a new mechanism that uses genetic algorithms (GAs) to generate improved queries for an information retrieval system. Earlier research has used GAs to generate better queries from user relevance feedback.
Cecchini et al. note that their methodology can help with task-based search, deep Web search, and even knowledge management. Moreover, they argue that GAs are appropriate because finding a better query is an optimization problem in a high-dimensional space, where a balance of exploration and exploitation can find a reasonable, if not optimal, solution.
The authors experiment with different fitness functions built on standard document-query similarity functions, such as the cosine or Jaccard function. They consider a maximum query quality measure and a mean query quality measure, both using cosine similarity as the underlying fitness function. What is new here is that they use relevance estimated from the thematic context, rather than user feedback. The standard GA mechanisms of crossover and mutation are employed. The authors also consider an elitism mechanism, in which the queries with the highest fitness are carried over to the succeeding population without modification. Finally, they consider novelty-driven fitness and define a new similarity function that takes novelty into account.
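To make the mechanism described above concrete, here is a minimal sketch in Python of a GA that evolves queries against a thematic context. It is not the authors' implementation: the toy corpus, the term-overlap `retrieve` function, the truncation selection step, and all names and parameter values are assumptions made for illustration, and the novelty-driven fitness is omitted.

```python
import math
import random
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def jaccard(a, b):
    """Jaccard similarity between the term sets of two collections."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def retrieve(query, corpus):
    """Toy stand-in for a search engine (assumption): any shared term matches."""
    terms = set(query)
    return [doc for doc in corpus if terms & set(doc)]

def max_quality(query, corpus, context):
    """Fitness: best similarity between a retrieved document and the context."""
    scores = [cosine(Counter(d), context) for d in retrieve(query, corpus)]
    return max(scores, default=0.0)

def mean_quality(query, corpus, context):
    """Fitness: average similarity of retrieved documents to the context."""
    scores = [cosine(Counter(d), context) for d in retrieve(query, corpus)]
    return sum(scores) / len(scores) if scores else 0.0

def crossover(p1, p2, rng):
    """Single-point crossover on two equal-length term lists."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def mutate(query, vocab, rate, rng):
    """Replace each term with a random vocabulary term with probability `rate`."""
    return [rng.choice(vocab) if rng.random() < rate else t for t in query]

def evolve(corpus, context, vocab, fitness=max_quality, pop_size=20,
           query_len=3, generations=30, mutation_rate=0.1, elite=2, seed=0):
    """Evolve queries; returns the best query and the best fitness per generation."""
    rng = random.Random(seed)
    score = lambda q: fitness(q, corpus, context)
    population = [[rng.choice(vocab) for _ in range(query_len)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        history.append(score(ranked[0]))
        # Elitism: the fittest queries survive unmodified.
        next_pop = [list(q) for q in ranked[:elite]]
        # Truncation selection (an assumption): breed from the top half.
        parents = ranked[:pop_size // 2]
        while len(next_pop) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            next_pop.append(mutate(crossover(p1, p2, rng), vocab,
                                   mutation_rate, rng))
        population = next_pop
    return max(population, key=score), history

# Hypothetical toy data: documents are term lists, the context is a term bag.
CORPUS = [
    "genetic algorithm query optimization".split(),
    "query expansion relevance feedback retrieval".split(),
    "protein folding genetic sequence".split(),
    "football match result league".split(),
]
CONTEXT = Counter("genetic algorithm query retrieval optimization".split())
VOCAB = sorted({t for doc in CORPUS for t in doc})

best_query, best_history = evolve(CORPUS, CONTEXT, VOCAB)
print(best_query, best_history[-1])
```

Because the elite queries are copied forward unchanged and the fitness is deterministic, the best fitness per generation in this sketch can never decrease. Passing `fitness=mean_quality` or a different `mutation_rate` to `evolve` gives a rough feel for the kind of settings such experiments vary.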
The authors run experiments with different mutation rates, using a test database of topics from a journal. The results look promising. The authors conclude by proposing extensions to their work: additional GA settings, special domains with specific syntaxes, and additional fitness functions.