Efficient Query Delegation by Detecting Redundant Retrieval Strategies


The task of combining the output of several retrieval strategies into a single relevance prediction per document is known as data fusion. The LETOR dataset provides three corpora with predictions of 25 or 44 strategies (depending on the corpus) per document/query pair. With so many basic strategies available, we consider the sparseness of the combination to be as crucial as its optimality: which strategies should be used in a real application when each strategy consumes resources? We therefore focus on query delegation, a special case of strategy weighting: which strategies should receive a weight greater than zero, i.e., be queried in the first place? We propose several similarity measures between strategies, such as various correlation measures and precision@n. Assuming that similar strategies contribute little to each other's results, we cluster the strategies based on these similarities and consider only the best representative of each cluster. We show that this fusion strategy performs comparably to other fusion approaches such as RankSVM and RankBoost while consulting only a fraction of the available retrieval strategies.
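
The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Pearson correlation as the similarity measure, a simple threshold-based single-linkage grouping as the clustering step, and a precomputed per-strategy quality score for picking the representative. The strategy names, scores, quality values, and the 0.9 threshold are all invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cluster_strategies(scores, threshold):
    """Group strategies whose pairwise correlation reaches the threshold.

    Assumption: single-linkage grouping via union-find; the abstract does
    not say which clustering algorithm the authors use.
    """
    parent = {name: name for name in scores}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(scores, 2):
        if pearson(scores[a], scores[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in scores:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())


def select_representatives(scores, quality, threshold=0.9):
    """Keep only the highest-quality strategy of each cluster."""
    clusters = cluster_strategies(scores, threshold)
    return sorted(max(cluster, key=quality.get) for cluster in clusters)


# Toy data (hypothetical): scores of four strategies on five documents.
scores = {
    "bm25": [0.9, 0.7, 0.4, 0.2, 0.1],
    "tfidf": [0.8, 0.75, 0.35, 0.25, 0.05],
    "pagerank": [0.1, 0.9, 0.2, 0.8, 0.3],
    "inlinks": [0.2, 0.85, 0.1, 0.9, 0.4],
}
# Hypothetical per-strategy quality, e.g. a precision@n value on training queries.
quality = {"bm25": 0.62, "tfidf": 0.55, "pagerank": 0.41, "inlinks": 0.38}

selected = select_representatives(scores, quality)
print(selected)  # ['bm25', 'pagerank']
```

In this toy setting the two text-based strategies and the two link-based strategies each form one cluster, so only two of the four strategies would need to be consulted at query time.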

  author = {Christian Scheel and Nicolas Neubauer and Andreas Lommatzsch and
	Klaus Obermayer and Sahin Albayrak},
  title = {Efficient Query Delegation by Detecting Redundant Retrieval Strategies},
  booktitle = {Proceedings of SIGIR 2007 Workshop: Learning to Rank for Information Retrieval},
  year = {2007}
Christian Scheel, Nicolas Neubauer, Andreas Lommatzsch, Klaus Obermayer, Sahin Albayrak
Conference Paper
SIGIR 2007 Workshop: Learning to Rank for Information Retrieval