Distance Measures in Query Space: How Strongly to Use Feedback from Past Queries
Abstract
Feedback on past queries is a valuable resource for improving retrieval performance on new queries. We introduce a modular approach to incorporating feedback information into a given retrieval architecture. We propose to fuse the original ranking with the rankings returned by rerankers, each of which is trained on feedback given for a distinct, single query. Here, we examine the basic case of improving the original ranking of a query qtest by using only one reranker: the one trained on feedback for the "closest" query qtrain. We examine the use of various distance measures between queries to first identify qtrain and then to determine the best linear combination of the original ranker's and the reranker's scores; that is, to decide which feedback to learn from and how strongly to use it. We show that the cosine distance between the term vectors of the two queries, each enriched with representations of the top N originally retrieved documents, reliably answers both questions. The fusion performs as well as or better than a) always using only the original ranker or only the reranker, b) using a hard distance threshold to decide between the two, or c) fusing results with a ratio that is globally optimized but fixed across all tested queries.
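
The Python sketch below illustrates the idea at a high level, under our own simplifying assumptions: queries are represented as bag-of-words term vectors enriched with their top-N originally retrieved documents, the nearest past query is found by cosine distance, and that distance sets the weight of the linear fusion. The function names (`enrich`, `alpha_from_distance`, `fuse_scores`) and the plain term-frequency representation are illustrative choices, not the paper's implementation.

```python
# Hedged sketch of distance-based feedback fusion (illustrative assumptions,
# not the authors' implementation).
from collections import Counter
from math import sqrt


def term_vector(texts: list[str]) -> Counter:
    """Bag-of-words term frequencies over a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def enrich(query: str, top_docs: list[str]) -> Counter:
    """Query representation enriched with its top-N originally returned documents."""
    return term_vector([query] + top_docs)


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / norm if norm else 0.0)


def closest_train_query(test_vec: Counter, train_vecs: dict) -> tuple:
    """Return (train_query_id, distance) of the nearest past query in query space."""
    return min(
        ((qid, cosine_distance(test_vec, vec)) for qid, vec in train_vecs.items()),
        key=lambda pair: pair[1],
    )


def alpha_from_distance(distance: float) -> float:
    """Assumed mapping: the closer the past query, the more weight its reranker gets."""
    return max(0.0, 1.0 - distance)


def fuse_scores(original: dict, reranked: dict, alpha: float) -> dict:
    """Linear combination of original and reranker scores per document."""
    return {
        doc: (1 - alpha) * original.get(doc, 0.0) + alpha * reranked.get(doc, 0.0)
        for doc in original.keys() | reranked.keys()
    }
```

In use, one would enrich the test query and all past queries with their respective top-N documents, select the closest past query, and fuse the original scores with those of that query's reranker using the distance-derived weight; a distance of 1 falls back to the original ranking alone.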