Defining a Meaningful Baseline for News Recommender Systems
Abstract
Evaluation protocols for news recommender systems typically compare the performance of a method against a baseline. The difference in performance ought to tell us what benefit we can expect from the more sophisticated method. Ultimately, there is a trade-off between performance and the effort required to implement and maintain a system. This work surveys which baselines have been used, discusses which criteria a baseline must fulfil, and evaluates a variety of baselines in a news recommendation setting with multiple publishers. We find that circular buffers and trend-based predictions score highly, need little effort to implement, and require no additional data. In addition, we observe variation across publishers, suggesting that not all baselines are equally competitive under all circumstances.
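To give a concrete sense of how little effort such a baseline demands, the sketch below shows a recency baseline built on a circular buffer: the most recently observed article ids are kept in a fixed-size buffer and recommended newest-first. This is an illustrative Python sketch, not the implementation evaluated in this work; the names (CircularBufferBaseline, observe, recommend), the buffer capacity, and the handling of duplicate ids are assumptions made for the example.

    from collections import deque

    class CircularBufferBaseline:
        """Recency baseline: keep the N most recently seen article ids
        in a fixed-size buffer and recommend them newest-first."""

        def __init__(self, capacity=100):
            # deque with maxlen drops the oldest id automatically
            self.buffer = deque(maxlen=capacity)

        def observe(self, article_id):
            """Record an interaction (e.g. a click or impression)."""
            if article_id in self.buffer:
                # keep each id at most once, moving it to the front
                self.buffer.remove(article_id)
            self.buffer.append(article_id)

        def recommend(self, k=5, exclude=None):
            """Return up to k article ids, most recent first."""
            exclude = exclude or set()
            return [a for a in reversed(self.buffer) if a not in exclude][:k]

    # Example: feed a click stream and ask for recommendations.
    baseline = CircularBufferBaseline(capacity=3)
    for article in ["a1", "a2", "a3", "a2", "a4"]:
        baseline.observe(article)
    print(baseline.recommend(k=2))  # ['a4', 'a2']

Such a baseline needs no model training and no data beyond the interaction stream itself, which is precisely why it serves as a meaningful point of comparison for more sophisticated recommenders.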