Performance Measures for Multi-Graded Relevance


We extend performance measures commonly used in semantic web applications so that they can handle multi-graded relevance data. Most of today's social web recommender applications let users rate objects with different levels of relevance. Nevertheless, most performance measures in information retrieval and recommender systems assume that retrieved objects (e.g., entities or documents) are either relevant or irrelevant. Hence, thresholds have to be applied to convert multi-graded relevance labels into binary relevance labels. Given the need to evaluate information retrieval strategies on multi-graded data, we propose an extended version of the performance measure average precision that takes levels of relevance into account without applying thresholds, preserving the detailed relevance information. Furthermore, we propose an improvement to the NDCG measure that avoids problems caused by different scales in different datasets.
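
The thresholding step mentioned in the abstract can be illustrated with a minimal sketch. This is not the extended measure proposed in the paper; it only shows the conventional approach, in which graded labels are binarized before standard average precision is computed. The rating scale (0 to 4), the sample ranking, and the function names `binarize` and `average_precision` are assumptions made for illustration. Average precision is normalized here by the number of relevant items in the ranked list, which assumes that all relevant items were retrieved.

```python
from typing import List, Sequence


def binarize(grades: Sequence[int], threshold: int) -> List[int]:
    """Map graded labels to binary labels: 1 if grade >= threshold, else 0."""
    return [1 if g >= threshold else 0 for g in grades]


def average_precision(binary_rels: Sequence[int]) -> float:
    """Standard binary average precision over a ranked list.

    Precision is accumulated at each rank that holds a relevant item and
    then divided by the number of relevant items in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(binary_rels, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical ratings on a 0-4 scale, listed in ranked order (best rank first).
ranked_grades = [2, 4, 0, 3, 1]

# The same ranking, evaluated under two different thresholds.
ap_lenient = average_precision(binarize(ranked_grades, threshold=2))
ap_strict = average_precision(binarize(ranked_grades, threshold=4))

print(f"threshold=2: AP = {ap_lenient:.4f}")
print(f"threshold=4: AP = {ap_strict:.4f}")
```

With these assumed ratings, the same ranking scores differently depending on the chosen threshold, and the distinction between grades on the same side of the threshold is lost. That loss is the information the abstract argues should be kept.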

author = {Scheel, Christian and Lommatzsch, Andreas and Albayrak, Sahin},
title = {Performance Measures for Multi-Graded Relevance},
booktitle = {Proceedings of the second Workshop on Semantic Personalized Information Management: Retrieval and Recommendation 2011; Workshop in conjunction with the 10th International Semantic Web Conference 2011 (ISWC 2011)},
year = {2011},
editor = {de Gemmis, Marco and De Luca, Ernesto William and Di Noia, Tommaso and Gangemi, Aldo and Hausenblas, Michael and Lops, Pasquale and Lukasiewicz, Thomas and Plumbaum, Till and Semeraro, Giovanni},
publisher = {CEUR Workshop Proceedings (CEUR-WS)},
issn = {1613-0073},
language = {english}
2nd Workshop on Semantic Personalized Information Management: Retrieval and Recommendation, Bonn, Germany, October 24, 2011