NewsImages: Addressing the Depiction Gap with an Online News Dataset for Text-Image Rematching

Abstract

We present NewsImages, a dataset of online news items, and the related NewsImages rematching task. The goal of NewsImages is to provide multimedia researchers with a means of studying the depiction gap, which we define as the difference between what an image literally depicts and the way in which it is connected to the text that it accompanies. Online news is a domain in which the image-text connection is known to be indirect: the news article does not describe what is literally depicted in the image. We validate NewsImages with a series of experiments showing that the dataset and the task are useful for studying naturally occurring connections between image and text, as well as for addressing the challenges of the depiction gap, which include sparse data, diversity of content, and the importance of background knowledge.

Authors:
Andreas Lommatzsch, Benjamin Kille, Özlem Özgöbek, Mingliang Liang, Yuxiao Zhou, Jelena Tesic, Cláudio Bartolomeu, David Semedo, Lidia Pivovarova, Martha Larson
Category:
Conference Paper
Year:
2022
Location:
MMSys '22, 13th ACM Multimedia Systems Conference, June 14-17, 2022, Athlone, Ireland
Link: