NewsImages: Addressing the Depiction Gap with an Online News Dataset for Text-Image Rematching
Abstract
We present NewsImages, a dataset of online news items, and the related NewsImages rematching task. The goal of NewsImages is to provide multimedia researchers with a means of studying the depiction gap, which we define as the difference between what an image literally depicts and the way in which it is connected to the text that it accompanies. Online news is a domain in which the image-text connection is known to be indirect: the news article does not describe what is literally depicted in the image. We validate NewsImages with a series of experimental results showing that the dataset and the task are useful for studying naturally occurring connections between image and text, and for addressing the challenges of the depiction gap, which include sparse data, diversity of content, and the importance of background knowledge.