SPIGA – A Multilingual News Aggregator

Abstract

News aggregation web sites collect and group news articles from a multitude of sources in order to help users navigate and consume large amounts of news material. In this context, Topic Detection and Tracking (TDT) methods address the challenges of identifying new events in streams of news articles, and of threading together related articles. We propose a novel model for a multilingual news aggregator that groups together news articles in different languages, and thus allows users to get an overview of important events and their reception in different countries. Our model combines a vector space model representation of documents based on a multilingual lexicon of Wikipedia-derived concepts with named entity disambiguation and multilingual clustering methods for TDT. We describe an implementation of our approach on a large-scale, real-life data stream of English and German newswire sources, and present an evaluation of the Named Entity Disambiguation module, which achieves state-of-the-art performance on a German and an English evaluation dataset.
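As a rough illustration of the approach outlined in the abstract, the following minimal sketch maps English and German documents onto shared, language-independent concept IDs and groups them with single-pass threshold clustering. It is not the authors' implementation: the toy `LEXICON`, the function names, the use of single-pass clustering, and the `threshold` value are all assumptions made for this example, and the named entity disambiguation step is omitted entirely.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon: (language, term) -> language-independent concept ID.
# A real system would derive this from Wikipedia; these entries are invented.
LEXICON = {
    ("en", "chancellor"): "C_CHANCELLOR",
    ("de", "kanzlerin"): "C_CHANCELLOR",
    ("en", "election"): "C_ELECTION",
    ("de", "wahl"): "C_ELECTION",
    ("en", "earthquake"): "C_EARTHQUAKE",
    ("de", "erdbeben"): "C_EARTHQUAKE",
    ("en", "japan"): "C_JAPAN",
    ("de", "japan"): "C_JAPAN",
}


def to_concept_vector(text, lang):
    """Represent a document as a bag of concept IDs (unknown terms are dropped)."""
    tokens = text.lower().split()
    return Counter(LEXICON[(lang, t)] for t in tokens if (lang, t) in LEXICON)


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_stream(docs, threshold=0.5):
    """Single-pass clustering: join the most similar cluster or open a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [doc ids]}
    for doc_id, text, lang in docs:
        vec = to_concept_vector(text, lang)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            # Summed vector serves as centroid; cosine ignores its scale.
            best["centroid"].update(vec)
            best["members"].append(doc_id)
        else:
            clusters.append({"centroid": Counter(vec), "members": [doc_id]})
    return clusters
```

Because both languages are projected into the same concept space before comparison, an English and a German article about the same event can land in one cluster without any translation step.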

@inproceedings{hennig11a,
author = {Hennig, Leonhard and Ploch, Danuta and Prawdzik, Daniel and Armbruster, Benjamin and Düwiger, Holger and De Luca, Ernesto William and Albayrak, Sahin},
title = {SPIGA - A Multilingual News Aggregator},
booktitle = {Proceedings of GSCL 2011},
year = {2011},
}

Authors:

Leonhard Hennig, Danuta Ploch, Daniel Prawdzik, Benjamin Armbruster, Christoph Büscher, Holger Düwiger, Ernesto William De Luca, Sahin Albayrak

Category:

Conference paper

Year:

2011

Venue:

Proceedings of the Biennial GSCL Conference 2011