Named Entity Disambiguation for German News Articles

Abstract

Named entity disambiguation has become an important research area, providing the basis for improving search engine precision and for enabling semantic search. Current approaches to named entity disambiguation are usually based on exploiting structured semantic and linguistic resources (e.g. WordNet, DBpedia). Unfortunately, each of these resources on its own covers insufficient information for the task of named entity disambiguation. On the one hand, WordNet comprises a relatively small number of named entities, while on the other hand, DBpedia provides only little context for named entities. Our approach is based on the use of multi-lingual Wikipedia data. We show how the combination of multi-lingual resources can be used for named entity disambiguation. Based on a German and an English document corpus, we evaluate various similarity measures and algorithms for extracting data for named entity disambiguation. We show that the intelligent filtering of context data and the combination of multi-lingual information provide high-quality named entity disambiguation results.

@inproceedings{ir3,
  address = {Kassel, Germany},
  author = {Andreas Lommatzsch and Danuta Ploch and Ernesto William De Luca and Sahin Albayrak},
  booktitle = {Proceedings of LWA2010 - Workshop-Woche: Lernen, Wissen \& Adaptivitaet},
  crossref = {lwa2010},
  editor = {Martin Atzmüller and Dominik Benz and Andreas Hotho and Gerd Stumme},
  title = {Named Entity Disambiguation for German News Articles},
  url = {http://www.kde.cs.uni-kassel.de/conf/lwa10/papers/ir3.pdf},
  year = 2010,
  keywords = {disambiguation entity extraction information mining multilingual},
  session = {ir3},
  track = {ir},
  biburl = {http://www.bibsonomy.org/bibtex/2ef033b004e2588678a381af288797d86/lwa2010},
  presentation_end = {2010-10-06 09:30:00},
  abstract = {Named entity disambiguation has become an important research area, providing the basis for improving search engine precision and for enabling semantic search. Current approaches to named entity disambiguation are usually based on exploiting structured semantic and linguistic resources (e.g. WordNet, DBpedia). Unfortunately, each of these resources on its own covers insufficient information for the task of named entity disambiguation. On the one hand, WordNet comprises a relatively small number of named entities, while on the other hand, DBpedia provides only little context for named entities. Our approach is based on the use of multi-lingual Wikipedia data. We show how the combination of multi-lingual resources can be used for named entity disambiguation. Based on a German and an English document corpus, we evaluate various similarity measures and algorithms for extracting data for named entity disambiguation. We show that the intelligent filtering of context data and the combination of multi-lingual information provide high-quality named entity disambiguation results.}
}

Authors:

Andreas Lommatzsch, Danuta Ploch, Ernesto William De Luca, Sahin Albayrak

Category:

Conference Paper

Year:

2010

Location:

LWA 2010 Kassel, Workshop IR