Towards the Automatic Sentiment Analysis of German News and Forum Documents


Fully automated sentiment analysis of large text collections is an important task in many application scenarios. It is challenging because language style is domain-specific and sentiment indicators vary widely. Powerful sentiment classifiers are learned from annotated datasets, but for many domains, and especially for non-English texts, hardly any such datasets exist. To support the development of sentiment classifiers, we have created two corpora. The first corpus is built from German news articles. Although news articles should be objective, they often evoke subjective emotions. The second corpus consists of annotated messages from a German telecommunication forum. In this paper, we describe the process of creating the corpora and discuss our approach for tracing sentiment values, defining clear rules for assigning sentiment scores. Using the corpora, we train classifiers that yield good classification results and establish valuable baselines for sentiment analysis. We compare the learned classification strategies and discuss how the approaches can be transferred to new scenarios.

author = {Andreas Lommatzsch and Florian B{\"u}tow and Danuta Ploch and Sahin Albayrak},
title = {Towards the Automatic Sentiment Analysis of German News and Forum Documents},
booktitle = {Proceedings of the 17th I4CS Conference, CCIS 717},
year = {2017},
doi = {10.1007/978-3-319-60447-3_2},
pages = {1--16},
location = {Darmstadt, Germany},
publisher = {Springer International Publishing AG}
Proceedings of the 17th I4CS Conference, Darmstadt, Germany