Towards the Automatic Sentiment Analysis of German News and Forum Documents
Abstract
The fully automated sentiment analysis on large text collections is an important task in many applications scenarios. The sentiment analysis is a challenging task due to the domain-specific language style and the variety of sentiment indicators. The basis for learning powerful sentiment classifiers are annotated datasets, but for many domains and especially with non-English texts hardly any datasets exist. In order to support the development of sentiment classifiers, we have created two corpora: The first corpus is build based on German news articles. Although news articles should be objective, they often excite subjective emotions. The second corpus consists of annotated messages from a German telecommunication forum. In this paper we describe the process of creating the corpora and discuss our approach for tracing sentiment values, defining clear rules for assigning sentiments scores. Given the corpora we train classifiers that yield good classification results and establish valuable baselines for sentiment analysis. We compare the learned classification strategies and discuss how the approaches can be transferred to new scenarios.
