TUB-IRML at MediaEval 2014 Violent Scenes Detection Task: Violence Modeling through Feature Space Partitioning
Abstract
This paper describes the participation of the TUB-IRML group in the MediaEval 2014 Violent Scenes Detection (VSD) affect task. We employ low- and mid-level audio-visual features fused at the decision level. We partition the feature space of the training samples through k-means clustering and train a separate model for each cluster. These models, which are two-class support vector machines (SVMs), are then used to predict the violence level of videos through a classifier selection approach. The experimental results obtained on Hollywood movies and short Web videos show that mid-level audio features are more discriminative than visual features, and that fusing audio-visual cues at the decision level improves performance further. Finally, the results demonstrate a performance gain from partitioning the feature space and training multiple models, compared to a single violence detection model.
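
The partition-then-select idea summarized above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, and every name in it (`kmeans`, `train_linear_svm`, `fit_partitioned`, `predict`, `toy_data`) is hypothetical. It rests on several assumptions that the abstract does not state: a linear SVM trained by batch subgradient descent on the regularized hinge loss stands in for the two-class SVMs, k-means uses a deterministic farthest-point initialization, features are centered on the cluster centroid before training, classifier selection is taken to mean "use the model of the nearest centroid", and the 2-D toy features replace real audio-visual descriptors. Decision-level fusion is not shown.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(centroids, x):
    """Index of the centroid closest to x (assumed classifier selection rule)."""
    return min(range(len(centroids)), key=lambda j: sqdist(centroids[j], x))


def kmeans(X, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(X[0])]
    while len(centroids) < k:
        far = max(X, key=lambda x: min(sqdist(c, x) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in X:
            groups[nearest(centroids, x)].append(x)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Linear SVM via batch subgradient descent on the L2-regularized hinge loss.

    Labels in y must be +1 or -1. Returns (weights, bias).
    """
    if not X:
        return [], 0.0
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, yi in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if yi * score < 1:  # sample violates the margin
                gw = [g + yi * xi for g, xi in zip(gw, x)]
                gb += yi
        w = [wi - lr * (lam * wi - g / n) for wi, g in zip(w, gw)]
        b += lr * gb / n
    return w, b


def fit_partitioned(X, y, k):
    """Partition the feature space with k-means, then train one SVM per cluster."""
    centroids = kmeans(X, k)
    svms = []
    for j, c in enumerate(centroids):
        idx = [i for i, x in enumerate(X) if nearest(centroids, x) == j]
        # Assumption: features are expressed relative to the cluster centroid.
        local = [[xi - ci for xi, ci in zip(X[i], c)] for i in idx]
        signs = [1 if y[i] == 1 else -1 for i in idx]
        svms.append(train_linear_svm(local, signs))
    return centroids, svms


def predict(model, x):
    """Select the model of the nearest cluster; return 1 (violent) or 0."""
    centroids, svms = model
    j = nearest(centroids, x)
    w, b = svms[j]
    local = [xi - ci for xi, ci in zip(x, centroids[j])]
    score = sum(wi * li for wi, li in zip(w, local)) + b
    return 1 if score > 0 else 0


def toy_data():
    """Two feature-space regions whose violent side points in opposite directions."""
    X, y = [], []
    for centre, violent_side in ((0.0, 1), (20.0, -1)):
        for dx in (-3.0, -2.0, -1.0, 1.0, 2.0, 3.0):
            for x1 in (-1.0, 0.0, 1.0):
                X.append([centre + dx, x1])
                y.append(1 if dx * violent_side > 0 else 0)
    return X, y
```

As a usage sketch, `model = fit_partitioned(*toy_data(), k=2)` followed by `predict(model, [18.0, 0.0])` routes the query to the cluster around 20 and applies only that cluster's model. In this toy set the violent class lies on opposite sides of the two regions, which a single linear model over the whole space cannot represent; this is the kind of situation in which training one model per partition may help.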
