MediaEval 2011 Affect Task: Violent Scene Detection combining Audio and Visual Features with SVM
Abstract
We propose a multi-modal (visual and audio) approach to violence analysis in movies using one-class and two-class support vector machines (SVMs). Visual content is described with scale-invariant feature transform (SIFT) features in a Bag-of-Words (BoW) representation, and audio content is described with Mel-frequency cepstral coefficients (MFCCs). We investigate how well combining the visual and audio features through early fusion describes violence in movies. The experimental results suggest that the one-class SVM is a promising approach for the task.
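As a rough illustration of the early-fusion step, the following Python sketch builds a BoW histogram from local descriptors, pools per-frame audio coefficients, and concatenates the two vectors. It is a minimal sketch under stated assumptions, not the implementation used in this work: the function names, the toy 2-D descriptors (real SIFT descriptors are 128-dimensional), the three-word codebook, the L1 normalization, and the mean pooling of MFCC frames over a segment are all hypothetical choices made for illustration.

```python
def bow_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codebook word and
    return an L1-normalized histogram of word counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Nearest visual word by squared Euclidean distance.
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


def mean_vector(frames):
    """Average per-frame coefficient vectors into one vector per segment
    (assumed pooling; the source does not specify how MFCCs are aggregated)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def early_fusion(visual, audio):
    """Early fusion: concatenate modality vectors before classification."""
    return list(visual) + list(audio)


# Toy data standing in for SIFT descriptors and MFCC frames (hypothetical values).
codebook = [[0, 0], [10, 10], [0, 10]]
descriptors = [[1, 1], [9, 9], [0, 1], [1, 9]]
mfcc_frames = [[1.0, 2.0], [3.0, 4.0]]

visual = bow_histogram(descriptors, codebook)
audio = mean_vector(mfcc_frames)
fused = early_fusion(visual, audio)
print(fused)  # [0.5, 0.25, 0.25, 2.0, 3.0]
```

In such a setup, one fused vector per movie segment would be passed to the classifier, for example a one-class SVM trained only on violent segments or a two-class SVM trained on violent and non-violent segments.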