Detecting Violent Content in Hollywood Movies by Mid-level Audio Representations

Abstract

Detecting violent content in movies, e.g., to provide automated youth protection services, is a valuable video content analysis functionality. Choosing discriminative features for representing video segments is a key issue in designing violence detection algorithms. In this paper, we employ mid-level audio features based on a Bag-of-Audio Words (BoAW) method using Mel-Frequency Cepstral Coefficients (MFCCs). BoAW representations are constructed with two different methods: a vector quantization-based (VQ-based) method and a sparse coding-based (SC-based) method. We use two-class support vector machines (SVMs) to classify video shots as violent or non-violent. Our experiments on detecting violent video shots in Hollywood movies show that the mid-level audio features provide promising results. Additionally, we establish that the SC-based method outperforms the VQ-based one. More importantly, in terms of average precision, the SC-based method outperforms all unimodal submissions to the MediaEval Violent Scenes Detection (VSD) task except one vision-based method.
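
The VQ-based BoAW construction mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the codebook size, the plain k-means routine, the L1 normalization, and the random arrays standing in for real MFCC frames are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def assign(frames, codebook):
    """Index of the nearest codeword (squared Euclidean distance) per frame."""
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_codebook(frames, k, iters=10, seed=0):
    """Plain k-means (Lloyd's algorithm) over frames; returns (k, d) codewords."""
    rng = np.random.default_rng(seed)
    # Initialize codewords from k distinct training frames.
    codebook = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(frames, codebook)
        for j in range(k):
            members = frames[labels == j]
            if len(members):  # keep the old codeword if a cluster is empty
                codebook[j] = members.mean(axis=0)
    return codebook

def boaw_histogram(frames, codebook):
    """L1-normalized histogram of codeword occurrences for one video shot."""
    counts = np.bincount(assign(frames, codebook), minlength=len(codebook))
    return counts / counts.sum()

# Synthetic stand-ins for MFCC frames (assumed 13 coefficients per frame).
rng = np.random.default_rng(42)
train_frames = rng.normal(size=(500, 13))  # frames pooled from training audio
shot_frames = rng.normal(size=(40, 13))    # frames from a single shot

codebook = build_codebook(train_frames, k=8)
hist = boaw_histogram(shot_frames, codebook)  # one fixed-length vector per shot
```

Each shot is thereby mapped to a fixed-length vector regardless of its duration, which is what allows a two-class SVM to be trained on the shots. An SC-based variant would replace the hard nearest-codeword assignment with sparse coefficients over a learned dictionary; that step is not shown here.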

@inproceedings{acar2013detecting,
  title={Detecting violent content in {H}ollywood movies by mid-level audio representations},
  author={Acar, Esra and Hopfgartner, Frank and Albayrak, Sahin},
  booktitle={Content-Based Multimedia Indexing (CBMI), 2013 11th International Workshop on},
  pages={73--78},
  issn={1949-3983},
  doi={10.1109/CBMI.2013.6576556},
  year={2013},
  organization={IEEE}
}
Authors: Esra Acar Celik, Frank Hopfgartner, Sahin Albayrak
Category: Conference Paper
Year: 2013
Location: Workshop on Content-Based Multimedia Indexing (CBMI)