ABSTRACT
Perceptual understanding of media content has many applications, including content-based retrieval, marketing, content optimization, psychological assessment, and affect-based learning. In this paper, we model audiovisual features extracted from videos via machine learning approaches to estimate the affective responses of viewers. We use the LIRIS-ACCEDE dataset and the MediaEval 2017 Challenge setting to evaluate the proposed methods. This dataset is composed of movies of professional or amateur origin, annotated with viewers' arousal, valence, and fear scores. We extract a number of audio features, such as Mel-frequency cepstral coefficients, and visual features, such as dense SIFT, hue-saturation histograms, and features from a deep neural network trained for object recognition. We contrast two different approaches, and report experiments with different fusion and smoothing strategies. We demonstrate the benefit of feature selection and multimodal fusion for estimating affective responses to movie segments.
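To make the pipeline steps named in the abstract concrete, the following is a minimal numpy sketch of three of them: a hue-saturation histogram as a visual descriptor, early (feature-level) fusion by normalized concatenation, and exponential smoothing of per-segment predictions. This is an illustrative approximation, not the authors' exact implementation; all function names, bin counts, and the smoothing constant `alpha` are assumptions.

```python
import numpy as np

def hs_histogram(frame_hsv, h_bins=8, s_bins=8):
    """Hue-saturation histogram of one HSV frame (H, W, 3), values in [0, 1].
    Returns an L1-normalized vector of length h_bins * s_bins."""
    h = frame_hsv[..., 0].ravel()
    s = frame_hsv[..., 1].ravel()
    hist, _, _ = np.histogram2d(h, s, bins=[h_bins, s_bins],
                                range=[[0.0, 1.0], [0.0, 1.0]])
    hist = hist.ravel()
    return hist / max(hist.sum(), 1e-12)

def early_fusion(*feature_vectors):
    """Feature-level fusion: z-normalize each modality, then concatenate."""
    normed = []
    for v in feature_vectors:
        v = np.asarray(v, dtype=float)
        normed.append((v - v.mean()) / (v.std() + 1e-12))
    return np.concatenate(normed)

def exp_smooth(scores, alpha=0.3):
    """Exponential smoothing of a per-segment prediction sequence,
    reducing jitter in frame-by-frame arousal/valence estimates."""
    scores = np.asarray(scores, dtype=float)
    out = np.empty_like(scores)
    out[0] = scores[0]
    for t in range(1, len(scores)):
        out[t] = alpha * scores[t] + (1 - alpha) * out[t - 1]
    return out
```

For example, a fused descriptor for one movie segment could be built as `early_fusion(audio_feat, hs_histogram(frame))`, where `audio_feat` stands in for an MFCC-based vector; the regressor's per-segment outputs would then be passed through `exp_smooth` before evaluation.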