ABSTRACT
Perceptual understanding of media content has many applications, including content-based retrieval, marketing, content optimization, psychological assessment, and affect-based learning. In this paper, we model audiovisual features extracted from videos via machine learning approaches to estimate the affective responses of viewers. We use the LIRIS-ACCEDE dataset and the MediaEval 2017 Challenge setting to evaluate the proposed methods. This dataset is composed of movies of professional or amateur origin, annotated with viewers' arousal, valence, and fear scores. We extract a number of audio features, such as Mel-frequency cepstral coefficients, and visual features, such as dense SIFT, hue-saturation histograms, and features from a deep neural network trained for object recognition. We contrast two different approaches, and report experiments with different fusion and smoothing strategies. We demonstrate the benefit of feature selection and multimodal fusion for estimating affective responses to movie segments.
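To make the pipeline steps named in the abstract concrete, the following is a minimal numpy sketch of three of them: a hue-saturation histogram as a visual descriptor, early (feature-level) fusion by normalized concatenation, and exponential smoothing of per-segment predictions. This is an illustrative approximation, not the authors' exact implementation; all function names, bin counts, and the smoothing constant `alpha` are assumptions.

```python
import numpy as np

def hs_histogram(frame_hsv, h_bins=8, s_bins=8):
    """Hue-saturation histogram of one HSV frame (H, W, 3), values in [0, 1].
    Returns an L1-normalized vector of length h_bins * s_bins."""
    h = frame_hsv[..., 0].ravel()
    s = frame_hsv[..., 1].ravel()
    hist, _, _ = np.histogram2d(h, s, bins=[h_bins, s_bins],
                                range=[[0.0, 1.0], [0.0, 1.0]])
    hist = hist.ravel()
    return hist / max(hist.sum(), 1e-12)

def early_fusion(*feature_vectors):
    """Feature-level fusion: z-normalize each modality, then concatenate."""
    normed = []
    for v in feature_vectors:
        v = np.asarray(v, dtype=float)
        normed.append((v - v.mean()) / (v.std() + 1e-12))
    return np.concatenate(normed)

def exp_smooth(scores, alpha=0.3):
    """Exponential smoothing of a per-segment prediction sequence,
    reducing jitter in frame-by-frame arousal/valence estimates."""
    scores = np.asarray(scores, dtype=float)
    out = np.empty_like(scores)
    out[0] = scores[0]
    for t in range(1, len(scores)):
        out[t] = alpha * scores[t] + (1 - alpha) * out[t - 1]
    return out
```

For example, a fused descriptor for one movie segment could be built as `early_fusion(audio_feat, hs_histogram(frame))`, where `audio_feat` stands in for an MFCC-based vector; the regressor's per-segment outputs would then be passed through `exp_smooth` before evaluation.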