Multi-scale Temporal Modeling for Dimensional Emotion Recognition in Video

Published: 07 November 2014

ABSTRACT

Understanding nonverbal behaviors in human-machine interaction is a complex and challenging task, a key aspect of which is recognizing human emotional states accurately. This paper presents our contribution to the Audio/Visual Emotion Challenge (AVEC'14), whose goal is to predict continuous values of the emotion dimensions arousal, valence, and dominance at each moment in time. The proposed method uses models based on deep belief networks to recognize emotional states from the audio and visual modalities. First, we employ temporal pooling functions in the deep neural network to encode dynamic information in the features, which provides temporal modeling at the first time scale. Second, we fuse the predictions from the different modalities together with emotion temporal context information; this multimodal-temporal fusion provides temporal modeling of the emotional states at the second time scale. Experimental results demonstrate the effectiveness of each key component of the proposed method, and competitive results are obtained.
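The first time-scale modeling described above summarizes frame-level features over short windows before they enter the network. A minimal sketch of such temporal pooling is shown below; the function name, the sliding-window parameters, and the choice of pooling statistics (mean, max, standard deviation) are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def temporal_pool(frames, window, hop):
    """Pool frame-level features over sliding windows.

    frames: array of shape (T, D), one D-dim feature vector per frame.
    For each window, concatenate mean, max, and standard deviation,
    yielding one (3*D)-dim vector per window -- a simple way to encode
    short-term dynamics before a deep network consumes the features.
    """
    pooled = []
    for start in range(0, len(frames) - window + 1, hop):
        seg = frames[start:start + window]
        pooled.append(np.concatenate([seg.mean(axis=0),
                                      seg.max(axis=0),
                                      seg.std(axis=0)]))
    return np.stack(pooled)

# Example: 100 frames of 16-dim features, 25-frame window, 10-frame hop
feats = np.random.randn(100, 16)
pooled = temporal_pool(feats, window=25, hop=10)
print(pooled.shape)  # (8, 48)
```

Each pooled vector then stands in for its window of frames, so the downstream model sees a shorter sequence whose elements already carry local dynamic information.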


Published in

AVEC '14: Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge
November 2014
110 pages
ISBN: 9781450331197
DOI: 10.1145/2661806

        Copyright © 2014 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


        Qualifiers

        • research-article

        Acceptance Rates

AVEC '14 paper acceptance rate: 8 of 22 submissions, 36%. Overall acceptance rate: 52 of 98 submissions, 53%.

