DOI: 10.1145/3240508.3240608

Facial Expression Recognition Enhanced by Thermal Images through Adversarial Learning

Published: 15 October 2018

Abstract

Existing approaches that fuse visible and thermal images for facial expression recognition require both modalities during training and testing. Visible cameras are common in real-life applications, whereas thermal cameras are expensive and typically confined to laboratory settings, so thermal imaging is rarely available at deployment time. To address this, we propose a novel thermally enhanced facial expression recognition method that uses thermal images as privileged information during training, incorporating adversarial learning and a similarity constraint to construct a better visible feature representation and a better classifier. Specifically, we train two deep neural networks, one on visible images and one on thermal images. An adversarial loss enforces statistical similarity between the learned representations of the two modalities, and a similarity constraint regularizes the mapping functions from the visible and thermal representations to expression labels. Thermal images thus simultaneously improve the visible feature representation and the classifier during training, while only visible images are required at test time, mimicking real-world scenarios. We further extend the proposed method to partially unpaired data, exploring the supplementary role of thermal images in visible facial expression recognition when visible and thermal images are not recorded synchronously. Experimental results on the MAHNOB Laughter database demonstrate that the proposed method effectively regularizes the visible representation and the expression classifier with the help of thermal images, achieving state-of-the-art recognition performance.
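
The abstract describes the training objectives only at a high level; below is a minimal PyTorch-style sketch of how thermal images could serve as privileged information during training, with an adversarial loss aligning the visible and thermal feature distributions and a similarity constraint tying the two classifiers' predictions together. All module names, network sizes, loss weights, and the two-class setting are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: thermal images as privileged information during training.
# Assumes paired (visible, thermal, label) batches; only the visible branch is
# used at test time. Architectures and loss weights are illustrative guesses.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Small CNN mapping a 64x64 single-channel face crop to a feature vector."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, feat_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

feat_dim, num_classes = 128, 2               # e.g. laughter vs. speech (assumption)
enc_vis, enc_th = Encoder(feat_dim), Encoder(feat_dim)
cls_vis = nn.Linear(feat_dim, num_classes)   # visible classifier (kept for testing)
cls_th = nn.Linear(feat_dim, num_classes)    # thermal classifier (training only)
disc = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))  # modality discriminator

opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(
    list(enc_vis.parameters()) + list(enc_th.parameters())
    + list(cls_vis.parameters()) + list(cls_th.parameters()), lr=1e-4)

def train_step(x_vis, x_th, y, lam_adv=0.1, lam_sim=0.1):
    f_vis, f_th = enc_vis(x_vis), enc_th(x_th)
    n = y.size(0)

    # 1) Discriminator learns to tell visible features (label 0) from thermal features (label 1).
    d_loss = (F.binary_cross_entropy_with_logits(disc(f_vis.detach()), torch.zeros(n, 1))
              + F.binary_cross_entropy_with_logits(disc(f_th.detach()), torch.ones(n, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Encoders and classifiers: expression loss on both branches, an adversarial
    #    term pushing visible features toward the thermal feature statistics, and a
    #    similarity constraint tying the two classifiers' outputs together.
    ce = F.cross_entropy(cls_vis(f_vis), y) + F.cross_entropy(cls_th(f_th), y)
    adv = F.binary_cross_entropy_with_logits(disc(f_vis), torch.ones(n, 1))
    sim = F.mse_loss(cls_vis(f_vis), cls_th(f_th))
    g_loss = ce + lam_adv * adv + lam_sim * sim
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

Only the visible branch (enc_vis and cls_vis in this sketch) would be retained for inference, matching the visible-only test setting described above.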


Cited By

  • (2024) A Fourier-Transform-Based Framework With Asymptotic Attention for Mobile Thermal InfraRed Object Detection. IEEE Sensors Journal 24(13), 21012-21024. https://doi.org/10.1109/JSEN.2024.3399193
  • (2023) TIRDet: Mono-Modality Thermal InfraRed Object Detection Based on Prior Thermal-To-Visible Translation. Proceedings of the 31st ACM International Conference on Multimedia, 2663-2672. https://doi.org/10.1145/3581783.3613849
  • (2021) Unpaired Multimodal Facial Expression Recognition. Computer Vision – ACCV 2020, 54-69. https://doi.org/10.1007/978-3-030-69541-5_4


    Published In

    MM '18: Proceedings of the 26th ACM International Conference on Multimedia
    October 2018
    2167 pages
    ISBN:9781450356657
    DOI:10.1145/3240508


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 October 2018


    Author Tags

    1. adversarial learning
    2. facial expression recognition
    3. privileged information

    Qualifiers

    • Research-article

    Funding Sources

    • National Science Foundation of China
    • Anhui Science and Technology Agency

    Conference

    MM '18: ACM Multimedia Conference
    October 22 - 26, 2018
    Seoul, Republic of Korea

    Acceptance Rates

    MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (Last 12 months)11
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 02 Mar 2025

