DOI: 10.1145/3240508.3240608

Facial Expression Recognition Enhanced by Thermal Images through Adversarial Learning

Published: 15 October 2018

Abstract

Existing approaches that fuse visible and thermal images for facial expression recognition require both modalities during training and testing. Visible cameras are common in real-life applications, whereas thermal cameras are expensive and typically confined to laboratory settings, so thermal imaging is rarely available at deployment time. To address this, we propose a novel thermally enhanced facial expression recognition method that uses thermal images as privileged information during training, incorporating adversarial learning and a similarity constraint to construct a better visible feature representation and a better classifier. Specifically, we train two deep neural networks, one on visible images and one on thermal images. An adversarial loss enforces statistical similarity between the learned representations of the two modalities, and a similarity constraint regularizes the mapping functions from the visible and thermal representations to expression labels. Thermal images thus simultaneously improve the visible feature representation and the classifier during training, while only visible images are required at test time, mimicking real-world scenarios. We further extend the proposed method to partially unpaired data, exploring the supplementary role of thermal images in visible facial expression recognition when visible and thermal images are not recorded synchronously. Experimental results on the MAHNOB Laughter database demonstrate that the proposed method effectively regularizes the visible representation and the expression classifier with the help of thermal images, achieving state-of-the-art recognition performance.
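
The abstract describes the training objectives only at a high level; below is a minimal PyTorch-style sketch of how thermal images could serve as privileged information during training, with an adversarial loss aligning the visible and thermal feature distributions and a similarity constraint tying the two classifiers' predictions together. All module names, network sizes, loss weights, and the two-class setting are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: thermal images as privileged information during training.
# Assumes paired (visible, thermal, label) batches; only the visible branch is
# used at test time. Architectures and loss weights are illustrative guesses.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Small CNN mapping a 64x64 single-channel face crop to a feature vector."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, feat_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

feat_dim, num_classes = 128, 2               # e.g. laughter vs. speech (assumption)
enc_vis, enc_th = Encoder(feat_dim), Encoder(feat_dim)
cls_vis = nn.Linear(feat_dim, num_classes)   # visible classifier (kept for testing)
cls_th = nn.Linear(feat_dim, num_classes)    # thermal classifier (training only)
disc = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))  # modality discriminator

opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(
    list(enc_vis.parameters()) + list(enc_th.parameters())
    + list(cls_vis.parameters()) + list(cls_th.parameters()), lr=1e-4)

def train_step(x_vis, x_th, y, lam_adv=0.1, lam_sim=0.1):
    f_vis, f_th = enc_vis(x_vis), enc_th(x_th)
    n = y.size(0)

    # 1) Discriminator learns to tell visible features (label 0) from thermal features (label 1).
    d_loss = (F.binary_cross_entropy_with_logits(disc(f_vis.detach()), torch.zeros(n, 1))
              + F.binary_cross_entropy_with_logits(disc(f_th.detach()), torch.ones(n, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Encoders and classifiers: expression loss on both branches, an adversarial
    #    term pushing visible features toward the thermal feature statistics, and a
    #    similarity constraint tying the two classifiers' outputs together.
    ce = F.cross_entropy(cls_vis(f_vis), y) + F.cross_entropy(cls_th(f_th), y)
    adv = F.binary_cross_entropy_with_logits(disc(f_vis), torch.ones(n, 1))
    sim = F.mse_loss(cls_vis(f_vis), cls_th(f_th))
    g_loss = ce + lam_adv * adv + lam_sim * sim
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

Only the visible branch (enc_vis and cls_vis in this sketch) would be retained for inference, matching the visible-only test setting described above.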


Cited By

  • (2024) A Fourier-Transform-Based Framework With Asymptotic Attention for Mobile Thermal InfraRed Object Detection. IEEE Sensors Journal 24(13), 21012-21024. https://doi.org/10.1109/JSEN.2024.3399193
  • (2023) TIRDet: Mono-Modality Thermal InfraRed Object Detection Based on Prior Thermal-To-Visible Translation. Proceedings of the 31st ACM International Conference on Multimedia, 2663-2672. https://doi.org/10.1145/3581783.3613849
  • (2021) Unpaired Multimodal Facial Expression Recognition. Computer Vision – ACCV 2020, 54-69. https://doi.org/10.1007/978-3-030-69541-5_4


    Published In

    MM '18: Proceedings of the 26th ACM International Conference on Multimedia
    October 2018
    2167 pages
    ISBN:9781450356657
    DOI:10.1145/3240508


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 October 2018


    Author Tags

    1. adversarial learning
    2. facial expression recognition
    3. privileged information

    Qualifiers

    • Research-article

    Funding Sources

    • National Science Foundation of China
    • Anhui Science and Technology Agency

    Conference

    MM '18: ACM Multimedia Conference
    October 22 - 26, 2018
    Seoul, Republic of Korea

    Acceptance Rates

    MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (Last 12 months)11
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 02 Mar 2025

