ABSTRACT
Crowdsourcing has become a widely adopted scheme for collecting ground-truth labels. However, it is a well-known problem that these labels can be very noisy. In this paper, we demonstrate how to learn a deep convolutional neural network (DCNN) from noisy labels, using facial expression recognition as an example. More specifically, we had each input image labeled by 10 taggers, and we compare four different approaches to utilizing the multiple labels: majority voting, multi-label learning, probabilistic label drawing, and cross-entropy loss. We show that the traditional majority voting scheme does not perform as well as the last two approaches, which fully leverage the label distribution. An enhanced FER+ data set with multiple labels for each face image will also be shared with the research community.
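To illustrate how the labeling schemes in the abstract differ, the sketch below builds the empirical label distribution from crowd tags and contrasts a majority-vote one-hot target with a distribution target for cross-entropy. This is a hedged illustration, not the paper's implementation: the tag vector and the 8-class count are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical tags from 10 crowd workers for one image,
# over 8 emotion classes (indices 0..7).
tags = [3, 3, 3, 5, 3, 5, 3, 3, 6, 3]
num_classes = 8

# Empirical label distribution: fraction of taggers choosing each class.
counts = np.bincount(tags, minlength=num_classes)
dist = counts / counts.sum()  # [0, 0, 0, 0.7, 0, 0.2, 0, 0.1]

# Majority voting collapses the distribution to a single one-hot target,
# discarding the minority opinions entirely.
majority = np.eye(num_classes)[np.argmax(counts)]

# Cross-entropy against the full distribution keeps minority opinions:
# loss = -sum_k q_k * log(p_k), with q the tagger distribution.
def cross_entropy(q, p, eps=1e-12):
    return -np.sum(q * np.log(p + eps))

# The distribution-target loss is minimized when the model's predicted
# distribution p equals q, at which point it equals the entropy of q.
print(round(cross_entropy(dist, dist), 3))  # ≈ 0.802

# Probabilistic label drawing instead samples one hard target per epoch
# in proportion to the tagger distribution:
sampled = np.random.choice(num_classes, p=dist)
```

Majority voting throws away the information that 30% of taggers disagreed, whereas the last two schemes retain it, which is consistent with the abstract's finding that the distribution-aware approaches perform better.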
- Amazon Mechanical Turk. https://www.mturk.com, 2016 (accessed April 26, 2016).
- FER+ emotion label. https://github.com/Microsoft/FERPlus, 2016 (accessed September 14, 2016).
- V. Ambati. Active Learning and Crowdsourcing for Machine Translation in Low Resource Scenarios. PhD thesis, Pittsburgh, PA, USA, 2012. AAI3528171.
- A. Batliner, S. Steidl, C. Hacker, and E. Nöth. Private emotions versus social interaction: a data-driven approach towards analysing emotion in speech. User Modeling and User-Adapted Interaction, 18(1):175–206, 2007.
- A. Bosch, A. Zisserman, and X. Munoz. Representing shape with a spatial pyramid kernel. In Proceedings of the 6th ACM International Conference on Image and Video Retrieval, pages 401–408. ACM, 2007.
- A. Burmania, S. Parthasarathy, and C. Busso. Increasing the reliability of crowdsourcing evaluations using online quality assessment. IEEE Transactions on Affective Computing, PP(99):1–1, 2015.
- H. Cao, D. G. Cooper, M. K. Keutmann, R. C. Gur, A. Nenkova, and R. Verma. CREMA-D: crowd-sourced emotional multimodal actors dataset. IEEE Transactions on Affective Computing, 5(4):377–390, 2014.
- L. Devillers, L. Vidrascu, and L. Lamel. Challenges in real-life emotion annotation and machine learning based detection. Neural Networks, 18(4):407–422, 2005.
- C. Eickhoff and A. P. de Vries. Increasing cheat robustness of crowdsourcing tasks. Information Retrieval, 16(2):121–137, 2013.
- P. Ekman and W. V. Friesen. Constants across cultures in the face and emotion. Journal of Personality and Social Psychology, 17(2):124, 1971.
- P. Ekman and W. V. Friesen. Facial action coding system. 1977.
- I. J. Goodfellow, D. Erhan, P. L. Carrier, A. Courville, M. Mirza, B. Hamner, W. Cukierski, Y. Tang, D. Thaler, D.-H. Lee, et al. Challenges in representation learning: A report on three machine learning contests. In Neural Information Processing, pages 117–124. Springer, 2013.
- K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
- S. E. Kahou, C. Pal, X. Bouthillier, P. Froumenty, Ç. Gülçehre, R. Memisevic, P. Vincent, A. Courville, Y. Bengio, R. C. Ferrari, et al. Combining modality specific deep neural networks for emotion recognition in video. In Proceedings of the 15th ACM International Conference on Multimodal Interaction, pages 543–550. ACM, 2013.
- A. Kittur, E. H. Chi, and B. Suh. Crowdsourcing user studies with Mechanical Turk. In Proceedings of the 2008 Conference on Human Factors in Computing Systems (CHI 2008), Florence, Italy, April 5–10, 2008, pages 453–456. ACM, 2008.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1106–1114, 2012.
- M. Liu, S. Li, S. Shan, R. Wang, and X. Chen. Deeply learning deformable facial action parts model for dynamic expression analysis. In Computer Vision – ACCV 2014, pages 143–157. Springer, 2014.
- P. Liu, S. Han, Z. Meng, and Y. Tong. Facial expression recognition via a boosted deep belief network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1805–1812. IEEE, 2014.
- E. Mower, A. Metallinou, C.-C. Lee, A. Kazemzadeh, C. Busso, S. Lee, and S. Narayanan. Interpreting ambiguous emotional expressions. In 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII 2009), pages 1–8. IEEE, 2009.
- V. Ojansivu and J. Heikkilä. Blur insensitive texture classification using local phase quantization. In Image and Signal Processing, pages 236–243. Springer, 2008.
- R. Rosenthal. Conducting judgment studies: Some methodological issues. The New Handbook of Methods in Nonverbal Behavior Research, pages 199–234, 2005.
- N. Sadoughi, Y. Liu, and C. Busso. Speech-driven animation constrained by appropriate discourse functions. In Proceedings of the 16th International Conference on Multimodal Interaction (ICMI 2014), Istanbul, Turkey, November 12–16, 2014, pages 148–155. ACM, 2014.
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
- R. Snow, B. O'Connor, D. Jurafsky, and A. Y. Ng. Cheap and fast—but is it good?: Evaluating non-expert annotations for natural language tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08), pages 254–263, Stroudsburg, PA, USA, 2008. Association for Computational Linguistics.
- T. Sobol-Shikler and P. Robinson. Classification of complex information: Inference of co-occurring affective states from their expressions in speech. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(7):1284–1297, 2010.
- M. Soleymani and M. Larson. Crowdsourcing for affective annotation of video: Development of a viewer-reported boredom corpus. 2010.
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, June 7–12, 2015, pages 1–9, 2015.
- Y. Tang. Deep learning using linear support vector machines. arXiv preprint arXiv:1306.0239, 2013.
- Y.-L. Tian, T. Kanade, and J. F. Cohn. Recognizing action units for facial expression analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2):97–115, 2001.
- K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas. Multi-label classification of music into emotions. In ISMIR, volume 8, pages 325–330, 2008.
- L. Xie, J. Wang, Z. Wei, M. Wang, and Q. Tian. DisturbLabel: Regularizing CNN on the loss layer. arXiv preprint arXiv:1605.00055, 2016.
- Z. Yu and C. Zhang. Image based static facial expression recognition with multiple deep network learning. In Proceedings of the 2015 ACM International Conference on Multimodal Interaction (ICMI '15), pages 435–442, New York, NY, USA, 2015. ACM.
- Z. Zhang. Feature-based facial expression recognition: Sensitivity analysis and experiments with a multi-layer perceptron. International Journal of Pattern Recognition and Artificial Intelligence, 13(6):893–911, 1999.
- G. Zhao and M. Pietikäinen. Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):915–928, 2007.
- Y. Zhou, H. Xue, and X. Geng. Emotion distribution recognition from facial expressions. In Proceedings of the 23rd ACM International Conference on Multimedia (MM '15), pages 1247–1250, New York, NY, USA, 2015. ACM.
Index Terms
- Training deep networks for facial expression recognition with crowd-sourced label distribution