ABSTRACT
Crowdsourcing has become a widely adopted scheme for collecting ground-truth labels. However, it is a well-known problem that these labels can be very noisy. In this paper, we demonstrate how to learn a deep convolutional neural network (DCNN) from noisy labels, using facial expression recognition as an example. More specifically, we had each input image labeled by 10 taggers, and we compare four different approaches to utilizing the multiple labels: majority voting, multi-label learning, probabilistic label drawing, and cross-entropy loss. We show that the traditional majority voting scheme does not perform as well as the last two approaches, which fully leverage the label distribution. An enhanced FER+ data set with multiple labels for each face image will also be shared with the research community.
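To illustrate how the labeling schemes in the abstract differ, the sketch below builds the empirical label distribution from crowd tags and contrasts a majority-vote one-hot target with a distribution target for cross-entropy. This is a hedged illustration, not the paper's implementation: the tag vector and the 8-class count are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical tags from 10 crowd workers for one image,
# over 8 emotion classes (indices 0..7).
tags = [3, 3, 3, 5, 3, 5, 3, 3, 6, 3]
num_classes = 8

# Empirical label distribution: fraction of taggers choosing each class.
counts = np.bincount(tags, minlength=num_classes)
dist = counts / counts.sum()  # [0, 0, 0, 0.7, 0, 0.2, 0, 0.1]

# Majority voting collapses the distribution to a single one-hot target,
# discarding the minority opinions entirely.
majority = np.eye(num_classes)[np.argmax(counts)]

# Cross-entropy against the full distribution keeps minority opinions:
# loss = -sum_k q_k * log(p_k), with q the tagger distribution.
def cross_entropy(q, p, eps=1e-12):
    return -np.sum(q * np.log(p + eps))

# The distribution-target loss is minimized when the model's predicted
# distribution p equals q, at which point it equals the entropy of q.
print(round(cross_entropy(dist, dist), 3))  # ≈ 0.802

# Probabilistic label drawing instead samples one hard target per epoch
# in proportion to the tagger distribution:
sampled = np.random.choice(num_classes, p=dist)
```

Majority voting throws away the information that 30% of taggers disagreed, whereas the last two schemes retain it, which is consistent with the abstract's finding that the distribution-aware approaches perform better.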
- Amazon Mechanical Turk. https://www.mturk.com, 2016 (accessed April 26, 2016).
- FER+ emotion label. https://github.com/Microsoft/FERPlus, 2016 (accessed September 14, 2016).
- V. Ambati. Active Learning and Crowdsourcing for Machine Translation in Low Resource Scenarios. PhD thesis, Pittsburgh, PA, USA, 2012. AAI3528171.
- A. Batliner, S. Steidl, C. Hacker, and E. Nöth. Private emotions versus social interaction: a data-driven approach towards analysing emotion in speech. User Modeling and User-Adapted Interaction, 18(1):175–206, 2007.
- A. Bosch, A. Zisserman, and X. Munoz. Representing shape with a spatial pyramid kernel. In Proceedings of the 6th ACM International Conference on Image and Video Retrieval, pages 401–408. ACM, 2007.
- A. Burmania, S. Parthasarathy, and C. Busso. Increasing the reliability of crowdsourcing evaluations using online quality assessment. IEEE Transactions on Affective Computing, PP(99):1–1, 2015.
- H. Cao, D. G. Cooper, M. K. Keutmann, R. C. Gur, A. Nenkova, and R. Verma. CREMA-D: crowd-sourced emotional multimodal actors dataset. IEEE Transactions on Affective Computing, 5(4):377–390, 2014.
- L. Devillers, L. Vidrascu, and L. Lamel. Challenges in real-life emotion annotation and machine learning based detection. Neural Networks, 18(4):407–422, 2005.
- C. Eickhoff and A. P. de Vries. Increasing cheat robustness of crowdsourcing tasks. Information Retrieval, 16(2):121–137, 2013.
- P. Ekman and W. V. Friesen. Constants across cultures in the face and emotion. Journal of Personality and Social Psychology, 17(2):124, 1971.
- P. Ekman and W. V. Friesen. Facial action coding system. 1977.
- I. J. Goodfellow, D. Erhan, P. L. Carrier, A. Courville, M. Mirza, B. Hamner, W. Cukierski, Y. Tang, D. Thaler, D.-H. Lee, et al. Challenges in representation learning: A report on three machine learning contests. In Neural Information Processing, pages 117–124. Springer, 2013.
- K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
- S. E. Kahou, C. Pal, X. Bouthillier, P. Froumenty, Ç. Gülçehre, R. Memisevic, P. Vincent, A. Courville, Y. Bengio, R. C. Ferrari, et al. Combining modality specific deep neural networks for emotion recognition in video. In Proceedings of the 15th ACM International Conference on Multimodal Interaction, pages 543–550. ACM, 2013.
- A. Kittur, E. H. Chi, and B. Suh. Crowdsourcing user studies with Mechanical Turk. In Proceedings of the 2008 Conference on Human Factors in Computing Systems (CHI 2008), Florence, Italy, April 5–10, 2008, pages 453–456. ACM, 2008.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1106–1114, 2012.
- M. Liu, S. Li, S. Shan, R. Wang, and X. Chen. Deeply learning deformable facial action parts model for dynamic expression analysis. In Computer Vision – ACCV 2014, pages 143–157. Springer, 2014.
- P. Liu, S. Han, Z. Meng, and Y. Tong. Facial expression recognition via a boosted deep belief network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1805–1812. IEEE, 2014.
- E. Mower, A. Metallinou, C.-C. Lee, A. Kazemzadeh, C. Busso, S. Lee, and S. Narayanan. Interpreting ambiguous emotional expressions. In 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII 2009), pages 1–8. IEEE, 2009.
- V. Ojansivu and J. Heikkilä. Blur insensitive texture classification using local phase quantization. In Image and Signal Processing, pages 236–243. Springer, 2008.
- R. Rosenthal. Conducting judgment studies: Some methodological issues. The New Handbook of Methods in Nonverbal Behavior Research, pages 199–234, 2005.
- N. Sadoughi, Y. Liu, and C. Busso. Speech-driven animation constrained by appropriate discourse functions. In Proceedings of the 16th International Conference on Multimodal Interaction (ICMI 2014), Istanbul, Turkey, November 12–16, 2014, pages 148–155. ACM, 2014.
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
- R. Snow, B. O'Connor, D. Jurafsky, and A. Y. Ng. Cheap and fast—but is it good?: Evaluating non-expert annotations for natural language tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08), pages 254–263, Stroudsburg, PA, USA, 2008. Association for Computational Linguistics.
- T. Sobol-Shikler and P. Robinson. Classification of complex information: Inference of co-occurring affective states from their expressions in speech. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(7):1284–1297, 2010.
- M. Soleymani and M. Larson. Crowdsourcing for affective annotation of video: Development of a viewer-reported boredom corpus. 2010.
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, June 7–12, 2015, pages 1–9, 2015.
- Y. Tang. Deep learning using linear support vector machines. arXiv preprint arXiv:1306.0239, 2013.
- Y.-L. Tian, T. Kanade, and J. F. Cohn. Recognizing action units for facial expression analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2):97–115, 2001.
- K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas. Multi-label classification of music into emotions. In ISMIR, volume 8, pages 325–330, 2008.
- L. Xie, J. Wang, Z. Wei, M. Wang, and Q. Tian. DisturbLabel: Regularizing CNN on the loss layer. arXiv preprint arXiv:1605.00055, 2016.
- Z. Yu and C. Zhang. Image based static facial expression recognition with multiple deep network learning. In Proceedings of the 2015 ACM International Conference on Multimodal Interaction (ICMI '15), pages 435–442, New York, NY, USA, 2015. ACM.
- Z. Zhang. Feature-based facial expression recognition: Sensitivity analysis and experiments with a multi-layer perceptron. International Journal of Pattern Recognition and Artificial Intelligence, 13(6):893–911, 1999.
- G. Zhao and M. Pietikäinen. Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):915–928, 2007.
- Y. Zhou, H. Xue, and X. Geng. Emotion distribution recognition from facial expressions. In Proceedings of the 23rd ACM International Conference on Multimedia (MM '15), pages 1247–1250, New York, NY, USA, 2015. ACM.
Index Terms
- Training deep networks for facial expression recognition with crowd-sourced label distribution