DOI: 10.1145/2993148.2993165

Training deep networks for facial expression recognition with crowd-sourced label distribution

Published: 31 October 2016

ABSTRACT

Crowdsourcing has become a widely adopted scheme for collecting ground-truth labels. However, it is a well-known problem that these labels can be very noisy. In this paper, we demonstrate how to learn a deep convolutional neural network (DCNN) from noisy labels, using facial expression recognition as an example. Specifically, we have 10 taggers label each input image and compare four approaches to utilizing the multiple labels: majority voting, multi-label learning, probabilistic label drawing, and cross-entropy loss. We show that the traditional majority-voting scheme does not perform as well as the last two approaches, which fully leverage the label distribution. An enhanced FER+ data set with multiple labels for each face image will also be shared with the research community.
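To make the four schemes concrete, here is a minimal sketch (in PyTorch; not the authors' code) of how a 10-tagger vote vector for one image can be turned into a training target and loss under each scheme. The vote vector `crowd_counts`, the 0.2 multi-label threshold, and the single-image batch are illustrative assumptions; the 8 emotion categories match FER+.

```python
import torch
import torch.nn.functional as F

num_classes = 8  # FER+ annotates 8 emotion categories
# Hypothetical 10-tagger vote vector for one image (e.g., 6 votes for class 0).
crowd_counts = torch.tensor([6., 3., 1., 0., 0., 0., 0., 0.])
label_dist = crowd_counts / crowd_counts.sum()  # empirical label distribution

# Stand-in for the DCNN's output on this image (batch of 1).
logits = torch.randn(1, num_classes, requires_grad=True)

# 1. Majority voting: collapse the votes to the single most-voted class.
mv_target = crowd_counts.argmax().unsqueeze(0)             # shape (1,)
loss_mv = F.cross_entropy(logits, mv_target)

# 2. Multi-label learning: every class above a vote threshold is a positive.
ml_target = (label_dist > 0.2).float().unsqueeze(0)        # shape (1, 8)
loss_ml = F.binary_cross_entropy_with_logits(logits, ml_target)

# 3. Probabilistic label drawing: re-sample one hard label from the
#    distribution each epoch, so tagger disagreement is seen over time.
pld_target = torch.multinomial(label_dist, num_samples=1)  # shape (1,)
loss_pld = F.cross_entropy(logits, pld_target)

# 4. Cross-entropy against the full distribution (soft targets).
log_probs = F.log_softmax(logits, dim=1)
loss_ce = -(label_dist * log_probs).sum(dim=1).mean()
```

Under schemes 3 and 4 the training signal reflects the whole vote distribution rather than a single winner, which is the property the paper credits for their advantage over majority voting.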


Published in

      ICMI '16: Proceedings of the 18th ACM International Conference on Multimodal Interaction
      October 2016
      605 pages
ISBN: 9781450345569
DOI: 10.1145/2993148

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • short-paper

      Acceptance Rates

Overall Acceptance Rate: 453 of 1,080 submissions, 42%
