Research article · IVA Conference Proceedings
DOI: 10.1145/3267851.3267860

NADiA: Neural Network Driven Virtual Human Conversation Agents

Published: 5 November 2018

ABSTRACT

Advances in artificial intelligence, and in particular machine learning and neural networks, have given rise to a new generation of virtual assistants and chatbots. In this work, we present NADiA - Neurally Animated Dialog Agent - which leverages both the user's verbal input and their facial expressions to respond in a meaningful way. NADiA combines a neural language model that generates appropriate responses to user prompts, a convolutional neural network for facial expression analysis, and virtual human technology deployed on a mobile phone. We evaluate NADiA's anthropomorphic characteristics and its ability to understand the human interlocutor using both subjective and objective measures. We find that NADiA significantly outperforms state-of-the-art chatbot technology and produces behavior comparable to human-generated reference outputs.
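The fusion described in the abstract - a CNN that reads the user's facial expression, whose output conditions the response generator - can be illustrated with a minimal sketch. All names and thresholds below are hypothetical stand-ins, not the paper's actual models: the real system uses a trained action-unit CNN and an affect-conditioned neural language model, which are mocked here with simple rules.

```python
# Illustrative sketch of a NADiA-style multimodal turn.
# The CNN and language model are replaced with toy stand-ins.

def detect_affect(action_units):
    """Toy stand-in for the facial-expression CNN: map facial
    action-unit intensities (0..1) to a coarse affect label."""
    smile = action_units.get("AU12", 0.0)  # AU12: lip-corner puller
    frown = action_units.get("AU4", 0.0)   # AU4: brow lowerer
    if smile > 0.5 and smile >= frown:
        return "positive"
    if frown > 0.5:
        return "negative"
    return "neutral"

def generate_response(prompt, affect):
    """Toy stand-in for an affect-conditioned language model:
    an affect token prepended to the prompt steers generation."""
    conditioned_prompt = f"<{affect}> {prompt}"
    canned = {
        "positive": "Glad to hear it! Tell me more.",
        "negative": "I'm sorry, that sounds difficult.",
        "neutral": "I see. Go on.",
    }
    return conditioned_prompt, canned[affect]

def nadia_turn(utterance, action_units):
    """One conversational turn: fuse verbal input with facial affect."""
    affect = detect_affect(action_units)
    _, reply = generate_response(utterance, affect)
    return affect, reply

if __name__ == "__main__":
    print(nadia_turn("I got the job!", {"AU12": 0.9}))
```

The point of the sketch is the data flow, not the models: the visual channel is reduced to a compact affect signal that conditions text generation, so the same utterance can yield different replies depending on the speaker's expression.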


Published in

IVA '18: Proceedings of the 18th International Conference on Intelligent Virtual Agents
November 2018, 381 pages
ISBN: 9781450360135
DOI: 10.1145/3267851
Copyright © 2018 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

IVA '18 paper acceptance rate: 17 of 82 submissions (21%). Overall acceptance rate: 53 of 196 submissions (27%).
