ABSTRACT
Advances in artificial intelligence, in particular machine learning and neural networks, have given rise to a new generation of virtual assistants and chatbots. In this work, we present NADiA (Neurally Animated Dialog Agent), which leverages both the user's verbal input and their facial expressions to respond in a meaningful way. NADiA combines a neural language model that generates appropriate responses to user prompts, a convolutional neural network for facial expression analysis, and virtual-human technology deployed on a mobile phone. We evaluate NADiA's anthropomorphic characteristics and its ability to understand the human interlocutor using both subjective and objective measures. We find that NADiA significantly outperforms state-of-the-art chatbot technology and produces behavior comparable to human-generated reference outputs.
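The three-component architecture described above (a facial-expression CNN, a neural language model, and a virtual-human front end) can be sketched as a simple pipeline. This is an illustrative sketch only: the class names, the rule-based stand-ins for the CNN and the language model, and the `smile_score` feature are all assumptions, not the paper's implementation.

```python
class ExpressionCNN:
    """Stand-in for the facial-expression CNN (hypothetical; a real model
    would map camera pixels to action units or affect labels)."""
    def predict(self, frame):
        # Fake a prediction from a precomputed feature, for illustration.
        return "smile" if frame.get("smile_score", 0.0) > 0.5 else "neutral"


class NeuralLM:
    """Stand-in for the neural response generator, conditioned on affect."""
    def generate(self, prompt, affect):
        base = f"Echo: {prompt}"
        return base + (" Glad to see you smile!" if affect == "smile" else "")


class NADiA:
    """Toy pipeline: analyze the face, then generate an affect-aware reply."""
    def __init__(self):
        self.cnn = ExpressionCNN()
        self.lm = NeuralLM()

    def respond(self, utterance, camera_frame):
        affect = self.cnn.predict(camera_frame)      # facial-expression analysis
        reply = self.lm.generate(utterance, affect)  # affect-conditioned text
        # A full system would additionally drive the virtual human's
        # lip-sync and facial animation from this reply.
        return reply


agent = NADiA()
print(agent.respond("Hello!", {"smile_score": 0.9}))
```

The key design point this sketch mirrors is that the visual channel conditions the verbal channel: the same user prompt can yield different responses depending on the detected facial expression.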
NADiA: Neural Network Driven Virtual Human Conversation Agents
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems