Abstract
In face-to-face conversations, speakers are continuously checking whether the listener is engaged in the conversation, and they change their conversational strategy if the listener is not fully engaged. With the goal of building a conversational agent that can adaptively control conversations, in this study we analyze listener gaze behaviors and develop a method for estimating whether a listener is engaged in the conversation on the basis of these behaviors. First, we conduct a Wizard-of-Oz study to collect information on a user's gaze behaviors. We then investigate how conversational disengagement, as annotated by human judges, correlates with gaze transition, mutual gaze (eye contact) occurrence, gaze duration, and eye movement distance. On the basis of the results of these analyses, we identify useful information for estimating a user's disengagement and establish an engagement estimation method using a decision tree technique. The results of these analyses show that a model using the features of gaze transition, mutual gaze occurrence, gaze duration, and eye movement distance provides the best performance and can estimate the user's conversational engagement accurately. The estimation model is then implemented as a real-time disengagement judgment mechanism and incorporated into a multimodal dialog manager in an animated conversational agent. This agent is designed to estimate the user's conversational engagement and generate probing questions when the user is distracted from the conversation. Finally, we evaluate the engagement-sensitive agent and find that asking probing questions at the proper times has the expected effects on the user's verbal/nonverbal behaviors during communication with the agent. We also find that our agent system improves the user's impression of the agent in terms of its engagement awareness, behavior appropriateness, conversation smoothness, favorability, and intelligence.
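The abstract describes a decision-tree model that judges disengagement from gaze transition, mutual gaze occurrence, gaze duration, and eye movement distance. The following is a minimal hand-rolled sketch of what such a real-time disengagement judgment might look like; the thresholds, feature names, and tree shape are illustrative assumptions, not the authors' trained C4.5 model.

```python
# Toy decision-tree-style disengagement judgment from gaze features.
# All thresholds and feature names are hypothetical; the paper trains
# the tree from human-annotated Wizard-of-Oz data instead.

def judge_disengaged(mutual_gaze: bool,
                     gaze_duration_s: float,
                     eye_move_dist_px: float) -> bool:
    """Return True if the listener is judged disengaged."""
    if mutual_gaze:
        # Eye contact with the agent -> treated as engaged.
        return False
    if gaze_duration_s < 1.0:
        # Only brief fixations on the agent: disengaged if the gaze
        # has also wandered far from the agent on screen.
        return eye_move_dist_px > 200.0
    return False

print(judge_disengaged(False, 0.4, 350.0))  # True  (distracted listener)
print(judge_disengaged(True, 0.4, 350.0))   # False (mutual gaze present)
```

In the agent described here, a judgment of `True` would trigger a probing question to re-engage the user; the learned tree would replace the hard-coded thresholds above.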
Supplemental Material
Available for Download
Supplemental movie, appendix, image, and software files for "Gaze awareness in conversational agents: Estimating a user's conversational engagement from eye gaze"