
Gaze awareness in conversational agents: Estimating a user's conversational engagement from eye gaze

Published: 05 August 2013

Abstract

In face-to-face conversations, speakers are continuously checking whether the listener is engaged in the conversation, and they change their conversational strategy if the listener is not fully engaged. With the goal of building a conversational agent that can adaptively control conversations, in this study we analyze listener gaze behaviors and develop a method for estimating whether a listener is engaged in the conversation on the basis of these behaviors. First, we conduct a Wizard-of-Oz study to collect information on a user's gaze behaviors. We then investigate how conversational disengagement, as annotated by human judges, correlates with gaze transition, mutual gaze (eye contact) occurrence, gaze duration, and eye movement distance. On the basis of the results of these analyses, we identify useful information for estimating a user's disengagement and establish an engagement estimation method using a decision tree technique. The results of these analyses show that a model using the features of gaze transition, mutual gaze occurrence, gaze duration, and eye movement distance provides the best performance and can estimate the user's conversational engagement accurately. The estimation model is then implemented as a real-time disengagement judgment mechanism and incorporated into a multimodal dialog manager in an animated conversational agent. This agent is designed to estimate the user's conversational engagement and generate probing questions when the user is distracted from the conversation. Finally, we evaluate the engagement-sensitive agent and find that asking probing questions at the proper times has the expected effects on the user's verbal/nonverbal behaviors during communication with the agent. We also find that our agent system improves the user's impression of the agent in terms of its engagement awareness, behavior appropriateness, conversation smoothness, favorability, and intelligence.
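To make the estimation step more concrete, below is a minimal sketch (an assumption, not the authors' implementation) of how a decision-tree classifier could be trained on window-level gaze features of the kind named in the abstract. The feature encoding, window definition, library choice (scikit-learn), and the synthetic data are all illustrative.

# Minimal sketch (assumed, not the authors' code): classify fixed-length
# interaction windows as engaged vs. disengaged from gaze-derived features.
# Feature names, window length, and the synthetic data are illustrative only.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# One row per annotated window; columns mirror the feature groups named in
# the abstract: gaze transition, mutual gaze occurrence, gaze duration, and
# eye movement distance.
FEATURE_NAMES = [
    "gaze_transition_code",    # hypothetical encoding of the gaze-target transition pattern
    "mutual_gaze_ratio",       # fraction of the window with user-agent eye contact
    "gaze_duration_on_agent",  # seconds the user's gaze rests on the agent
    "eye_movement_distance",   # total on-screen gaze path length in the window (pixels)
]

rng = np.random.default_rng(seed=0)
X = rng.random((200, len(FEATURE_NAMES)))   # placeholder feature matrix
y = rng.integers(0, 2, size=200)            # placeholder labels: 1 = disengaged

clf = DecisionTreeClassifier(max_depth=4, random_state=0)
print("10-fold CV accuracy:", cross_val_score(clf, X, y, cv=10).mean())

At run time, the same features computed over a sliding window would be fed to the trained tree, and a "disengaged" decision would trigger the agent's probing-question behavior described above.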




    • Published in

      ACM Transactions on Interactive Intelligent Systems, Volume 3, Issue 2
      Special issue on interaction with smart objects; Special section on eye gaze and conversation
      July 2013
      150 pages
      ISSN: 2160-6455
      EISSN: 2160-6463
      DOI: 10.1145/2499474

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 5 August 2013
      • Accepted: 1 September 2012
      • Revised: 1 July 2012
      • Received: 1 December 2010
      Published in TiiS Volume 3, Issue 2


      Qualifiers

      • research-article
      • Research
      • Refereed
