Poster · DOI: 10.1145/2663204.2663277

Eye Gaze for Spoken Language Understanding in Multi-modal Conversational Interactions

Published: 12 November 2014

ABSTRACT

When humans converse with each other, they naturally combine information from multiple modalities (e.g., speech, gestures, prosody, facial expressions, and eye gaze). This paper focuses on eye gaze and its combination with speech. We develop a model that resolves references to visual (screen) elements in a conversational web-browsing system. The system detects eye gaze, recognizes speech, and then interprets the user's browsing intent (e.g., click on a specific element) through a combination of spoken language understanding and eye gaze tracking. We experiment with multi-turn interactions collected in a Wizard-of-Oz scenario in which users are asked to perform several web-browsing tasks. We compare several gaze features and evaluate their effectiveness when combined with speech-based lexical features. The resulting multi-modal system not only increases user intent (turn) accuracy by 17%, but also resolves the referring-expression ambiguity commonly observed in dialog systems, with a 10% increase in F-measure.
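
The abstract describes the recipe (gaze evidence combined with speech-based lexical features to pick out the intended screen element) but not its implementation. The snippet below is a minimal, purely illustrative sketch of that idea and not the authors' model: the ScreenElement/Fixation representations, the fixation-radius gaze score, and the linear interpolation weight are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ScreenElement:
    element_id: str
    text: str   # visible text of the element (e.g., link anchor text)
    x: float    # element center, in screen coordinates
    y: float

@dataclass
class Fixation:
    x: float
    y: float
    duration_ms: float  # how long the gaze rested at this point

def lexical_overlap(utterance: str, element: ScreenElement) -> float:
    """Fraction of the element's words that also appear in the utterance."""
    utt_words = set(utterance.lower().split())
    elem_words = set(element.text.lower().split())
    if not elem_words:
        return 0.0
    return len(utt_words & elem_words) / len(elem_words)

def gaze_score(fixations: List[Fixation], element: ScreenElement,
               radius: float = 150.0) -> float:
    """Total fixation time (ms) that fell within `radius` pixels of the element."""
    total = 0.0
    for f in fixations:
        if ((f.x - element.x) ** 2 + (f.y - element.y) ** 2) ** 0.5 <= radius:
            total += f.duration_ms
    return total

def rank_referents(utterance: str, fixations: List[Fixation],
                   elements: List[ScreenElement],
                   gaze_weight: float = 0.5) -> List[Tuple[str, float]]:
    """Rank screen elements by interpolating lexical and (normalized) gaze evidence."""
    max_gaze = max((gaze_score(fixations, e) for e in elements), default=0.0) or 1.0
    scored = []
    for e in elements:
        lex = lexical_overlap(utterance, e)
        gaze = gaze_score(fixations, e) / max_gaze  # normalize to [0, 1]
        scored.append((e.element_id, (1 - gaze_weight) * lex + gaze_weight * gaze))
    return sorted(scored, key=lambda s: s[1], reverse=True)

if __name__ == "__main__":
    elements = [
        ScreenElement("link-1", "Sign up", 100, 40),
        ScreenElement("link-2", "Contact us", 400, 40),
    ]
    fixations = [Fixation(395, 45, 420.0), Fixation(410, 38, 310.0)]
    print(rank_referents("click on contact", fixations, elements))
```

In the system evaluated in the paper, such gaze and lexical features would presumably feed a trained model rather than a hand-set interpolation weight; the sketch only shows how the two evidence sources can be made comparable and combined per candidate element.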

Published in

ICMI '14: Proceedings of the 16th International Conference on Multimodal Interaction
November 2014, 558 pages
ISBN: 9781450328852
DOI: 10.1145/2663204
Copyright © 2014 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICMI '14 paper acceptance rate: 51 of 127 submissions, 40%. Overall acceptance rate: 453 of 1,080 submissions, 42%.
