Article
DOI: 10.1145/968363.968383

Resolving ambiguities of a gaze and speech interface

Published: 22 March 2004

ABSTRACT

Recognition ambiguity is inevitable in any recognition-based user interface. A multimodal architecture should be an effective means of reducing this ambiguity, and should contribute more to error avoidance and recovery than a unimodal one. But does a multimodal architecture always outperform a unimodal one? If not, when does it perform better, and when is it optimal? Furthermore, how can modalities best be combined to gain the advantage of synergy? Little is known about these issues in the available literature. In this paper we address these questions by analyzing integration strategies for gaze and speech modalities, together with an evaluation experiment that verifies the analyses. The approach involves studying cases of mutual correction and investigating when mutual correction occurs. The goal of this study is to gain insight into integration strategies, and to develop an optimal system that makes error-prone recognition technologies perform at a more stable and robust level within a multimodal architecture.




            Reviews

            Thomas Portele

This paper describes the combination of a gaze tracker and a speech recognition unit in a multimodal object-manipulation system. The authors argue that the multimodal approach can compensate for the errors of the individual unimodal recognizers. Their approach is mainly based on eliminating from the speech recognition n-best list those items that have no counterpart in the gaze detection n-best list; the recognizers' confidence scores are not used. An experiment with an object selection task showed improved performance for the authors' system compared to speech-only input, but not compared to gaze-only input. This result is explained by the low performance of the speech recognizer. The authors did not actually run experiments under unimodal input conditions, however; they simply took the outputs of the individual recognizers. Thus, compensation effects (for example, that people speak with more articulatory effort on the phone than face-to-face) are not addressed. Overall, the paper describes an interesting approach, with some methodological problems.

Online Computing Reviews Service
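The filtering strategy described in the review is straightforward to sketch in code. The following is a minimal illustration, not the authors' implementation: the function name and the example data are hypothetical, and only the mechanism (keeping speech hypotheses that have a counterpart among the gaze candidates, with recognizer scores ignored) comes from the review.

# Minimal sketch of the n-best filtering described in the review
# (illustrative, not the authors' code): keep only those speech
# hypotheses whose referent also appears in the gaze tracker's
# n-best list. Recognizer confidence scores are deliberately unused.

def integrate_nbest(speech_nbest, gaze_nbest):
    """Filter the speech n-best list by the gaze n-best list.

    Surviving items keep their original speech-recognizer ranking.
    """
    gaze_candidates = set(gaze_nbest)
    return [item for item in speech_nbest if item in gaze_candidates]

# Hypothetical example: the recognizer's top hypothesis is wrong,
# but the gaze candidates eliminate it and promote the true referent.
speech = ["red square", "red circle", "blue square"]   # ranked speech n-best
gaze = ["red circle", "green triangle"]                # ranked gaze n-best
print(integrate_nbest(speech, gaze))                   # -> ['red circle']

One consequence of this pure-intersection design, worth noting: if the two n-best lists share no items, the filter returns nothing at all, whereas either unimodal recognizer alone would still have produced an answer.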


Published in

              ETRA '04: Proceedings of the 2004 symposium on Eye tracking research & applications
              March 2004
              154 pages
ISBN: 1581138253
DOI: 10.1145/968363

              Copyright © 2004 ACM


              Publisher

              Association for Computing Machinery

              New York, NY, United States



              Acceptance Rates

ETRA '04 paper acceptance rate: 18 of 40 submissions (45%). Overall acceptance rate: 69 of 137 submissions (50%).

