ABSTRACT
Recognition ambiguity is inevitable in any recognition-based user interface. A multimodal architecture should be an effective means of reducing that ambiguity, and of supporting error avoidance and recovery, compared with a unimodal one. But does a multimodal architecture always outperform a unimodal one? If not, when does it perform better, and when is it optimal? Furthermore, how can modalities best be combined to exploit their synergy? Little is known about these issues in the existing literature. In this paper we address them by analyzing integration strategies for gaze and speech modalities, together with an evaluation experiment that verifies the analyses. The approach involves studying cases of mutual correction and investigating when mutual correction occurs. The goal of this study is to gain insight into integration strategies and to develop an optimal system that makes error-prone recognition technologies perform at a more stable and robust level within a multimodal architecture.
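The mutual-correction idea the abstract describes can be illustrated with a minimal sketch (a hypothetical fusion function, not the paper's actual algorithm): each recognizer emits an n-best list of (object, confidence) hypotheses, and fusing the lists lets a lower-ranked but correct hypothesis in one modality be pulled up by evidence from the other. The weights and example confidences below are illustrative assumptions.

```python
def fuse(gaze_nbest, speech_nbest, w_gaze=0.5, w_speech=0.5):
    """Rank candidate objects by weighted joint confidence.

    gaze_nbest / speech_nbest: dicts mapping object id -> confidence in [0, 1].
    An object absent from one modality's list gets confidence 0 there.
    """
    objects = set(gaze_nbest) | set(speech_nbest)
    scored = {
        obj: w_gaze * gaze_nbest.get(obj, 0.0)
             + w_speech * speech_nbest.get(obj, 0.0)
        for obj in objects
    }
    # Highest joint score first.
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# Gaze alone would select "folder"; after fusion with speech evidence,
# the correct target "file" wins (0.45 vs 0.30).
gaze = {"folder": 0.6, "file": 0.5}
speech = {"file": 0.4, "trash": 0.3}
print(fuse(gaze, speech)[0][0])  # file
```

This toy weighted-sum fusion shows only the shape of mutual disambiguation; the paper's contribution is analyzing *when* such fusion actually beats either unimodal recognizer.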
Supplemental Material
Color plate 1 (available for download): resolving ambiguities of a gaze and speech interface.