ABSTRACT
Recognition ambiguity is inevitable in any recognition-based user interface. A multimodal architecture should be an effective means of reducing that ambiguity, and of supporting error avoidance and recovery, compared with a unimodal one. But does a multimodal architecture always outperform a unimodal one? If not, when does it perform better, and when is it optimal? Furthermore, how can modalities best be combined to exploit their synergy? Little is known about these issues in the existing literature. In this paper we address them by analyzing integration strategies for gaze and speech modalities, together with an evaluation experiment that verifies the analyses. The approach involves studying cases of mutual correction and investigating when mutual correction occurs. The goal of this study is to gain insight into integration strategies and to develop an optimal system that makes error-prone recognition technologies perform at a more stable and robust level within a multimodal architecture.
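The mutual-correction idea the abstract describes can be illustrated with a minimal sketch (a hypothetical fusion function, not the paper's actual algorithm): each recognizer emits an n-best list of (object, confidence) hypotheses, and fusing the lists lets a lower-ranked but correct hypothesis in one modality be pulled up by evidence from the other. The weights and example confidences below are illustrative assumptions.

```python
def fuse(gaze_nbest, speech_nbest, w_gaze=0.5, w_speech=0.5):
    """Rank candidate objects by weighted joint confidence.

    gaze_nbest / speech_nbest: dicts mapping object id -> confidence in [0, 1].
    An object absent from one modality's list gets confidence 0 there.
    """
    objects = set(gaze_nbest) | set(speech_nbest)
    scored = {
        obj: w_gaze * gaze_nbest.get(obj, 0.0)
             + w_speech * speech_nbest.get(obj, 0.0)
        for obj in objects
    }
    # Highest joint score first.
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# Gaze alone would select "folder"; after fusion with speech evidence,
# the correct target "file" wins (0.45 vs 0.30).
gaze = {"folder": 0.6, "file": 0.5}
speech = {"file": 0.4, "trash": 0.3}
print(fuse(gaze, speech)[0][0])  # file
```

This toy weighted-sum fusion shows only the shape of mutual disambiguation; the paper's contribution is analyzing *when* such fusion actually beats either unimodal recognizer.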
Supplemental Material
Color plate 1 (available for download): resolving ambiguities of a gaze and speech interface.