DOI: 10.1145/1647314.1647343

Fusion engines for multimodal input: a survey

Published: 02 November 2009

Abstract

Fusion engines are fundamental components of multimodal interactive systems: they interpret input streams whose meaning can vary according to the context, task, user and time. Other surveys have considered multimodal interactive systems as a whole; we focus more closely on the design, specification, construction and evaluation of fusion engines. We first introduce some terminology and set out the major challenges that fusion engines are intended to solve. A history of past work in the field of fusion engines is then presented using the BRETAM model, and these approaches to fusion are classified according to the types of application, the fusion principles and the temporal aspects. Finally, the challenges for future work in the field of fusion engines are set out. These include software frameworks, quantitative evaluation, machine learning and adaptation.
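The survey's central object, a fusion engine that combines complementary input streams under temporal constraints, can be illustrated with a minimal sketch. The code below is not from the paper: the class names, the 500 ms fusion window and the event format are illustrative assumptions, in the spirit of pairing a spoken command with a pointing gesture ("put that there").

```python
from dataclasses import dataclass

# Hypothetical sketch of temporal fusion: events from two modalities are
# fused when their timestamps fall within a common time window.
WINDOW_MS = 500  # assumed fusion window, in milliseconds

@dataclass
class InputEvent:
    modality: str   # e.g. "speech" or "gesture"
    content: str    # recognized token or pointed-at position
    t_ms: int       # timestamp in milliseconds

class FusionEngine:
    def __init__(self, window_ms=WINDOW_MS):
        self.window_ms = window_ms
        self.pending = []  # events awaiting a complementary partner

    def feed(self, event):
        """Return a fused (gesture, speech) pair if one completes, else None."""
        for other in self.pending:
            if (other.modality != event.modality
                    and abs(other.t_ms - event.t_ms) <= self.window_ms):
                self.pending.remove(other)
                # order the pair deterministically by modality name
                pair = sorted([other, event], key=lambda e: e.modality)
                return (pair[0].content, pair[1].content)
        self.pending.append(event)
        return None

engine = FusionEngine()
engine.feed(InputEvent("speech", "put that there", t_ms=100))      # held pending
fused = engine.feed(InputEvent("gesture", "point(420,310)", t_ms=350))
# the gesture arrives 250 ms after the speech, inside the window, so they fuse
```

Real fusion engines surveyed in the paper go well beyond this: they weigh ambiguity, context and user models, not just timestamp proximity.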




Published In

ICMI-MLMI '09: Proceedings of the 2009 International Conference on Multimodal Interfaces
November 2009
374 pages
ISBN:9781605587721
DOI:10.1145/1647314

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. fusion engine
  2. interaction techniques
  3. multimodal interfaces

Qualifiers

  • Research-article

Conference

ICMI-MLMI '09

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%


Article Metrics

  • Downloads (Last 12 months): 50
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 17 Feb 2025


Cited By

  • (2024) ADAS Alarm Sound Design for Autonomous Vehicles Based on Local Optimization: A Case Study in Shanghai, China. Applied Sciences 14(22), 10733. DOI: 10.3390/app142210733. Online: 20-Nov-2024
  • (2024) Towards a Conceptual Model of Users' Expectations of an Autonomous In-Vehicle Multimodal Experience. Human Behavior and Emerging Technologies 2024, 1-14. DOI: 10.1155/2024/7418597. Online: 14-Mar-2024
  • (2024) Interactive Output Modalities Design for Enhancement of User Trust Experience in Highly Autonomous Driving. International Journal of Human–Computer Interaction, 1-19. DOI: 10.1080/10447318.2024.2375697. Online: 10-Jul-2024
  • (2024) User Interaction Mode Selection and Preferences in Different Driving States of Automotive Intelligent Cockpit. Design, User Experience, and Usability, 262-274. DOI: 10.1007/978-3-031-61353-1_18. Online: 15-Jun-2024
  • (2023) Literature Reviews in HCI: A Review of Reviews. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-24. DOI: 10.1145/3544548.3581332. Online: 19-Apr-2023
  • (2022) Multimodal Natural Human–Computer Interfaces for Computer-Aided Design: A Review Paper. Applied Sciences 12(13), 6510. DOI: 10.3390/app12136510. Online: 27-Jun-2022
  • (2022) Acoustic-Based Automatic Addressee Detection for Technical Systems: A Review. Frontiers in Computer Science 4. DOI: 10.3389/fcomp.2022.831784. Online: 14-Jul-2022
  • (2022) Enhancing interaction of people with quadriplegia. Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments, 223-229. DOI: 10.1145/3529190.3529218. Online: 29-Jun-2022
  • (2022) Reducing the Cognitive Load of Playing a Digital Tabletop Game with a Multimodal Interface. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 1-13. DOI: 10.1145/3491102.3502062. Online: 29-Apr-2022
  • (2022) A Case Study on the Rapid Development of Natural and Synergistic Multimodal Interfaces for XR Use-Cases. Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, 1-8. DOI: 10.1145/3491101.3503552. Online: 27-Apr-2022
