Context based multimodal fusion

Published: 13 October 2004
DOI: 10.1145/1027933.1027977

Abstract

We present a generic approach to multimodal fusion which we call context based multimodal integration. Key to this approach is that every multimodal input event is interpreted and enriched with respect to its local turn context. This local turn context comprises all previously recognized input events and the dialogue state that both belong to the same user turn. We show that a production rule system is an elegant way to handle this context based multimodal integration, and we describe a first implementation of the so-called PATE system. Finally, we present results from a first evaluation of this approach as part of a human-factors experiment with the COMIC system.
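To make the mechanism concrete, the following is a minimal Python sketch of the idea, under stated assumptions: it is not PATE's actual interface (PATE operates on typed feature structures with rule activations, which this sketch approximates with plain dicts), and all names here (InputEvent, TurnContext, resolve_deixis, integrate) are hypothetical. Each incoming event is matched by production rules against the local turn context, i.e. the events recognized earlier in the same turn plus the dialogue state, enriched if a rule fires, and then added to that context.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class InputEvent:
    modality: str     # e.g. "speech" or "pen"
    content: dict     # recognizer output, e.g. {"act": "select"}
    timestamp: float

@dataclass
class TurnContext:
    """Local turn context: all events recognized so far in the current
    user turn, plus the dialogue state for that turn."""
    events: list[InputEvent] = field(default_factory=list)
    dialogue_state: dict = field(default_factory=dict)

# A production rule fires when its condition matches the new event
# against the turn context; it then returns an enriched event.
Rule = Callable[[InputEvent, TurnContext], Optional[InputEvent]]

def resolve_deixis(event: InputEvent, ctx: TurnContext) -> Optional[InputEvent]:
    """Illustrative rule: bind a deictic speech act ("select THAT") to
    the referent of the most recent pen gesture in the same turn."""
    if event.modality != "speech" or not event.content.get("deictic"):
        return None
    gestures = [e for e in ctx.events if e.modality == "pen"]
    if not gestures:
        return None
    enriched = dict(event.content, referent=gestures[-1].content.get("object"))
    return InputEvent(event.modality, enriched, event.timestamp)

def integrate(event: InputEvent, ctx: TurnContext, rules: list[Rule]) -> InputEvent:
    """Context based integration: enrich the event via the first
    matching rule, then add it to the local turn context."""
    for rule in rules:
        enriched = rule(event, ctx)
        if enriched is not None:
            event = enriched
            break
    ctx.events.append(event)
    return event

if __name__ == "__main__":
    ctx = TurnContext(dialogue_state={"task": "design"})
    integrate(InputEvent("pen", {"object": "tile-17"}, 0.2), ctx, [resolve_deixis])
    cmd = integrate(InputEvent("speech", {"act": "select", "deictic": True}, 0.5),
                    ctx, [resolve_deixis])
    print(cmd.content)  # {'act': 'select', 'deictic': True, 'referent': 'tile-17'}
```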



Published In

ICMI '04: Proceedings of the 6th international conference on Multimodal interfaces
October 2004
368 pages
ISBN: 1581139950
DOI: 10.1145/1027933
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. fusion
  2. multimodal dialogue systems
  3. multimodal integration
  4. speech and pen input


Acceptance Rates

Overall Acceptance Rate: 453 of 1,080 submissions, 42%


