DOI: 10.1145/1647314.1647344
Research article, ICMI-MLMI conference proceedings

A fusion framework for multimodal interactive applications

Published: 02 November 2009

Abstract

This research proposes a multimodal fusion framework for high-level data fusion between two or more modalities. The framework takes as input low-level features extracted from different system devices, then analyses them to identify the intrinsic meanings these data carry. Extracted meanings are compared with one another to detect complementarities, ambiguities, and inconsistencies, so that the user's intention when interacting with the system is better understood. The whole fusion life cycle is described and evaluated in an office-environment scenario in which two co-workers interact by voice and movement, both of which may reveal their intentions. Fusion in this case focuses on combining modalities to capture context and thereby enhance the user experience.
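The comparison of extracted meanings described above can be sketched as a minimal decision rule. This is an illustrative assumption, not the paper's actual implementation: the `Meaning` class, its fields, and the 0.5 confidence threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Meaning:
    modality: str      # e.g. "speech" or "movement" (illustrative labels)
    subject: str       # who performed the action
    action: str        # the intention interpreted from low-level features
    confidence: float  # recognizer confidence in [0, 1]

def fuse(a: Meaning, b: Meaning) -> str:
    """Compare two extracted meanings, mirroring the abstract's idea of
    detecting complementarities, ambiguities and inconsistencies."""
    if a.action == b.action:
        # Both modalities point to the same intention: complementary evidence.
        return "complementary"
    if min(a.confidence, b.confidence) < 0.5:
        # Disagreement with a weakly recognized meaning: intention stays ambiguous.
        return "ambiguous"
    # Confident disagreement: the modalities contradict each other.
    return "inconsistent"

speech = Meaning("speech", "co-worker-1", "open-document", 0.9)
gesture = Meaning("movement", "co-worker-1", "open-document", 0.8)
print(fuse(speech, gesture))  # prints "complementary"
```

In a full system each branch would trigger a different dialogue strategy, for example asking the user for confirmation in the ambiguous case.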

Published In

ICMI-MLMI '09: Proceedings of the 2009 International Conference on Multimodal Interfaces
November 2009, 374 pages
ISBN: 9781605587721
DOI: 10.1145/1647314
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. context-sensitive interaction
  2. multi-modal fusion
  3. speech recognition

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%

Cited By

  • (2024) Adapting Web Applications for AR/VR using Distributed User Interfaces and Meta-UIs. Procedia Computer Science 246(C), 762–771. doi:10.1016/j.procs.2024.09.495
  • (2018) Mass-Computer Interaction for Thousands of Users and Beyond. Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems, 1–6. doi:10.1145/3170427.3188465
  • (2017) Semantic Entity-Component State Management Techniques to Enhance Software Quality for Multimodal VR-Systems. IEEE Transactions on Visualization and Computer Graphics 23(4), 1342–1351. doi:10.1109/TVCG.2017.2657098
  • (2015) Multimodal Systems: An Excursus of the Main Research Questions. On the Move to Meaningful Internet Systems: OTM 2015 Workshops, 546–558. doi:10.1007/978-3-319-26138-6_59
  • (2014) Review Article. Pattern Recognition Letters 36, 189–195. doi:10.1016/j.patrec.2013.07.003
  • (2010) A survey and analysis of frameworks and framework issues for information fusion applications. Proceedings of the 5th International Conference on Hybrid Artificial Intelligence Systems, Part I, 14–23. doi:10.1007/978-3-642-13769-3_2
