Research article · DOI: 10.1145/1452392.1452424

MultiML: a general purpose representation language for multimodal human utterances

Published: 20 October 2008

Abstract

We present MultiML, a markup language for the annotation of multimodal human utterances. MultiML is able to represent input from several modalities, as well as the relationships between these modalities. Since MultiML separates general parts of representation from more context-specific aspects, it can easily be adapted for use in a wide range of contexts. This paper demonstrates how speech and gestures are described with MultiML, showing the principles, including hierarchy and underspecification, that ensure the quality and extensibility of MultiML. As a proof of concept, we show how MultiML is used to annotate a sample human-robot interaction in the domain of a multimodal joint-action scenario.
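To make the abstract concrete, the fragment below sketches what a hierarchical, cross-modal annotation with an underspecified referent might look like in an XML-based language of this kind. It is purely illustrative: the element and attribute names (`utterance`, `speech`, `gesture`, `link`, `referent`, and so on) are invented for this sketch and are not taken from the actual MultiML schema.

```xml
<!-- Hypothetical sketch only: element and attribute names are invented
     for illustration and do not reproduce the actual MultiML schema. -->
<utterance id="u1">
  <speech id="s1">
    <word id="w1">take</word>
    <word id="w2">this</word>
    <word id="w3">cube</word>
  </speech>
  <!-- A pointing gesture recognized in parallel with the speech. -->
  <gesture id="g1" type="pointing" target="object-3"/>
  <!-- Cross-modal relationship: the deictic word "this" is linked to gesture g1. -->
  <link from="w2" to="g1" relation="deixis"/>
  <!-- Underspecification: the referent stays open until context resolves it. -->
  <referent id="r1" value="underspecified"/>
</utterance>
```

The sketch illustrates the two principles the abstract names: hierarchy (modality-specific annotations nested inside a common utterance node) and underspecification (values left open for later, context-specific resolution).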


Cited By

  • (2022) Towards Situated AMR: Creating a Corpus of Gesture AMR. In: Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management. Health, Operations Management, and Design, pp. 293-312. DOI: 10.1007/978-3-031-06018-2_21. Online: 16 Jun 2022
  • (2018) Applications in HHI: Physical Cooperation. In: Humanoid Robotics: A Reference, pp. 2221-2259. DOI: 10.1007/978-94-007-6046-2_129. Online: 10 Oct 2018
  • (2017) DICE-R. In: Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems, pp. 117-122. DOI: 10.1145/3102113.3102147. Online: 26 Jun 2017
  • (2017) Applications in HHI: Physical Cooperation. In: Humanoid Robotics: A Reference, pp. 1-39. DOI: 10.1007/978-94-007-7194-9_129-1. Online: 10 Oct 2017
  • (2011) Cognitive Memory for Semantic Agents Architecture in Robotic Interaction. In: International Journal of Cognitive Informatics and Natural Intelligence, vol. 5, no. 1, pp. 43-58. DOI: 10.4018/jcini.2011010103. Online: Jan 2011
  • (2010) Situated reference in a hybrid human-robot interaction system. In: Proceedings of the 6th International Natural Language Generation Conference, pp. 67-75. DOI: 10.5555/1873738.1873749. Online: 7 Jul 2010
  • (2009) Evaluating description and reference strategies in a cooperative human-robot dialogue system. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence, pp. 1818-1823. DOI: 10.5555/1661445.1661737. Online: 11 Jul 2009
  • (2009) Towards a Modeling Language for Designing Auditory Interfaces. In: Proceedings of the 5th International Conference on Universal Access in Human-Computer Interaction. Part III: Applications and Services, pp. 502-511. DOI: 10.1007/978-3-642-02713-0_53. Online: 14 Jul 2009

    Published In

ICMI '08: Proceedings of the 10th International Conference on Multimodal Interfaces
October 2008, 322 pages
ISBN: 978-1-60558-198-9
DOI: 10.1145/1452392
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States



    Author Tags

    1. human-robot interaction
    2. multimodal
    3. representation


Conference

ICMI '08: International Conference on Multimodal Interfaces
October 20-22, 2008, Chania, Crete, Greece

    Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions, 42%


