DOI: 10.1145/1027933.1027952

Towards integrated microplanning of language and iconic gesture for multimodal output

Published: 13 October 2004

Abstract

When talking about spatial domains, humans frequently accompany their explanations with iconic gestures to depict what they are referring to. For example, when giving directions, it is common to see people making gestures that indicate the shape of buildings, or outline a route to be taken by the listener, and these gestures are essential to the understanding of the directions. Based on results from an ongoing study on language and gesture in direction-giving, we propose a framework to analyze such gestural images into semantic units (image description features), and to link these units to morphological features (hand shape, trajectory, etc.). This feature-based framework allows us to generate novel iconic gestures for embodied conversational agents, without drawing on a lexicon of canned gestures. We present an integrated microplanner that derives the form of both coordinated natural language and iconic gesture directly from given communicative goals, and whose output serves as input to the speech and gesture realization engine in our NUMACK project.
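To illustrate the feature-based pairing the abstract describes, here is a minimal Python sketch that links semantic image description features to gesture morphology features (hand shape, trajectory). The class names, field names, and mapping rules below are illustrative assumptions for the purpose of this sketch; they do not reflect NUMACK's actual representation or microplanning algorithm.

from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageDescriptionFeatures:
    """Semantic units describing the spatial referent (hypothetical fields)."""
    shape: Optional[str] = None    # e.g. "round", "rectangular"
    extent: Optional[str] = None   # e.g. "tall", "long"
    path: Optional[str] = None     # e.g. "turn-left", "straight-ahead"


@dataclass
class GestureMorphology:
    """Observable form features of an iconic gesture (hypothetical fields)."""
    hand_shape: str                # e.g. "C-shape", "flat-hand"
    trajectory: str                # e.g. "tracing-path", "static"
    orientation: str = "neutral"


def plan_gesture(idf: ImageDescriptionFeatures) -> GestureMorphology:
    """Toy mapping from semantic features to gesture form.

    The paper's microplanner chooses gesture and language features jointly
    from communicative goals; this sketch only illustrates deriving gesture
    form from semantic features instead of selecting canned gestures.
    """
    hand = "C-shape" if idf.shape == "round" else "flat-hand"
    traj = "tracing-path" if idf.path else "static"
    return GestureMorphology(hand_shape=hand, trajectory=traj)


if __name__ == "__main__":
    # Example: depicting a tall, round tower along the route
    print(plan_gesture(ImageDescriptionFeatures(shape="round", extent="tall")))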



Published In

ICMI '04: Proceedings of the 6th international conference on Multimodal interfaces
October 2004, 368 pages
ISBN: 1581139950
DOI: 10.1145/1027933
Publisher

Association for Computing Machinery, New York, NY, United States



Author Tags

  1. embodied conversational agents
  2. generation
  3. gesture
  4. language
  5. multimodal output

Qualifiers

  • Article

Conference

ICMI '04

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)


