DOI: 10.1145/1027933.1027952

Towards integrated microplanning of language and iconic gesture for multimodal output

Published: 13 October 2004

Abstract

When talking about spatial domains, humans frequently accompany their explanations with iconic gestures to depict what they are referring to. For example, when giving directions, it is common to see people making gestures that indicate the shape of buildings, or outline a route to be taken by the listener, and these gestures are essential to the understanding of the directions. Based on results from an ongoing study on language and gesture in direction-giving, we propose a framework to analyze such gestural images into semantic units (image description features), and to link these units to morphological features (hand shape, trajectory, etc.). This feature-based framework allows us to generate novel iconic gestures for embodied conversational agents, without drawing on a lexicon of canned gestures. We present an integrated microplanner that derives the form of both coordinated natural language and iconic gesture directly from given communicative goals, and whose output serves as input to the speech and gesture realization engine in our NUMACK project.
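To illustrate the feature-based pairing the abstract describes, here is a minimal Python sketch that links semantic image description features to gesture morphology features (hand shape, trajectory). The class names, field names, and mapping rules below are illustrative assumptions for the purpose of this sketch; they do not reflect NUMACK's actual representation or microplanning algorithm.

from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageDescriptionFeatures:
    """Semantic units describing the spatial referent (hypothetical fields)."""
    shape: Optional[str] = None    # e.g. "round", "rectangular"
    extent: Optional[str] = None   # e.g. "tall", "long"
    path: Optional[str] = None     # e.g. "turn-left", "straight-ahead"


@dataclass
class GestureMorphology:
    """Observable form features of an iconic gesture (hypothetical fields)."""
    hand_shape: str                # e.g. "C-shape", "flat-hand"
    trajectory: str                # e.g. "tracing-path", "static"
    orientation: str = "neutral"


def plan_gesture(idf: ImageDescriptionFeatures) -> GestureMorphology:
    """Toy mapping from semantic features to gesture form.

    The paper's microplanner chooses gesture and language features jointly
    from communicative goals; this sketch only illustrates deriving gesture
    form from semantic features instead of selecting canned gestures.
    """
    hand = "C-shape" if idf.shape == "round" else "flat-hand"
    traj = "tracing-path" if idf.path else "static"
    return GestureMorphology(hand_shape=hand, trajectory=traj)


if __name__ == "__main__":
    # Example: depicting a tall, round tower along the route
    print(plan_gesture(ImageDescriptionFeatures(shape="round", extent="tall")))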



Published In

ICMI '04: Proceedings of the 6th international conference on Multimodal interfaces
October 2004, 368 pages
ISBN: 1581139950
DOI: 10.1145/1027933
Publisher

Association for Computing Machinery, New York, NY, United States



Author Tags

  1. embodied conversational agents
  2. generation
  3. gesture
  4. language
  5. multimodal output

Qualifiers

  • Article

Conference

ICMI '04

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)


