Gesture modeling and animation based on a probabilistic re-creation of speaker style

Published: 20 March 2008

Abstract

Animated characters that move and gesticulate appropriately with spoken text are useful in a wide range of applications. Unfortunately, this class of movement is very difficult to generate, even more so when a unique, individual movement style is required. We present a system that, with a focus on arm gestures, is capable of producing full-body gesture animation for given input text in the style of a particular performer. Our process starts with video of a person whose gesturing style we wish to animate. A tool-assisted annotation process is performed on the video, from which a statistical model of the person's particular gesturing style is built. Using this model and input text tagged with theme, rheme and focus, our generation algorithm creates a gesture script. As opposed to isolated singleton gestures, our gesture script specifies a stream of continuous gestures coordinated with speech. This script is passed to an animation system, which enhances the gesture description with additional detail. It then generates either kinematic or physically simulated motion based on this description. The system is capable of generating gesture animations for novel text that are consistent with a given performer's style, as was successfully validated in an empirical user study.
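To make the generation step concrete, here is a minimal Python sketch of one way a speaker-specific statistical model could drive gesture selection: a frequency table over gesture lexemes, conditioned on each phrase's information-structure tag (theme, rheme, or focus), sampled to produce a gesture script. This is an invented illustration, not the authors' actual model; all lexeme names, counts, and function names are assumptions, and the paper's real system additionally handles continuous gesture streams, timing, and form detail.

```python
import random

# Hypothetical speaker-style model: for each information-structure tag,
# the observed counts of gesture lexemes in the annotated video.
# All lexemes and counts here are invented for illustration.
STYLE_MODEL = {
    "theme": {"rest": 6, "beat": 3, "cup": 1},
    "rheme": {"beat": 5, "wipe": 3, "point": 2},
    "focus": {"point": 7, "raised_index": 2, "beat": 1},
}

def sample_gesture(tag, rng):
    """Draw one gesture lexeme for a phrase with the given tag,
    proportionally to its frequency in the speaker's annotated data."""
    counts = STYLE_MODEL[tag]
    lexemes = list(counts)
    weights = [counts[g] for g in lexemes]
    return rng.choices(lexemes, weights=weights, k=1)[0]

def generate_gesture_script(tagged_phrases, seed=0):
    """Map a sequence of (phrase, tag) pairs to a gesture script:
    one (phrase, tag, gesture) triple per phrase."""
    rng = random.Random(seed)
    return [(phrase, tag, sample_gesture(tag, rng))
            for phrase, tag in tagged_phrases]

script = generate_gesture_script(
    [("the main idea", "theme"),
     ("is quite simple", "rheme"),
     ("simple", "focus")])
for phrase, tag, gesture in script:
    print(f"{tag:>5}: '{phrase}' -> {gesture}")
```

Seeding the generator makes a run reproducible while still reflecting the modeled frequency distribution; resampling with different seeds yields stylistically consistent but varied gesture streams, in the spirit of the probabilistic re-creation described above.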




    Published In

    ACM Transactions on Graphics, Volume 27, Issue 1
    March 2008
    135 pages
    ISSN: 0730-0301
    EISSN: 1557-7368
    DOI: 10.1145/1330511
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 March 2008
    Accepted: 01 November 2007
    Revised: 01 May 2007
    Received: 01 August 2006
    Published in TOG Volume 27, Issue 1

    Author Tags

    1. Human modeling
    2. character animation
    3. gesture

    Qualifiers

    • Research-article
    • Research
    • Refereed


    Cited By

    • Selecting Iconic Gesture Forms Based on Typical Entity Images. Journal of Information Processing 32, 196-205 (2024). DOI: 10.2197/ipsjjip.32.196
    • Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis. ACM Transactions on Graphics 43, 4, 1-17 (19 Jul 2024). DOI: 10.1145/3658134
    • Actor Takeover of Animated Characters. 2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 1134-1135 (16 Mar 2024). DOI: 10.1109/VRW62533.2024.00361
    • Music-stylized hierarchical dance synthesis with user control. Virtual Reality & Intelligent Hardware 6, 5, 339-357 (Oct 2024). DOI: 10.1016/j.vrih.2024.06.004
    • Dual-Path Transformer-Based GAN for Co-speech Gesture Synthesis. International Journal of Social Robotics (13 May 2024). DOI: 10.1007/s12369-024-01136-y
    • ASAP for multi-outputs: auto-generating storyboard and pre-visualization with virtual actors based on screenplay. Multimedia Tools and Applications (3 Aug 2024). DOI: 10.1007/s11042-024-19904-3
    • Zero-shot style transfer for gesture animation driven by text and speech using adversarial disentanglement of multimodal style encoding. Frontiers in Artificial Intelligence 6 (12 Jun 2023). DOI: 10.3389/frai.2023.1142997
    • Data-Driven Communicative Behaviour Generation: A Survey. ACM Transactions on Human-Robot Interaction (16 Aug 2023). DOI: 10.1145/3609235
    • GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents. ACM Transactions on Graphics 42, 4, 1-18 (26 Jul 2023). DOI: 10.1145/3592097
    • Large language models in textual analysis for gesture selection. Proceedings of the 25th International Conference on Multimodal Interaction, 378-387 (9 Oct 2023). DOI: 10.1145/3577190.3614158
