DOI: 10.1145/1073368.1073388

Transferable videorealistic speech animation

Published: 29 July 2005

Abstract

Image-based videorealistic speech animation achieves significant visual realism, but at the cost of collecting a large 5- to 10-minute video corpus from the specific person to be animated. This requirement hinders its use in broad applications, since a large video corpus for a specific person under a controlled recording setup may not be easily obtained. In this paper, we propose a model transfer and adaptation algorithm that allows a novel person to be animated using only a small video corpus. The algorithm starts with a multidimensional morphable model (MMM) previously trained on a different speaker with a large corpus and transfers it to the novel speaker with a much smaller corpus. It consists of 1) a novel matching-by-synthesis algorithm, which semi-automatically selects new MMM prototype images from the new video corpus, and 2) a novel gradient descent linear regression algorithm, which adapts the MMM phoneme models to the data in the novel video corpus. Encouraging experimental results are presented in which a morphable model trained on a performer with a 10-minute corpus is transferred to a novel person using a 15-second movie clip as the adaptation corpus.
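The adaptation step described in the abstract can be pictured with a toy sketch. The following Python is a hypothetical simplification, not the paper's actual algorithm: it learns a single affine transform (W, b) by gradient descent on the few phoneme models observed in the short adaptation corpus, then applies that transform to every phoneme mean of the source speaker's model, so that phonemes never seen in the adaptation clip are adapted too. All function and variable names are invented for illustration.

```python
import numpy as np

def adapt_phoneme_means(source_means, adapt_means, adapt_obs,
                        lr=0.05, steps=2000):
    """Fit an affine transform (W, b) to the phonemes observed in a
    short adaptation corpus, then apply it to every phoneme mean of
    the source-speaker model (so unseen phonemes are adapted too).

    source_means : (P, D) mean vectors of all P phoneme models
    adapt_means  : (K, D) source means of the K phonemes that appear
                   in the adaptation corpus
    adapt_obs    : (K, D) empirical means of those phonemes computed
                   from the novel speaker's short corpus
    """
    D = source_means.shape[1]
    W = np.eye(D)                 # start from the identity: no change
    b = np.zeros(D)
    n = len(adapt_means)
    for _ in range(steps):
        err = adapt_means @ W.T + b - adapt_obs   # residual, (K, D)
        # squared-error gradients (up to a constant factor)
        W -= lr * (err.T @ adapt_means) / n
        b -= lr * err.mean(axis=0)
    # transform *all* phoneme means, including unobserved ones
    return source_means @ W.T + b
```

On a synthetic example where the novel speaker's means really are an affine function of the source means, the recovered transform extrapolates correctly to phonemes that never appeared in the adaptation clip, which is the essential point of adapting with a 15-second corpus.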




Published In

SCA '05: Proceedings of the 2005 ACM SIGGRAPH/Eurographics Symposium on Computer Animation
July 2005, 366 pages
ISBN: 1595931988
DOI: 10.1145/1073368

Publisher

Association for Computing Machinery, New York, NY, United States



Conference

SCA '05: Symposium on Computer Animation
July 29-31, 2005, Los Angeles, California

Acceptance Rates

Overall acceptance rate: 183 of 487 submissions (38%)


Cited By

  • (2023) Deep Person Generation: A Survey from the Perspective of Face, Pose, and Cloth Synthesis. ACM Computing Surveys 55, 12, 1-37. DOI: 10.1145/3575656
  • (2022) Talking Faces: Audio-to-Video Face Generation. Handbook of Digital Face Manipulation and Detection, 163-188. DOI: 10.1007/978-3-030-87664-7_8
  • (2021) Iterative Text-Based Editing of Talking-Heads Using Neural Retargeting. ACM Transactions on Graphics 40, 3, 1-14. DOI: 10.1145/3449063
  • (2021) The Creation and Detection of Deepfakes. ACM Computing Surveys 54, 1, 1-41. DOI: 10.1145/3425780
  • (2020) Intuitive facial animation editing based on a generative RNN framework. Proc. ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 1-11. DOI: 10.1111/cgf.14117
  • (2020) Talking-Head Generation with Rhythmic Head Motion. Computer Vision – ECCV 2020, 35-51. DOI: 10.1007/978-3-030-58545-7_3
  • (2019) Text-based editing of talking-head video. ACM Transactions on Graphics 38, 4, 1-14. DOI: 10.1145/3306346.3323028
  • (2018) Visual Speech Emotion Conversion using Deep Learning for 3D Talking Head. Proc. Joint Workshop on Affective Social Multimedia Computing and Multi-Modal Affective Computing of Large-Scale Multimedia Data, 7-13. DOI: 10.1145/3267935.3267950
  • (2018) HeadOn. ACM Transactions on Graphics 37, 4, 1-13. DOI: 10.1145/3197517.3201350
  • (2018) Deep video portraits. ACM Transactions on Graphics 37, 4, 1-14. DOI: 10.1145/3197517.3201283
