DOI: 10.5555/1218064.1218100

Simulating speech with a physics-based facial muscle model

Published: 02 September 2006

Abstract

We present a physically based system for creating animations of novel words and phrases from text and audio input, based on the analysis of motion-captured speech examples. Leading image-based techniques achieve photo-real quality, yet lack versatility, especially with regard to interactions with the environment. Data-driven approaches that use motion capture to deform a three-dimensional surface often lack any anatomical or physically based structure, limiting their accuracy and realism. In contrast, muscle-driven physics-based facial animation systems can trivially integrate external interacting objects and have the potential to produce very realistic animations, provided the underlying model and simulation framework are faithful to the anatomy of the face and the physics of facial tissue deformation. We start with a high-resolution, anatomically accurate flesh and muscle model built for a specific subject. We then translate a motion-captured training set of speech examples into muscle activation signals, and segment those into intervals corresponding to individual phonemes. Finally, these samples are used to synthesize novel words and phrases. The versatility of our approach is illustrated by combining this novel speech content with various facial expressions, as well as with interactions with external objects.
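The pipeline sketched in the abstract (per-phoneme muscle activation segments extracted from captured speech, then recombined to form novel words) can be illustrated schematically. The snippet below is a minimal sketch under stated assumptions, not the paper's implementation: `synthesize_word`, `library`, and `BLEND` are hypothetical names, each segment is assumed to be a frames-by-muscles activation array, and a simple linear cross-fade stands in for whatever coarticulation handling the full system performs.

```python
# Hypothetical sketch: concatenate per-phoneme muscle activation segments
# into one activation track, cross-fading at segment boundaries.
import numpy as np

BLEND = 5  # frames to cross-fade between adjacent phoneme segments


def synthesize_word(phonemes, library):
    """Look up one activation segment per phoneme and blend them in time.

    library: dict mapping phoneme -> (frames x muscles) activation array,
    e.g. segments cut from motion-captured training speech.
    """
    segments = [library[p] for p in phonemes]
    out = segments[0]
    for seg in segments[1:]:
        n = min(BLEND, len(out), len(seg))
        w = np.linspace(0.0, 1.0, n)[:, None]       # fade-in weights per frame
        overlap = (1 - w) * out[-n:] + w * seg[:n]  # cross-faded boundary region
        out = np.concatenate([out[:-n], overlap, seg[n:]])
    return out
```

The resulting activation track would then drive the quasistatic flesh simulation, which is where this approach differs from purely surface-based concatenation: the same muscle signals remain valid when expressions or external contacts are layered on top.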



Published In

SCA '06: Proceedings of the 2006 ACM SIGGRAPH/Eurographics Symposium on Computer Animation
September 2006, 370 pages
ISBN: 3905673347
Publisher: Eurographics Association, Goslar, Germany

Acceptance Rates

Overall acceptance rate: 183 of 487 submissions, 38%
