Abstract
We present an approach for generating face animations from large image collections of the same person. Such collections, which we call photobios, sample the appearance of a person over changes in pose, facial expression, hairstyle, age, and other variations. By optimizing the order in which images are displayed and cross-dissolving between them, we control the motion through face space and create compelling animations (e.g., render a smooth transition from frowning to smiling). Used in this context, the cross-dissolve produces a very strong motion effect; a key contribution of the paper is to explain this effect and analyze its operating range. The approach operates by creating a graph with faces as nodes and similarities as edges, and solving for walks and shortest paths on this graph. The processing pipeline involves face detection, locating fiducials (eyes/nose/mouth), solving for pose, warping to frontal views, and image comparison based on Local Binary Patterns. We demonstrate results on a variety of datasets including time-lapse photography, personal photo collections, and images of celebrities downloaded from the Internet. Our approach is the basis for the Face Movies feature in Google's Picasa.
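The graph construction and shortest-path step described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes each face has already been reduced to a normalized LBP histogram, and it uses chi-square distance with k-nearest-neighbor connectivity, both common choices for comparing LBP histograms.

```python
import heapq


def chi_square(h1, h2):
    """Chi-square distance between two normalized LBP histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def build_graph(histograms, k=3):
    """Connect each face to its k most similar faces (smallest distance)."""
    n = len(histograms)
    graph = {i: [] for i in range(n)}
    for i in range(n):
        dists = sorted(
            (chi_square(histograms[i], histograms[j]), j)
            for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            graph[i].append((j, d))
            graph[j].append((i, d))  # keep the graph undirected
    return graph


def shortest_path(graph, start, goal):
    """Dijkstra: the lowest-cost chain of visually similar faces,
    i.e., the image sequence to cross-dissolve through."""
    pq = [(0.0, start, [start])]
    seen = set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph[node]:
            if nxt not in seen:
                heapq.heappush(pq, (cost + w, nxt, path + [nxt]))
    return None
```

Because edge weights are visual distances, the shortest path between a frowning face and a smiling face passes through intermediate expressions, which is what makes the resulting cross-dissolve sequence read as smooth motion.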
Published in SIGGRAPH '11: ACM SIGGRAPH 2011 Papers.