Abstract
Detailed facial performance geometry can be reconstructed using dense camera and light setups in controlled studios. However, a wide range of important applications cannot employ these approaches, including all movie productions shot from a single principal camera. For post-production, such productions require dynamic monocular face capture for appearance modification. We present a new method for capturing face geometry from monocular video. Our approach captures detailed, dynamic, spatio-temporally coherent 3D face geometry without the need for markers. It works under uncontrolled lighting, and it successfully reconstructs expressive motion including high-frequency face detail such as folds and laugh lines. After simple manual initialization, the capturing process is fully automatic, which makes it versatile, lightweight and easy to deploy. Our approach tracks accurate sparse 2D features between automatically selected key frames to animate a parametric blend shape model, which is further refined in pose, expression and shape by temporally coherent optical flow and photometric stereo. We demonstrate performance capture results for long and complex face sequences captured indoors and outdoors, and we exemplify the relevance of our approach as an enabling technology for model-based face editing in movies and video, such as adding new facial textures, as well as a step towards enabling everyone to do facial performance capture with a single affordable camera.
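To illustrate the kind of coarse model fitting the abstract describes, the sketch below fits rigid head pose and blend shape weights to tracked 2D landmarks. It is a minimal sketch under simplifying assumptions, not the authors' implementation: it uses a scaled-orthographic camera instead of a calibrated perspective one, generic least-squares instead of the paper's optimization, and placeholder model data (neutral mesh, blend shape basis, landmark indices, landmark detections).

```python
# Minimal sketch (not the paper's implementation): fit rigid pose and blend
# shape weights of a parametric face model to tracked 2D landmarks, assuming
# a scaled-orthographic camera. All model data below are random placeholders.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation as R

V, K, L = 500, 10, 30                        # vertices, blend shapes, landmarks
rng = np.random.default_rng(0)
neutral = rng.standard_normal((V, 3))        # neutral face vertices (placeholder)
basis = 0.1 * rng.standard_normal((K, V, 3)) # blend shape displacements (placeholder)
lmk_idx = rng.choice(V, L, replace=False)    # mesh vertices matched to 2D landmarks
lmk_2d = rng.standard_normal((L, 2))         # tracked 2D landmark positions (placeholder)

def residuals(params):
    """Reprojection error of the model landmarks under scaled orthography."""
    rotvec, t2d, scale, weights = params[:3], params[3:5], params[5], params[6:]
    verts = neutral + np.tensordot(weights, basis, axes=1)  # expression deformation
    verts = R.from_rotvec(rotvec).apply(verts)              # rigid rotation
    proj = scale * verts[lmk_idx, :2] + t2d                 # project and translate in 2D
    return (proj - lmk_2d).ravel()

# Parameters: rotation vector (3), 2D translation (2), scale (1), weights (K in [0, 1]).
x0 = np.concatenate([np.zeros(3), np.zeros(2), [1.0], np.zeros(K)])
lb = np.concatenate([-np.pi * np.ones(3), [-np.inf, -np.inf], [1e-3], np.zeros(K)])
ub = np.concatenate([ np.pi * np.ones(3), [ np.inf,  np.inf], [np.inf], np.ones(K)])
sol = least_squares(residuals, x0, bounds=(lb, ub))
print("fitted blend shape weights:", np.round(sol.x[6:], 3))
```

In the paper's pipeline this coarse fit would then be refined in pose, expression and shape using temporally coherent optical flow and a shading-based (photometric) step; those stages are not shown here.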
Supplemental Material
Supplemental material is available for download.