Abstract
We present a technique to automatically animate a still portrait, making it possible for the subject in the photo to come to life and express various emotions. We use a driving video (of a different subject) and develop a method to transfer the expressiveness of the subject in the driving video to the target portrait. In contrast to previous work that requires an input video of the target face to reenact a facial performance, our technique uses only a single target image. We animate the target image through 2D warps that imitate the facial transformations in the driving video. As warps alone do not carry the full expressiveness of the face, we add fine-scale dynamic details that commonly accompany facial expressions, such as creases and wrinkles. Furthermore, we hallucinate regions that are hidden in the input target face, most notably the inner mouth. Our technique gives rise to reactive profiles, in which people in still images can automatically interact with their viewers. We demonstrate our technique on numerous still portraits from the internet.
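To make the core warping idea concrete, below is a minimal sketch, assuming dlib's 68-point landmark predictor and scikit-image's piecewise-affine warp. It is an illustration of the general landmark-driven 2D warping principle only, not the authors' implementation: it simply adds the driving face's landmark displacements to the target's landmarks and warps, and it omits the fine-scale detail transfer and mouth-interior hallucination stages described above. All image file names are hypothetical placeholders.

```python
# Sketch: drive a single target portrait with landmark displacements
# taken from a driving video frame, then apply a piecewise-affine warp.
# NOT the paper's implementation; a real system would also normalize
# the driving landmarks for scale and head pose before transfer.
import dlib
import numpy as np
from skimage import io
from skimage.transform import PiecewiseAffineTransform, warp

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmarks(img):
    # 68 facial landmarks of the first detected face, as a (68, 2) array
    # in (x, y) order; raises IndexError if no face is found.
    rect = detector(img, 1)[0]
    pts = predictor(img, rect).parts()
    return np.array([(p.x, p.y) for p in pts], dtype=np.float64)

def animate_frame(target, tgt_pts, drv_pts0, drv_pts):
    # Displace the target's landmarks by the driving face's motion
    # relative to its first (rest) frame.
    new_pts = tgt_pts + (drv_pts - drv_pts0)
    h, w = target.shape[:2]
    # Anchor the image corners so the background stays fixed and the
    # triangulation covers the whole image.
    corners = np.array([[0, 0], [w - 1, 0], [0, h - 1], [w - 1, h - 1]],
                       dtype=np.float64)
    src = np.vstack([new_pts, corners])  # positions in the output image
    dst = np.vstack([tgt_pts, corners])  # positions in the input image
    tform = PiecewiseAffineTransform()
    tform.estimate(src, dst)             # warp() uses this as the inverse map
    return warp(target, tform, output_shape=(h, w))

target = io.imread("portrait.jpg")            # placeholder paths
frame0 = io.imread("driving_frame_000.jpg")
frame_t = io.imread("driving_frame_042.jpg")
out = animate_frame(target, landmarks(target),
                    landmarks(frame0), landmarks(frame_t))
io.imsave("animated_frame.jpg", (out * 255).astype(np.uint8))
```

Estimating the transform from the new landmark positions back to the original ones lets warp() use it directly as an inverse map, so face features appear at their displaced locations; the corner anchors keep everything outside the landmark hull in place. The full technique additionally adds the dynamic creases and wrinkles and synthesizes the hidden mouth interior, which a warp alone cannot produce.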
Supplemental Material
Supplemental material is available for download.