research-article

MonoPerfCap: Human Performance Capture From Monocular Video

Authors:
Weipeng Xu, Max Planck Institute for Informatics, Germany
Avishek Chatterjee, Max Planck Institute for Informatics, Germany
Michael Zollhöfer, Max Planck Institute for Informatics, Germany
Helge Rhodin, EPFL, Switzerland
Dushyant Mehta, Max Planck Institute for Informatics, Germany
Hans-Peter Seidel, Max Planck Institute for Informatics, Germany
Christian Theobalt, Max Planck Institute for Informatics, Germany

ACM Transactions on Graphics, Volume 37, Issue 2, Article No. 27, pp. 1–15. https://doi.org/10.1145/3181973

Published: 21 May 2018

ACM Transactions on Graphics

Abstract

We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges by a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network using a batch-based pose estimation strategy. Joint recovery of per-batch motion allows us to resolve the ambiguities of the monocular reconstruction problem based on a low-dimensional trajectory subspace. In addition, we propose refinement of the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness, and scene complexity that can be handled.
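The abstract's central idea, constraining each batch of frames to a low-dimensional trajectory subspace so that per-frame depth ambiguity is regularized across time, can be illustrated with a small sketch. It assumes a truncated orthonormal DCT basis, a common choice for trajectory subspaces; the abstract itself does not name the basis or its size, and the functions `dct_basis` and `project_to_subspace` are hypothetical names introduced here. Each joint coordinate over a batch is fit independently by least squares. This is not the authors' optimization, which fits the subspace against 2D and 3D CNN detections; the sketch only shows how truncating the basis suppresses high-frequency, per-frame jitter.

```python
import math


def dct_basis(num_frames, num_coeffs):
    """Return the first num_coeffs orthonormal DCT-II basis vectors of length num_frames.

    Assumption: a DCT basis is used purely for illustration.
    """
    basis = []
    for k in range(num_coeffs):
        # Orthonormal scaling: the DC term uses sqrt(1/N), all others sqrt(2/N).
        scale = math.sqrt((1.0 if k == 0 else 2.0) / num_frames)
        basis.append([
            scale * math.cos(math.pi * (2 * t + 1) * k / (2 * num_frames))
            for t in range(num_frames)
        ])
    return basis


def project_to_subspace(trajectory, num_coeffs):
    """Least-squares fit of one joint coordinate over a batch of frames
    using only the num_coeffs lowest-frequency DCT basis vectors.

    Returns (coefficients, reconstructed_trajectory). Because the basis
    vectors are orthonormal, the least-squares coefficients are simply
    inner products with the input trajectory.
    """
    n = len(trajectory)
    if not 1 <= num_coeffs <= n:
        raise ValueError("num_coeffs must lie between 1 and the batch length")
    basis = dct_basis(n, num_coeffs)
    coeffs = [sum(b[t] * trajectory[t] for t in range(n)) for b in basis]
    recon = [sum(c * b[t] for c, b in zip(coeffs, basis)) for t in range(n)]
    return coeffs, recon


if __name__ == "__main__":
    # Hypothetical batch: a smooth ramp corrupted by alternating per-frame jitter,
    # standing in for noisy per-frame depth estimates of a single joint.
    frames = 10
    noisy = [0.1 * t + (0.05 if t % 2 == 0 else -0.05) for t in range(frames)]
    _, smoothed = project_to_subspace(noisy, num_coeffs=3)
    for t, (raw, fit) in enumerate(zip(noisy, smoothed)):
        print(f"frame {t}: raw={raw:+.3f} subspace={fit:+.3f}")
```

Raising `num_coeffs` to the batch length reproduces the raw input exactly; lowering it trades fidelity for temporal smoothness. The basis type and the number of coefficients here are illustrative choices only.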

Supplemental Material

- tog37-2-a27-xu.mp4 (MP4, 295.4 MB)
- xu.zip (ZIP, 77 MB): supplemental movie and image files for "MonoPerfCap: Human Performance Capture From Monocular Video"


Index Terms

MonoPerfCap: Human Performance Capture From Monocular Video
1. Computing methodologies
  1. Computer graphics
    1. Animation
      1. Motion capture


Published in
ACM Transactions on Graphics, Volume 37, Issue 2
April 2018
244 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3191713
Editor: Kavita Bala, Cornell University
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 21 May 2018
- Revised: 1 February 2018
- Accepted: 1 February 2018
- Received: 1 September 2017

Author Tags
3D pose estimation
Monocular performance capture
human body
non-rigid surface deformation
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 137
- Total Downloads: 956
- Downloads (last 12 months): 85
- Downloads (last 6 weeks): 7