research-article

MonoPerfCap: Human Performance Capture From Monocular Video

Published: 21 May 2018

Abstract

We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges by a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network using a batch-based pose estimation strategy. Joint recovery of per-batch motion allows us to resolve the ambiguities of the monocular reconstruction problem based on a low-dimensional trajectory subspace. In addition, we propose refinement of the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness, and scene complexity that can be handled.
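The abstract states that per-batch motion is recovered jointly in a low-dimensional trajectory subspace, but it does not name the basis. As a minimal sketch only, the example below assumes a truncated discrete cosine transform (DCT) basis, a common choice for such subspaces; the function names `dct_basis` and `project_trajectory`, the batch length, and the number of coefficients are all hypothetical and not taken from the article. It shows how one joint coordinate over a batch of frames can be represented by a few coefficients, which constrains the trajectory to be smooth.

```python
import math


def dct_basis(num_frames, num_coeffs):
    """Return the first num_coeffs orthonormal DCT-II basis vectors.

    Each basis vector has one entry per frame in the batch.
    (Assumed basis; the abstract does not specify one.)
    """
    basis = []
    for k in range(num_coeffs):
        # Scaling that makes the basis vectors orthonormal.
        scale = math.sqrt((1.0 if k == 0 else 2.0) / num_frames)
        basis.append([
            scale * math.cos(math.pi * (t + 0.5) * k / num_frames)
            for t in range(num_frames)
        ])
    return basis


def project_trajectory(trajectory, num_coeffs):
    """Project a 1D trajectory onto the truncated DCT subspace.

    Returns the low-dimensional coefficients and the smoothed
    trajectory reconstructed from them.
    """
    basis = dct_basis(len(trajectory), num_coeffs)
    # Coefficients are inner products with each basis vector.
    coeffs = [sum(b * x for b, x in zip(vec, trajectory)) for vec in basis]
    # Reconstruction is the coefficient-weighted sum of basis vectors.
    smoothed = [
        sum(c * vec[t] for c, vec in zip(coeffs, basis))
        for t in range(len(trajectory))
    ]
    return coeffs, smoothed


if __name__ == "__main__":
    # Hypothetical noisy depth values of one joint over a 10-frame batch.
    noisy_depth = [2.0, 2.3, 1.9, 2.4, 2.1, 2.6, 2.2, 2.7, 2.4, 2.8]
    coeffs, smoothed = project_trajectory(noisy_depth, num_coeffs=3)
    print(len(coeffs), [round(v, 3) for v in smoothed])
```

Using fewer coefficients than frames reduces the number of unknowns per batch, which is one way a subspace prior can help with the depth ambiguity the abstract describes; using as many coefficients as frames reproduces the input exactly.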

Supplemental Material

tog37-2-a27-xu.mp4 (mp4, 295.4 MB)

    • Published in

      ACM Transactions on Graphics, Volume 37, Issue 2
      April 2018
      244 pages
      ISSN: 0730-0301
      EISSN: 1557-7368
      DOI: 10.1145/3191713

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 21 May 2018
      • Revised: 1 February 2018
      • Accepted: 1 February 2018
      • Received: 1 September 2017

      Qualifiers

      • research-article
      • Research
      • Refereed