ABSTRACT
We present an end-to-end system for augmented and virtual reality telepresence, called Holoportation. Our system demonstrates high-quality, real-time 3D reconstructions of an entire space, including people, furniture and objects, using a set of new depth cameras. These 3D models can also be transmitted in real time to remote users. This allows users wearing virtual or augmented reality displays to see, hear and interact with remote participants in 3D, almost as if they were present in the same physical space. From an audio-visual perspective, communicating and interacting with remote users edges closer to face-to-face communication. This paper describes the Holoportation technical system in full, its key interactive capabilities, the application scenarios it enables, and an initial qualitative study of using this new communication medium.
Index Terms
- Holoportation: Virtual 3D Teleportation in Real-time