ABSTRACT
Properly handling occlusion between real and virtual objects is an important property for any mixed reality (MR) system. Existing methods have typically required known geometry of the real objects in the scene, either specified manually or reconstructed using a dense mapping algorithm, which limits the situations in which they can be applied. Modern RGBD cameras are cheap and widely available, but the depth information they provide is typically too noisy and incomplete to give good results when used directly.
In this paper, a method is proposed which uses both the colour and depth information provided by an RGBD camera to improve occlusion handling. This method, Cost Volume Filtering Occlusion, is capable of running in real time and can also handle occlusion of virtual objects by dynamic, moving objects such as the user's hands. The method operates on individual RGBD frames as they arrive, so it can function immediately in unknown environments and respond appropriately to sudden changes. The accuracy of the presented method is quantified using a novel approach capable of comparing the results of such per-frame algorithms to those of dense SLAM-based approaches. The proposed approach is shown to be capable of producing superior results to both previous image-based approaches and dense RGBD reconstruction, at lower computational cost.
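The abstract does not give the formulation of Cost Volume Filtering Occlusion, so the following is only a minimal sketch of the general idea of filtering a per-pixel cost volume before deciding occlusion. It assumes two labels per pixel (real surface in front, virtual object in front), treats a depth of 0 as a missing measurement, and swaps in a plain box filter, which ignores the colour image, in place of whatever colour-guided filter the paper uses. The names `box_filter` and `occlusion_mask` are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def box_filter(img: Grid, radius: int) -> Grid:
    """Mean filter over a (2*radius+1)^2 window, clipped at the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def occlusion_mask(real_depth: Grid, virtual_depth: Grid,
                   radius: int = 1) -> List[List[bool]]:
    """Return True where the real scene is judged to occlude the virtual object.

    Assumption: a real depth <= 0 marks a missing measurement and contributes
    an uninformative cost of 0.5 to both labels.
    """
    h, w = len(real_depth), len(real_depth[0])
    cost_real = [[0.5] * w for _ in range(h)]     # cost of label "real in front"
    cost_virtual = [[0.5] * w for _ in range(h)]  # cost of label "virtual in front"
    for y in range(h):
        for x in range(w):
            d = real_depth[y][x]
            if d <= 0.0:
                continue  # no evidence at this pixel; keep both costs at 0.5
            if d < virtual_depth[y][x]:
                cost_real[y][x], cost_virtual[y][x] = 0.0, 1.0
            else:
                cost_real[y][x], cost_virtual[y][x] = 1.0, 0.0
    # Smooth each cost slice so that neighbouring evidence fills holes and
    # suppresses isolated noisy pixels, then pick the cheaper label per pixel.
    smooth_real = box_filter(cost_real, radius)
    smooth_virtual = box_filter(cost_virtual, radius)
    return [[smooth_real[y][x] < smooth_virtual[y][x] for x in range(w)]
            for y in range(h)]
```

In this sketch a pixel with missing depth takes its label from its neighbours, which is the property that makes cost filtering attractive for noisy, incomplete RGBD depth. A colour-guided filter would additionally stop the smoothing at colour edges, which the box filter here does not attempt.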
Index Terms
- Accurate real-time occlusion for mixed reality