Efficient 3D Object Segmentation from Densely Sampled Light Fields with Applications to 3D Reconstruction

Published: 15 March 2016

Abstract

Precise object segmentation in image data is a fundamental problem with various applications, including 3D object reconstruction. We present an efficient algorithm to automatically segment a static foreground object from a highly cluttered background in light fields. A key insight and contribution of our article is that a significant increase of the available input data can enable the design of novel, highly efficient approaches. In particular, the central idea of our method is to exploit high spatio-angular sampling on the order of thousands of input frames, for example, captured as a hand-held video, such that new structures are revealed due to the increased coherence in the data. We first show how purely local gradient information contained in slices of such a dense light field can be combined with information about the camera trajectory to make efficient estimates of the foreground and background. These estimates are then propagated to textureless regions using edge-aware filtering in the epipolar volume. Finally, we enforce global consistency in a gathering step to derive a precise object segmentation in both 2D and 3D space, which captures fine geometric details even in very cluttered scenes. The design of each of these steps is motivated by efficiency and scalability, allowing us to handle large, real-world video datasets on a standard desktop computer. We demonstrate how the results of our method can be used for considerably improving the speed and quality of image-based 3D reconstruction algorithms, and we compare our results to state-of-the-art segmentation and multiview stereo methods.
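To make the first step of the abstract more concrete, the following is a minimal sketch of how local gradients in a single light field slice (an epipolar-plane image, or EPI) can hint at foreground versus background. It is not the authors' implementation. It assumes a purely linear camera path, so that a scene point shifts by a constant number of pixels per frame and nearer points shift faster. The function names (`synthetic_epi`, `epi_slope`, `is_foreground`), the sinusoidal test texture, and the threshold of 1.0 pixel per frame are all hypothetical choices made for illustration.

```python
import numpy as np


def synthetic_epi(disparity, frames=64, width=128, k=0.3):
    """Toy EPI (hypothetical test data): rows are frames s, columns are pixels u.

    A sinusoidal texture shifts by `disparity` pixels per frame, which is
    what a point at constant depth does under a linear camera path.
    """
    s, u = np.meshgrid(np.arange(frames), np.arange(width), indexing="ij")
    return np.sin(k * (u - disparity * s))


def epi_slope(epi, eps=1e-8):
    """Least-squares estimate of the per-frame pixel shift from EPI gradients.

    For E(s, u) = f(u - d * s) we have E_s = -d * E_u, hence
    d = -sum(E_s * E_u) / sum(E_u ** 2) over the slice.
    """
    e_s, e_u = np.gradient(np.asarray(epi, dtype=np.float64))
    # Drop the border, where np.gradient falls back to one-sided differences.
    e_s, e_u = e_s[1:-1, 1:-1], e_u[1:-1, 1:-1]
    return -np.sum(e_s * e_u) / (np.sum(e_u * e_u) + eps)


def is_foreground(epi, threshold=1.0):
    """Assumed rule: a large shift per frame means close to the camera."""
    return bool(abs(epi_slope(epi)) > threshold)


near = synthetic_epi(2.0)  # fast-moving texture, treated as foreground
far = synthetic_epi(0.5)   # slow-moving texture, treated as background
print(epi_slope(near), is_foreground(near))
print(epi_slope(far), is_foreground(far))
```

Because the estimate uses finite differences, it is only approximate and degrades as the per-frame shift grows, which is one reason dense sampling helps. A hand-held capture would also need the camera trajectory to relate slope to depth, which this sketch does not model.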

Supplementary Material

yucer (yucer.zip)
Supplemental movie, appendix, image, and software files for Efficient 3D Object Segmentation from Densely Sampled Light Fields with Applications to 3D Reconstruction

Cited By

  • (2025) OptiViewNeRF: Optimizing 3D reconstruction via batch view selection and scene uncertainty in Neural Radiance Fields. International Journal of Applied Earth Observation and Geoinformation 136 (104306). DOI: 10.1016/j.jag.2024.104306. Online publication date: Feb-2025
  • (2024) Learning Spherical Radiance Field for Efficient 360° Unbounded Novel View Synthesis. IEEE Transactions on Image Processing 33 (3722-3734). DOI: 10.1109/TIP.2024.3409052. Online publication date: 10-Jun-2024
  • (2024) L0-Sampler: An L0 Model Guided Volume Sampling for NeRF. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (21390-21400). DOI: 10.1109/CVPR52733.2024.02021. Online publication date: 16-Jun-2024

        Published In

        ACM Transactions on Graphics  Volume 35, Issue 3
        June 2016
        128 pages
        ISSN:0730-0301
        EISSN:1557-7368
        DOI:10.1145/2903775
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 15 March 2016
        Accepted: 01 January 2016
        Revised: 01 January 2016
        Received: 01 September 2015
        Published in TOG Volume 35, Issue 3

        Author Tags

        1. 3D object and image segmentation
        2. image-based reconstruction
        3. light field processing
        4. visual hulls

        Qualifiers

        • Research-article
        • Research
        • Refereed

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months): 40
        • Downloads (Last 6 weeks): 7
        Reflects downloads up to 01 Mar 2025

        Citations

        Cited By

        • (2024) Light Field Reconstruction With Dual Features Extraction and Macro-Pixel Upsampling. IEEE Access 12 (121624-121634). DOI: 10.1109/ACCESS.2024.3446592. Online publication date: 2024
        • (2024) Reconstructing angular light field by learning spatial features from quadrilateral epipolar geometry. Scientific Reports 14:1. DOI: 10.1038/s41598-024-81296-z. Online publication date: 30-Nov-2024
        • (2024) FisherRF: Active View Selection and Mapping with Radiance Fields Using Fisher Information. Computer Vision – ECCV 2024 (422-440). DOI: 10.1007/978-3-031-72624-8_24. Online publication date: 29-Sep-2024
        • (2024) An Improved 4D Convolutional Neural Network for Light Field Reconstruction. Mobile Networks and Management (108-120). DOI: 10.1007/978-3-031-55471-1_9. Online publication date: 17-Mar-2024
        • (2023) Analysis of error propagation: from raw light-field data to depth estimation. Applied Optics 62:33 (8704). DOI: 10.1364/AO.500897. Online publication date: 13-Nov-2023
        • (2023) Light field reconstruction with decoupled fusion and angular attention mechanism. Journal of Electronic Imaging 32:06. DOI: 10.1117/1.JEI.32.6.063029. Online publication date: 1-Nov-2023
        • (2023) Learning Reliable Gradients From Undersampled Circular Light Field for 3D Reconstruction. IEEE Transactions on Visualization and Computer Graphics 29:12 (5194-5207). DOI: 10.1109/TVCG.2022.3206207. Online publication date: 1-Dec-2023
