research-article

Efficient 3D Object Segmentation from Densely Sampled Light Fields with Applications to 3D Reconstruction

Authors:

Alexander Sorkine-Hornung,

Olga Sorkine-Hornung

ACM Transactions on Graphics (TOG), Volume 35, Issue 3

Article No.: 22, Pages 1 - 15

https://doi.org/10.1145/2876504

Published: 15 March 2016

Abstract

Precise object segmentation in image data is a fundamental problem with various applications, including 3D object reconstruction. We present an efficient algorithm to automatically segment a static foreground object from highly cluttered background in light fields. A key insight and contribution of our article is that a significant increase of the available input data can enable the design of novel, highly efficient approaches. In particular, the central idea of our method is to exploit high spatio-angular sampling on the order of thousands of input frames, for example, captured as a hand-held video, such that new structures are revealed due to the increased coherence in the data. We first show how purely local gradient information contained in slices of such a dense light field can be combined with information about the camera trajectory to make efficient estimates of the foreground and background. These estimates are then propagated to textureless regions using edge-aware filtering in the epipolar volume. Finally, we enforce global consistency in a gathering step to derive a precise object segmentation in both 2D and 3D space, which captures fine geometric details even in very cluttered scenes. The design of each of these steps is motivated by efficiency and scalability, allowing us to handle large, real-world video datasets on a standard desktop computer. We demonstrate how the results of our method can be used for considerably improving the speed and quality of image-based 3D reconstruction algorithms, and we compare our results to state-of-the-art segmentation and multiview stereo methods.
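The local-gradient idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a purely linear, constant-speed camera motion, so that a scene point traces a straight line in an epipolar-plane image (EPI) whose slope depends on its depth, and it estimates that slope from central differences. All names (`synthetic_volume`, `epi_slice`, `epi_slope`, `median_slope`) and the synthetic sinusoidal data are hypothetical and introduced only for illustration.

```python
import math
import statistics


def synthetic_volume(frames=16, height=2, width=48, shift=1.0, period=16.0):
    """Hypothetical test data: a sinusoidal texture that translates by
    `shift` pixels per frame, indexed as volume[frame][row][column]."""
    return [
        [
            [math.sin(2.0 * math.pi * (x - shift * t) / period) for x in range(width)]
            for _ in range(height)
        ]
        for t in range(frames)
    ]


def epi_slice(volume, row):
    """Stack one fixed image row over all frames (frames x width)."""
    return [frame[row] for frame in volume]


def epi_slope(epi, t, x, eps=1e-6):
    """Estimate the line slope dx/dt at an interior EPI pixel (t, x).

    Uses the brightness-constancy relation slope = -I_t / I_x with central
    differences. Returns None where the spatial gradient is too weak,
    that is, in (near-)textureless regions.
    """
    d_t = (epi[t + 1][x] - epi[t - 1][x]) / 2.0
    d_x = (epi[t][x + 1] - epi[t][x - 1]) / 2.0
    if abs(d_x) < eps:
        return None
    return -d_t / d_x


def median_slope(epi):
    """Median of all valid slope estimates over the EPI interior."""
    slopes = []
    for t in range(1, len(epi) - 1):
        for x in range(1, len(epi[0]) - 1):
            s = epi_slope(epi, t, x)
            if s is not None:
                slopes.append(s)
    return statistics.median(slopes)


moving = epi_slice(synthetic_volume(shift=1.0), 0)
static = epi_slice(synthetic_volume(shift=0.0), 0)
print(median_slope(moving), median_slope(static))
```

On this synthetic data the translating texture yields a median slope of about 1 pixel per frame and the static texture a slope of about 0. Under the linear-motion assumption, larger slopes correspond to points closer to the camera, which is the kind of cue that could separate a near foreground object from a distant background. Pixels where `epi_slope` returns `None` are the textureless regions that the abstract says are filled in afterwards by edge-aware filtering.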

Supplementary Material

yucer (yucer.zip)

Supplemental movie, appendix, image, and software files for Efficient 3D Object Segmentation from Densely Sampled Light Fields with Applications to 3D Reconstruction




Index Terms

Efficient 3D Object Segmentation from Densely Sampled Light Fields with Applications to 3D Reconstruction
1. Computing methodologies
  1. Computer graphics
    1. Animation
      1. Physical simulation
    2. Shape modeling
2. Mathematics of computing
  1. Continuous mathematics
    1. Topology
      1. Geometric topology


Published In


ACM Transactions on Graphics Volume 35, Issue 3

June 2016

128 pages

ISSN:0730-0301

EISSN:1557-7368

DOI:10.1145/2903775

Editor:
Kavita Bala
Cornell University


Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 March 2016

Accepted: 01 January 2016

Revised: 01 January 2016

Received: 01 September 2015

Published in TOG Volume 35, Issue 3


Qualifiers

Research-article
Research
Refereed

Article Metrics

Total Citations: 66
Total Downloads: 1,365
Downloads (Last 12 months): 40
Downloads (Last 6 weeks): 7

Reflects downloads up to 01 Mar 2025

Cited By

Li Y, Li R, Li Z, Guo R, Tang S (2025). OptiViewNeRF: Optimizing 3D reconstruction via batch view selection and scene uncertainty in Neural Radiance Fields. International Journal of Applied Earth Observation and Geoinformation 136 (104306). Online publication date: Feb-2025.
https://doi.org/10.1016/j.jag.2024.104306

Chen M, Wang L, Lei Y, Dong Z, Guo Y (2024). Learning Spherical Radiance Field for Efficient 360° Unbounded Novel View Synthesis. IEEE Transactions on Image Processing 33 (3722-3734). Online publication date: 10-Jun-2024.
https://dl.acm.org/doi/10.1109/TIP.2024.3409052

Li L, Zhang J (2024). L0-Sampler: An L0 Model Guided Volume Sampling for NeRF. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (21390-21400). Online publication date: 16-Jun-2024.
https://doi.org/10.1109/CVPR52733.2024.02021

Salem A, Elkady E, Ibrahem H, Suh J, Kang H (2024). Light Field Reconstruction With Dual Features Extraction and Macro-Pixel Upsampling. IEEE Access 12 (121624-121634). Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3446592

Elkady E, Salem A, Kang H, Suh J (2024). Reconstructing angular light field by learning spatial features from quadrilateral epipolar geometry. Scientific Reports 14:1. Online publication date: 30-Nov-2024.
https://doi.org/10.1038/s41598-024-81296-z

Jiang W, Lei B, Daniilidis K (2024). FisherRF: Active View Selection and Mapping with Radiance Fields Using Fisher Information. Computer Vision – ECCV 2024 (422-440). Online publication date: 29-Sep-2024.
https://dl.acm.org/doi/10.1007/978-3-031-72624-8_24

Liu Q, Li R, Yan K, Wang Y, Luo Y (2024). An Improved 4D Convolutional Neural Network for Light Field Reconstruction. Mobile Networks and Management (108-120). Online publication date: 17-Mar-2024.
https://doi.org/10.1007/978-3-031-55471-1_9

Xu S, Shi S (2023). Analysis of error propagation: from raw light-field data to depth estimation. Applied Optics 62:33 (8704). Online publication date: 13-Nov-2023.
https://doi.org/10.1364/AO.500897

Liu F, Bai X, Hou J, Zhang Q, Yan T, Wang B, Yuan F (2023). Light field reconstruction with decoupled fusion and angular attention mechanism. Journal of Electronic Imaging 32:06. Online publication date: 1-Nov-2023.
https://doi.org/10.1117/1.JEI.32.6.063029

Song Z, Wang X, Zhu H, Zhou G, Wang Q (2023). Learning Reliable Gradients From Undersampled Circular Light Field for 3D Reconstruction. IEEE Transactions on Visualization and Computer Graphics 29:12 (5194-5207). Online publication date: 1-Dec-2023.
https://dl.acm.org/doi/10.1109/TVCG.2022.3206207
