
Sampling based scene-space video processing

Published: 27 July 2015

Abstract

Many compelling video processing effects can be achieved if per-pixel depth information and 3D camera calibrations are known. However, the success of such methods is highly dependent on the accuracy of this "scene-space" information. We present a novel, sampling-based framework for processing video that enables high-quality scene-space video effects in the presence of inevitable errors in depth and camera pose estimation. Instead of trying to improve the explicit 3D scene representation, the key idea of our method is to exploit the high redundancy of approximate scene information that arises due to most scene points being visible multiple times across many frames of video. Based on this observation, we propose a novel pixel gathering and filtering approach. The gathering step is general and collects pixel samples in scene-space, while the filtering step is application-specific and computes a desired output video from the gathered sample sets. Our approach is easily parallelizable and has been implemented on GPU, allowing us to take full advantage of large volumes of video data and facilitating practical runtimes on HD video using a standard desktop computer. Our generic scene-space formulation is able to comprehensively describe a multitude of video processing applications such as denoising, deblurring, super resolution, object removal, computational shutter functions, and other scene-space camera effects. We present results for various casually captured, hand-held, moving, compressed, monocular videos depicting challenging scenes recorded in uncontrolled environments.
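The paper's actual gathering and filtering operators are not reproduced on this page; the following is only a minimal sketch of the two-step idea the abstract describes, assuming per-pixel depth maps, camera intrinsics K, and camera-to-world poses are given. Pixels are back-projected to world space (the redundant scene-space samples), gathered across frames near a query scene point, and combined with a distance-weighted average as a simple stand-in for an application-specific filter (here, denoising). All function names and parameters are illustrative, not the authors' API.

```python
import numpy as np

def backproject(depth, K, cam_to_world):
    """Lift every pixel of a depth map to a 3D point in world space.

    depth: (h, w) depth per pixel; K: (3, 3) intrinsics;
    cam_to_world: (4, 4) homogeneous camera pose.
    Returns (h*w, 3) world-space points in row-major pixel order.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], -1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T           # camera-space ray directions
    pts = rays * depth.reshape(-1, 1)         # scale each ray by its depth
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return (pts_h @ cam_to_world.T)[:, :3]

def gather(point, frames, radius):
    """Gathering step: pool (color, distance) samples from every frame
    whose back-projected pixel lies within `radius` of `point`.

    frames: list of (colors, world_pts) with colors (h, w) grayscale and
    world_pts (h*w, 3) as returned by backproject().
    """
    cols, dists = [], []
    for colors, world_pts in frames:
        d = np.linalg.norm(world_pts - point, axis=1)
        keep = d < radius
        cols.append(colors.reshape(-1)[keep])
        dists.append(d[keep])
    return np.concatenate(cols), np.concatenate(dists)

def denoise(point, frames, radius, sigma):
    """Filtering step (denoising flavor): Gaussian distance-weighted
    average of all gathered samples for one scene point."""
    cols, dists = gather(point, frames, radius)
    w = np.exp(-0.5 * (dists / sigma) ** 2)
    return np.sum(w * cols) / np.sum(w)
```

Because each scene point is typically seen in many frames, even this naive filter averages away per-frame noise; the application-specific filters in the paper replace the simple Gaussian weighting, while the gathering step stays generic.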

Supplementary Material

ZIP File (a67-klose.zip)
Supplemental files



Published In

ACM Transactions on Graphics  Volume 34, Issue 4
August 2015
1307 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/2809654

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. computational shutters
  2. denoising
  3. inpainting
  4. sampling
  5. video processing

Qualifiers

  • Research-article

Cited By
  • (2024) ST-SAIL: Spatio-temporal Semantic Analysis of Light Fields: Optimizing the Sampling Pattern of Light Field Arrays. 2024 IEEE International Conference on Consumer Electronics (ICCE), 1-6. DOI: 10.1109/ICCE59016.2024.10444447
  • (2024) Depth-of-Field Region Detection and Recognition From a Single Image Using Adaptively Sampled Learning Representation. IEEE Access, 12, 42248-42263. DOI: 10.1109/ACCESS.2024.3377667
  • (2023) VideoDoodles: Hand-Drawn Animations on Videos with Scene-Aware Canvases. ACM Transactions on Graphics, 42(4), 1-12. DOI: 10.1145/3592413
  • (2023) Implicit View-Time Interpolation of Stereo Videos Using Multi-Plane Disparities and Non-Uniform Coordinates. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 888-898. DOI: 10.1109/CVPR52729.2023.00092
  • (2023) Synthetic Football Sprite Animations Learned Across the Pitch. Advances in Computational Collective Intelligence, 610-618. DOI: 10.1007/978-3-031-41774-0_48
  • (2023) Few-Shots Novel Space-Time View Synthesis from Consecutive Photos. The 12th Conference on Information Technology and Its Applications, 240-249. DOI: 10.1007/978-3-031-36886-8_20
  • (2021) Contiguous Loss for Motion-Based, Non-Aligned Image Deblurring. Symmetry, 13(4), 630. DOI: 10.3390/sym13040630
  • (2021) Object-Wise Video Editing. Applied Sciences, 11(2), 671. DOI: 10.3390/app11020671
  • (2021) Recursive Neural Network for Video Deblurring. IEEE Transactions on Circuits and Systems for Video Technology, 31(8), 3025-3036. DOI: 10.1109/TCSVT.2020.3035722
  • (2021) Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 6494-6504. DOI: 10.1109/CVPR46437.2021.00643
