ABSTRACT
Recent developments in immersive imaging technologies have enabled improved telepresence applications. Omnidirectional (360-degree) content, now commercially mature, provides a full view around the camera with three degrees of freedom (3DoF). Targeting real-time immersive telepresence, this paper investigates how a single omnidirectional image (ODI) can be used to extend 3DoF to 6DoF. To this end, we propose LFSphereNet, a fully learning-based method for spherical light field reconstruction from a single omnidirectional image. LFSphereNet employs two networks: the first learns to reconstruct the light field in cubemap projection (CMP) format, taking the six cube faces of an omnidirectional image and the corresponding cube face positions as input. The cubemap format implies a linear re-projection, which is better suited to a neural network. The second network refines the reconstructed cubemaps in equirectangular projection (ERP) format by removing cubemap border artifacts. With an appropriate cost function, the network implicitly learns the geometric features for both translation and zooming. Furthermore, inference time is very low, which enables real-time applications. We demonstrate that LFSphereNet outperforms state-of-the-art approaches in terms of quality and speed on both synthetic and real-world scenes. The proposed method represents a significant step towards real-time immersive remote telepresence experiences.
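The two-stage design described above maps naturally onto a simple inference pipeline: project the input ODI onto six cube faces, synthesize a light-field view per face conditioned on the face position, reassemble the result in ERP format, and refine it to suppress border artifacts. The following PyTorch sketch illustrates that flow under stated assumptions; the class names, layer choices, and the erp_to_cubemap / cubemap_to_erp helpers are hypothetical placeholders for illustration, not the authors' implementation.

```python
# Minimal, hypothetical sketch of the two-stage flow described in the abstract.
# All class names, shapes, and helper functions are illustrative assumptions.
import torch
import torch.nn as nn


class CubemapLightFieldNet(nn.Module):
    """Stage 1 (assumed interface): predicts a light-field view for one cube face,
    conditioned on the face's position on the cube."""
    def __init__(self, in_channels=3, pos_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels + pos_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, face, face_pos):
        # face: (B, 3, H, W) cube-face image; face_pos: (B, 3, H, W) position encoding
        return self.net(torch.cat([face, face_pos], dim=1))


class ERPRefinementNet(nn.Module):
    """Stage 2 (assumed interface): removes cube-face border artifacts in ERP space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, erp):
        # Residual refinement of the reassembled equirectangular image.
        return erp + self.net(erp)


def synthesize_view(erp_image, target_pose, stage1, stage2,
                    erp_to_cubemap, cubemap_to_erp):
    """Assumed end-to-end flow: ERP -> six cube faces -> per-face synthesis
    -> reassembled ERP -> border-artifact refinement."""
    faces, face_positions = erp_to_cubemap(erp_image, target_pose)   # hypothetical helper
    synthesized = [stage1(f, p) for f, p in zip(faces, face_positions)]
    coarse_erp = cubemap_to_erp(synthesized)                         # hypothetical helper
    return stage2(coarse_erp)
```

Operating per cube face in stage 1 reflects the abstract's rationale that the cubemap's linear re-projection suits a convolutional network better than the heavily distorted ERP representation; the ERP-domain refinement in stage 2 then addresses the seams this face-wise processing can introduce.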