ABSTRACT
Recent developments in immersive imaging technologies have enabled improved telepresence applications. Omnidirectional (360-degree) content, now commercially mature, provides a full view around the camera with three degrees of freedom (3DoF). Targeting real-time immersive telepresence, this paper investigates how a single omnidirectional image (ODI) can be used to extend 3DoF to 6DoF. To this end, we propose LFSphereNet, a fully learning-based method for spherical light field reconstruction from a single omnidirectional image. LFSphereNet employs two networks: the first learns to reconstruct the light field in cubemap projection (CMP) format, taking the six cube faces of an omnidirectional image and the corresponding cube face positions as input. The cubemap format implies a linear re-projection, which is better suited to a neural network. The second network refines the reconstructed cubemaps in equirectangular projection (ERP) format by removing cubemap border artifacts. With an appropriate cost function, the network implicitly learns the geometric features for both translation and zooming. Furthermore, inference time is very low, which enables real-time applications. We demonstrate that LFSphereNet outperforms state-of-the-art approaches in terms of quality and speed on both synthetic and real-world scenes. The proposed method represents a significant step towards real-time immersive remote telepresence experiences.
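The two-stage design described above maps naturally onto a simple inference pipeline: project the input ODI onto six cube faces, synthesize a light-field view per face conditioned on the face position, reassemble the result in ERP format, and refine it to suppress border artifacts. The following PyTorch sketch illustrates that flow under stated assumptions; the class names, layer choices, and the erp_to_cubemap / cubemap_to_erp helpers are hypothetical placeholders for illustration, not the authors' implementation.

```python
# Minimal, hypothetical sketch of the two-stage flow described in the abstract.
# All class names, shapes, and helper functions are illustrative assumptions.
import torch
import torch.nn as nn


class CubemapLightFieldNet(nn.Module):
    """Stage 1 (assumed interface): predicts a light-field view for one cube face,
    conditioned on the face's position on the cube."""
    def __init__(self, in_channels=3, pos_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels + pos_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, face, face_pos):
        # face: (B, 3, H, W) cube-face image; face_pos: (B, 3, H, W) position encoding
        return self.net(torch.cat([face, face_pos], dim=1))


class ERPRefinementNet(nn.Module):
    """Stage 2 (assumed interface): removes cube-face border artifacts in ERP space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, erp):
        # Residual refinement of the reassembled equirectangular image.
        return erp + self.net(erp)


def synthesize_view(erp_image, target_pose, stage1, stage2,
                    erp_to_cubemap, cubemap_to_erp):
    """Assumed end-to-end flow: ERP -> six cube faces -> per-face synthesis
    -> reassembled ERP -> border-artifact refinement."""
    faces, face_positions = erp_to_cubemap(erp_image, target_pose)   # hypothetical helper
    synthesized = [stage1(f, p) for f, p in zip(faces, face_positions)]
    coarse_erp = cubemap_to_erp(synthesized)                         # hypothetical helper
    return stage2(coarse_erp)
```

Operating per cube face in stage 1 reflects the abstract's rationale that the cubemap's linear re-projection suits a convolutional network better than the heavily distorted ERP representation; the ERP-domain refinement in stage 2 then addresses the seams this face-wise processing can introduce.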