DOI: 10.1145/3083165.3083180
research-article

Fixation Prediction for 360° Video Streaming in Head-Mounted Virtual Reality

Published: 20 June 2017

ABSTRACT

We study the problem of predicting the Fields-of-View (FoVs) of viewers watching 360° videos on commodity Head-Mounted Displays (HMDs). Existing solutions either use the viewer's current orientation to approximate future FoVs, or extrapolate future FoVs from historical orientations using dead-reckoning algorithms. In this paper, we develop fixation prediction networks that concurrently leverage sensor- and content-related features to predict future viewer fixation, which sets them apart from the solutions in the literature. The sensor-related features include HMD orientations, while the content-related features include image saliency maps and motion maps. We build a testbed that streams 360° videos to HMDs, and recruit twenty-five viewers to watch ten 360° videos. We then train and validate two design alternatives of our proposed networks, which allows us to identify the better-performing design and its optimal parameter settings. Trace-driven simulation results show the merits of our proposed fixation prediction networks over the existing solutions, including: (i) lower consumed bandwidth, (ii) shorter initial buffering time, and (iii) short running time.
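The two existing approaches named in the abstract can be illustrated with a minimal sketch. This is not the paper's code: the function names, the (yaw, pitch) representation in degrees, and the constant-angular-velocity model used for dead reckoning are all assumptions made for illustration only.

```python
def wrap_yaw(deg):
    """Wrap a yaw angle in degrees into the range [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def predict_current(now):
    """Baseline 1 (assumed form): reuse the current orientation as the future one."""
    return now


def predict_dead_reckoning(t_prev, prev, t_now, now, horizon):
    """Baseline 2 (assumed form): constant-velocity dead reckoning.

    prev and now are hypothetical (yaw, pitch) samples in degrees taken at
    times t_prev and t_now (seconds); horizon is how far ahead to predict.
    """
    dt = t_now - t_prev
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    # Take the shortest angular difference so the +/-180 degree seam
    # does not produce a spurious near-360 degree jump.
    yaw_rate = wrap_yaw(now[0] - prev[0]) / dt
    pitch_rate = (now[1] - prev[1]) / dt
    yaw = wrap_yaw(now[0] + yaw_rate * horizon)
    # Pitch cannot go past straight up or straight down.
    pitch = max(-90.0, min(90.0, now[1] + pitch_rate * horizon))
    return (yaw, pitch)


if __name__ == "__main__":
    prev, now = (170.0, 0.0), (175.0, 2.0)
    print(predict_current(now))
    print(predict_dead_reckoning(0.0, prev, 0.5, now, 1.0))
```

Both baselines rely only on sensor readings. The networks described in the abstract additionally take content-related features (saliency and motion maps), which this sketch does not attempt to model.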


      • Published in

        NOSSDAV'17: Proceedings of the 27th Workshop on Network and Operating Systems Support for Digital Audio and Video
        June 2017
        105 pages
        ISBN:9781450350037
        DOI:10.1145/3083165

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        NOSSDAV'17 Paper Acceptance Rate: 15 of 40 submissions, 38%. Overall Acceptance Rate: 118 of 363 submissions, 33%.
