DOI: 10.1145/1315184.1315187

Real-time tracking of visually attended objects in interactive virtual environments

Published: 05 November 2007

Abstract

This paper presents a real-time framework for computationally tracking the objects a user visually attends to while navigating interactive virtual environments. In addition to conventional bottom-up (stimulus-driven) features, the framework uses top-down (goal-directed) contexts to predict the human gaze. The framework first builds feature maps from preattentive features such as luminance, hue, depth, size, and motion. The feature maps are then integrated into a single saliency map using the center-surround difference operation. This pixel-level, bottom-up saliency map is converted into an object-level saliency map using the item buffer. Finally, top-down contexts, inferred from the user's spatial and temporal behaviors during interactive navigation, are used to select the most plausibly attended object among the candidates produced by the object saliency map. The framework was implemented on the GPU and exhibited very fast performance (5.68 msec for a 256×256 saliency map), substantiating its adequacy for interactive virtual environments. A user experiment was also conducted to evaluate the prediction accuracy of the visual attention tracking framework against actual human gaze data. The attained accuracy, particularly with the addition of top-down contextual information, agreed well with theories of human cognition on visually identifying single and multiple attentive targets. The framework can be used for perceptually based rendering without an expensive eye tracker, for example to provide depth-of-field effects or to manage level of detail in virtual environments.
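
To make the bottom-up stage concrete, below is a minimal CPU sketch in Python/NumPy of the two steps the abstract describes: per-feature center-surround differences fused into a pixel-level saliency map, and object-level pooling via an item buffer (a per-pixel object-ID image). All function names, filter sizes, and the box-filter approximation of the center and surround responses are illustrative assumptions; the paper's actual implementation runs on the GPU and is not reproduced here.

```python
# Minimal sketch of the bottom-up saliency stage described in the
# abstract. Filter sizes, normalization, and the use of box filters
# for the center/surround responses are assumptions for illustration,
# not the paper's GPU implementation.
import numpy as np
from scipy.ndimage import uniform_filter

def center_surround(feature, center=3, surround=15):
    """Conspicuity of one feature map: absolute difference between a
    fine (center) and a coarse (surround) local average."""
    return np.abs(uniform_filter(feature, size=center) -
                  uniform_filter(feature, size=surround))

def pixel_saliency(feature_maps):
    """Fuse per-feature conspicuity maps (e.g. luminance, hue, depth,
    size, motion) into one normalized pixel-level saliency map."""
    sal = sum(center_surround(f) for f in feature_maps) / len(feature_maps)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)

def object_saliency(saliency, item_buffer):
    """Pool pixel saliency per object using the item buffer, which
    stores an object ID at every pixel (0 = background)."""
    return {int(i): float(saliency[item_buffer == i].mean())
            for i in np.unique(item_buffer) if i != 0}

# Toy usage: two random "feature maps" over a 256x256 frame and a
# hand-made item buffer containing two objects.
rng = np.random.default_rng(0)
features = [rng.random((256, 256)) for _ in range(2)]
item_buffer = np.zeros((256, 256), dtype=np.int32)
item_buffer[40:100, 40:100] = 1      # object 1
item_buffer[150:220, 150:220] = 2    # object 2
scores = object_saliency(pixel_saliency(features), item_buffer)
print("most salient object:", max(scores, key=scores.get))
```

In the paper, a top-down stage then re-weights these object-level scores using contexts inferred from the user's navigation behavior before selecting the attended object; that inference step is not sketched here.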


Published In

VRST '07: Proceedings of the 2007 ACM Symposium on Virtual Reality Software and Technology
November 2007
259 pages
ISBN: 9781595938633
DOI: 10.1145/1315184

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. attention tracking
  2. bottom-up feature
  3. saliency map
  4. top-down context
  5. virtual environment
  6. visual attention

Qualifiers

  • Article

Conference

VRST07

Acceptance Rates

Overall Acceptance Rate 66 of 254 submissions, 26%

Cited By

  • (2014) GPU-Based Selective Sparse Sampling for Interactive High-Fidelity Rendering. 2014 6th International Conference on Games and Virtual Worlds for Serious Applications (VS-GAMES), 1-8. DOI: 10.1109/VS-Games.2014.7012159
  • (2012) Gazing at Games: An Introduction to Eye Tracking Control. Synthesis Lectures on Computer Graphics and Animation 5(1), 1-113. DOI: 10.2200/S00395ED1V01Y201111CGR014
  • (2012) Real-time tracking of humans and visualization of their future footsteps in public indoor environments. Multimedia Tools and Applications 59(1), 65-88. DOI: 10.1007/s11042-010-0691-z
  • (2011) Directing attention and influencing memory with visual saliency modulation. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1471-1480. DOI: 10.1145/1978942.1979158
  • (2011) Parallel implementation of a spatio-temporal visual saliency model. Journal of Real-Time Image Processing 6(1), 3-14. DOI: 10.1007/s11554-010-0164-7
  • (2010) Focus and context in mixed reality by modulating first order salient features. Proceedings of the 10th International Conference on Smart Graphics, 232-243. DOI: 10.5555/1894345.1894374
  • (2010) An empirical pipeline to derive gaze prediction heuristics for 3D action games. ACM Transactions on Applied Perception 8(1), 1-30. DOI: 10.1145/1857893.1857897
  • (2010) Visual attention & multi-cue fusion based human motion tracking method. 2010 Sixth International Conference on Natural Computation, 2044-2054. DOI: 10.1109/ICNC.2010.5584296
  • (2010) Focus and Context in Mixed Reality by Modulating First Order Salient Features. Smart Graphics, 232-243. DOI: 10.1007/978-3-642-13544-6_22
  • (2009) The whys, how tos, and pitfalls of user studies. ACM SIGGRAPH 2009 Courses, 1-205. DOI: 10.1145/1667239.1667264
