ABSTRACT
In this work we present our recent approach to estimating the head orientations and foci of attention of multiple people in a smart room that is equipped with several cameras for monitoring. We estimate each person's head orientation with respect to the room coordinate system using all camera views: a neural network estimates head pose in each single camera view, and a Bayes filter then integrates the individual estimates into one final, joint hypothesis. Using this scheme, we can track people's horizontal head orientations over the full 360° range at almost all positions within the room. The tracked head orientations are then used to determine who is looking at whom, i.e. each person's focus of attention. We report experimental results on a meeting video that was recorded in the smart room.
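The fusion scheme described above can be sketched roughly as follows. This is not the authors' implementation; it is a minimal illustration assuming a discretized 360° orientation space (the bin count, diffusion kernel, and the idea of rotating each camera's estimate by its viewing angle are assumptions for illustration):

```python
import numpy as np

N_BINS = 72  # assumed 5-degree resolution over the full 360-degree range

def predict(belief, spread=1):
    """Diffusion step of the Bayes filter: circularly convolve the belief
    with a small kernel to model head rotation between frames."""
    kernel = np.zeros(N_BINS)
    kernel[[-spread % N_BINS, 0, spread]] = [0.25, 0.5, 0.25]
    # FFT-based circular convolution preserves the 360-degree wrap-around
    new = np.real(np.fft.ifft(np.fft.fft(belief) * np.fft.fft(kernel)))
    return new / new.sum()

def update(belief, camera_likelihoods, camera_angles_deg):
    """Measurement step: fuse per-camera pose likelihoods (e.g. neural
    network outputs). Each camera reports pose relative to its own viewing
    direction, so its histogram is rotated into room coordinates first."""
    for lik, cam_angle in zip(camera_likelihoods, camera_angles_deg):
        shift = int(round(cam_angle / 360.0 * N_BINS))
        belief = belief * np.roll(lik, shift)
    total = belief.sum()
    # fall back to a uniform belief if all cameras disagree completely
    return belief / total if total > 0 else np.full(N_BINS, 1.0 / N_BINS)
```

Per frame, one would call `predict` once and then `update` with the current per-camera likelihood histograms; the maximum of the resulting belief gives the joint head-orientation hypothesis in room coordinates.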
Index Terms
- Tracking head pose and focus of attention with multiple far-field cameras