Estimating Head Motion from Egocentric Vision

ABSTRACT
The recent availability of lightweight, wearable cameras makes it possible to collect video data from a "first-person" perspective, capturing the visual world of the wearer in everyday interactive contexts. In this paper, we investigate how to exploit egocentric vision to infer the multimodal behaviors of people wearing head-mounted cameras. More specifically, we estimate head (camera) motion from egocentric video, which can in turn be used to infer non-verbal behaviors such as head turns and nodding in multimodal interactions. We propose several approaches based on Convolutional Neural Networks (CNNs) that combine raw images and optical flow fields to learn to distinguish regions whose optical flow is caused by global ego-motion from regions whose flow is caused by other motion in the scene. Our results suggest that CNNs do not directly learn useful visual features with end-to-end training from raw images alone; instead, a better approach is to first extract optical flow explicitly and then train CNNs to integrate optical flow with visual information.
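The distinction the networks must learn, namely flow caused by global ego-motion versus flow caused by independently moving objects, can be illustrated with a simple non-learned baseline. This is a sketch for intuition only, not the CNN method described above, and the function name and thresholding scheme are our own assumptions: estimate the dominant head motion as a robust statistic of the dense flow field, then flag pixels whose flow deviates strongly from it.

```python
import numpy as np

def estimate_ego_motion(flow):
    """Estimate global (head) motion as the robust median of a dense
    optical flow field, then flag pixels whose flow deviates strongly
    from it as likely independent (non-ego) motion.

    flow: (H, W, 2) array of per-pixel (dx, dy) displacements,
          e.g. produced by any dense optical flow algorithm.
    Returns (global_motion, outlier_mask).
    """
    vecs = flow.reshape(-1, 2)
    # The median is robust to a minority of independently moving pixels.
    global_motion = np.median(vecs, axis=0)
    residual = np.linalg.norm(flow - global_motion, axis=2)
    # Threshold residuals via median absolute deviation (MAD);
    # the small epsilon guards against a degenerate zero MAD.
    mad = np.median(np.abs(residual - np.median(residual)))
    outlier_mask = residual > np.median(residual) + 3.0 * (mad + 1e-6)
    return global_motion, outlier_mask

# Toy example: uniform rightward flow (a simulated head turn)
# plus one independently moving patch.
flow = np.tile([2.0, 0.0], (64, 64, 1))
flow[20:30, 20:30] = [-5.0, 3.0]
motion, mask = estimate_ego_motion(flow)
```

In the toy example, the median recovers the global rightward flow of the simulated head turn, and only the independently moving patch is flagged. The CNN-based approaches studied in the paper learn this separation from data rather than relying on a fixed robust estimator, which matters when ego-motion induces non-uniform flow (e.g. rotation or parallax).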