Abstract
Egocentric videos, which mainly record the activities of wearable-camera users, have drawn much research attention in recent years. Because of their lengthy content, many applications have been developed to abstract the captured videos. Since users typically interact with target objects using their hands, which usually appear within the visual field during the interaction, egocentric hand detection is a key step in tasks such as gesture recognition, action recognition, and social interaction understanding. In this work, we propose a dynamic region-growing approach for hand region detection in egocentric videos that jointly considers hand-related motion and egocentric cues. We first determine seed regions that most likely belong to a hand by analyzing motion patterns across successive frames. Hand regions are then located by growing outward from the seed regions according to scores computed for adjacent superpixels, derived from four egocentric cues: contrast, location, position consistency, and appearance continuity. We also discuss how to apply the proposed method in real-life scenarios, where multiple hands irregularly appear in and disappear from the videos. Experimental results on public datasets show that the proposed method outperforms state-of-the-art methods, especially in complicated scenarios.
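As a rough illustration of the region-growing step described above (a minimal sketch, not the authors' implementation), the idea can be expressed as a greedy traversal over a superpixel adjacency graph: starting from motion-derived seed superpixels, neighbors are absorbed whenever their combined cue score passes a threshold. The `cue_score` callback and the `threshold` value are hypothetical placeholders standing in for the paper's four-cue scoring (contrast, location, position consistency, appearance continuity).

```python
from collections import deque


def grow_hand_region(adjacency, seeds, cue_score, threshold=0.5):
    """Greedy region growing over a superpixel adjacency graph.

    adjacency: dict mapping superpixel id -> set of neighbor ids
    seeds:     iterable of superpixel ids assumed to belong to the hand
               (in the paper these come from motion analysis across frames)
    cue_score: function(sp_id, region) -> combined score; a placeholder for
               a weighted combination of the four egocentric cues
    threshold: minimum score for a neighbor to join the region (assumed value)
    """
    region = set(seeds)
    frontier = deque(region)
    while frontier:
        sp = frontier.popleft()
        for nb in adjacency.get(sp, ()):
            # Absorb a neighboring superpixel only if its cue score is high
            # enough; newly absorbed superpixels expand the frontier.
            if nb not in region and cue_score(nb, region) >= threshold:
                region.add(nb)
                frontier.append(nb)
    return region


if __name__ == "__main__":
    # Toy chain of superpixels 0-1-2-3-4 with precomputed (made-up) scores;
    # growth stops at superpixel 3, whose score falls below the threshold.
    scores = {1: 0.9, 2: 0.8, 3: 0.4, 4: 0.9}
    adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    region = grow_hand_region(adjacency, {0}, lambda sp, _r: scores[sp])
    print(sorted(region))  # [0, 1, 2]
```

Note that superpixel 4 is never reached even though its own score is high, because growth is blocked at the low-scoring superpixel 3; this locality is what distinguishes region growing from independent per-superpixel classification.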
Index Terms
- Egocentric Hand Detection Via Dynamic Region Growing