Abstract
Egocentric videos, which mainly record the activities of wearable-camera users, have drawn much research attention in recent years. Because of their lengthy content, many applications have been developed to abstract the captured videos. Since users typically interact with target objects using their hands, which usually appear within the visual field during the interaction, egocentric hand detection is a key step in tasks such as gesture recognition, action recognition, and social interaction understanding. In this work, we propose a dynamic region-growing approach for hand region detection in egocentric videos that jointly considers hand-related motion and egocentric cues. We first determine seed regions that most likely belong to a hand by analyzing motion patterns across successive frames. Hand regions are then located by growing outward from the seed regions according to scores computed for adjacent superpixels, derived from four egocentric cues: contrast, location, position consistency, and appearance continuity. We also discuss how to apply the proposed method in real-life scenarios, where multiple hands irregularly appear in and disappear from the videos. Experimental results on public datasets show that the proposed method outperforms state-of-the-art methods, especially in complicated scenarios.
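As a rough illustration of the region-growing step described above (a minimal sketch, not the authors' implementation), the idea can be expressed as a greedy traversal over a superpixel adjacency graph: starting from motion-derived seed superpixels, neighbors are absorbed whenever their combined cue score passes a threshold. The `cue_score` callback and the `threshold` value are hypothetical placeholders standing in for the paper's four-cue scoring (contrast, location, position consistency, appearance continuity).

```python
from collections import deque


def grow_hand_region(adjacency, seeds, cue_score, threshold=0.5):
    """Greedy region growing over a superpixel adjacency graph.

    adjacency: dict mapping superpixel id -> set of neighbor ids
    seeds:     iterable of superpixel ids assumed to belong to the hand
               (in the paper these come from motion analysis across frames)
    cue_score: function(sp_id, region) -> combined score; a placeholder for
               a weighted combination of the four egocentric cues
    threshold: minimum score for a neighbor to join the region (assumed value)
    """
    region = set(seeds)
    frontier = deque(region)
    while frontier:
        sp = frontier.popleft()
        for nb in adjacency.get(sp, ()):
            # Absorb a neighboring superpixel only if its cue score is high
            # enough; newly absorbed superpixels expand the frontier.
            if nb not in region and cue_score(nb, region) >= threshold:
                region.add(nb)
                frontier.append(nb)
    return region


if __name__ == "__main__":
    # Toy chain of superpixels 0-1-2-3-4 with precomputed (made-up) scores;
    # growth stops at superpixel 3, whose score falls below the threshold.
    scores = {1: 0.9, 2: 0.8, 3: 0.4, 4: 0.9}
    adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    region = grow_hand_region(adjacency, {0}, lambda sp, _r: scores[sp])
    print(sorted(region))  # [0, 1, 2]
```

Note that superpixel 4 is never reached even though its own score is high, because growth is blocked at the low-scoring superpixel 3; this locality is what distinguishes region growing from independent per-superpixel classification.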
Index Terms
- Egocentric Hand Detection Via Dynamic Region Growing