Egocentric Hand Detection Via Dynamic Region Growing

Published: 13 December 2017

Abstract

Egocentric videos, which mainly record the activities carried out by the wearers of head- or body-mounted cameras, have drawn much research attention in recent years. Because such videos are typically lengthy, a large number of ego-related applications have been developed to abstract their content. Since users habitually interact with target objects using their own hands, which usually appear within the camera's field of view during the interaction, an egocentric hand detection step is involved in tasks such as gesture recognition, action recognition, and social interaction understanding. In this work, we propose a dynamic region-growing approach for hand region detection in egocentric videos that jointly considers hand-related motion and egocentric cues. We first determine seed regions that most likely belong to the hand by analyzing motion patterns across successive frames. The hand regions are then located by growing outward from the seed regions, according to scores computed for the adjacent superpixels. These scores are derived from four egocentric cues: contrast, location, position consistency, and appearance continuity. We also discuss how to apply the proposed method in real-life scenarios, where multiple hands appear in and disappear from the video irregularly. Experimental results on public datasets show that the proposed method outperforms state-of-the-art methods, especially in complicated scenarios.
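The abstract only outlines the algorithm at a high level. As an illustration, the growing step over a superpixel adjacency graph could look like the following minimal Python sketch. The data structures (`adjacency`, `scores`, `seeds`), the linear combination of the four cues, and the acceptance threshold are all assumptions made for illustration, not the authors' actual implementation.

```python
from collections import deque

def combine_cues(contrast, location, consistency, continuity,
                 weights=(0.25, 0.25, 0.25, 0.25)):
    # Hypothetical linear combination of the four egocentric cues;
    # the paper's actual scoring function may differ.
    cues = (contrast, location, consistency, continuity)
    return sum(w * c for w, c in zip(weights, cues))

def grow_hand_region(adjacency, scores, seeds, threshold=0.5):
    """Greedy region growing over a superpixel adjacency graph.

    adjacency: superpixel id -> set of neighboring superpixel ids
    scores:    superpixel id -> combined cue score in [0, 1]
    seeds:     ids of the motion-derived seed superpixels
    """
    region = set(seeds)
    frontier = deque(seeds)
    while frontier:
        sp = frontier.popleft()
        for nb in adjacency[sp]:
            # Absorb a neighbor only if its cue score clears the threshold.
            if nb not in region and scores[nb] >= threshold:
                region.add(nb)
                frontier.append(nb)
    return region

# Toy example: five superpixels in a chain 0-1-2-3-4, seed at 0.
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
scores = {0: 0.9, 1: 0.8, 2: 0.6, 3: 0.3, 4: 0.9}
print(sorted(grow_hand_region(adjacency, scores, seeds=[0])))  # [0, 1, 2]
```

Note that superpixel 4 is excluded despite its high score: the low-scoring superpixel 3 blocks the growth, which is the behavior that distinguishes region growing from simple per-superpixel thresholding.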



• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 14, Issue 1 (February 2018), 287 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3173554
        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 13 December 2017
        • Accepted: 1 October 2017
        • Revised: 1 August 2017
        • Received: 1 April 2017

        Qualifiers

        • research-article
        • Research
        • Refereed
