DOI: 10.1145/1631272.1631342
Short paper

Localizing volumetric motion for action recognition in realistic videos

Published: 19 October 2009

ABSTRACT

This paper presents a novel motion localization approach for recognizing actions and events in real videos, such as StandUp and Kiss in Hollywood movies. The challenge stems from the large visual and motion variations imposed by realistic action poses. Previous works mainly learn from descriptors of cuboids around space-time interest points (STIP) to characterize actions. The size, shape, and space-time position of these cuboids are fixed without considering the underlying motion dynamics, which often results in a large set of fragmented cuboids that fail to capture the long-term dynamic properties of realistic actions. This paper proposes detecting spatio-temporal motion volumes (Volumes of Interest, VOIs) with adaptive scale and position to localize actions. First, motions are described as bags of point trajectories by tracking keypoints along the time dimension. VOIs are then adaptively extracted by clustering trajectories on the motion manifold. The resulting VOIs, of varying scales and centered at arbitrary positions depending on motion dynamics, are finally described by SIFT and 3D gradient features for action recognition. Compared with fixed-size cuboids, VOIs allow comprehensive modeling of long-term motion and show better capability in capturing contextual information associated with motion dynamics. Experiments on a realistic Hollywood movie dataset show that the proposed approach achieves a 20% relative improvement over the state-of-the-art STIP-based algorithm.
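The VOI extraction step described above (cluster point trajectories, then take each cluster's space-time extent as a volume) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the trajectory features (mean position plus mean velocity), the flat-kernel mean-shift clusterer, and the bandwidth value are all assumptions made for the sketch; the paper clusters trajectories on a motion manifold, and mode-seeking via mean shift is one plausible choice for that grouping.

```python
import numpy as np

def trajectory_features(trajs):
    """Map each trajectory (an (T, 3) array of (x, y, t) points) to a
    4-D feature: mean spatial position and mean frame-to-frame velocity."""
    feats = []
    for tr in trajs:
        pos = tr[:, :2].mean(axis=0)
        vel = np.diff(tr[:, :2], axis=0).mean(axis=0) if len(tr) > 1 else np.zeros(2)
        feats.append(np.concatenate([pos, vel]))
    return np.array(feats)

def mean_shift(X, bandwidth=20.0, n_iter=30):
    """Flat-kernel mean shift: each mode repeatedly moves to the mean of
    the points within `bandwidth`; modes that converge together share a label."""
    modes = X.copy()
    for _ in range(n_iter):
        for i in range(len(modes)):
            dist = np.linalg.norm(X - modes[i], axis=1)
            nbrs = X[dist < bandwidth]
            if len(nbrs):
                modes[i] = nbrs.mean(axis=0)
    labels = -np.ones(len(X), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for j, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels[i] = j
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels

def extract_vois(trajs, labels):
    """Each cluster's VOI is the axis-aligned (x, y, t) bounding volume
    of all points on its member trajectories."""
    vois = []
    for lab in np.unique(labels):
        pts = np.vstack([tr for tr, l in zip(trajs, labels) if l == lab])
        vois.append((pts.min(axis=0), pts.max(axis=0)))
    return vois

# Two synthetic motion groups far apart in the frame should yield two VOIs.
rng = np.random.default_rng(0)
trajs = []
for cx in (0.0, 200.0):
    for _ in range(5):
        t = np.arange(10, dtype=float)
        x = cx + t + rng.normal(0, 1, 10)
        y = cx + rng.normal(0, 1, 10)
        trajs.append(np.stack([x, y, t], axis=1))

labels = mean_shift(trajectory_features(trajs))
vois = extract_vois(trajs, labels)
```

Because the clusters group whole trajectories rather than isolated interest points, each resulting volume spans the full temporal extent of the motion it covers, which is the property the paper contrasts against fixed-size cuboids.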


Published in:
MM '09: Proceedings of the 17th ACM International Conference on Multimedia
October 2009, 1202 pages
ISBN: 9781605586083
DOI: 10.1145/1631272

      Copyright © 2009 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



Overall acceptance rate: 995 of 4,171 submissions, 24%

