DOI: 10.1145/2660114.2660124
Research Article

Users Tagging Visual Moments: Timed Tags in Social Video

Published: 7 November 2014

ABSTRACT

A timed tag is a tag that a user has assigned to a specific time point in a video. Although timed tags are supported by an increasing number of social video platforms on the Internet, multimedia research remains focused on conventional tags, here called "timeless tags", which users assign to a video as a whole rather than to a specific moment. This paper presents a data set consisting of social videos and user-contributed timed tags, which was annotated in a large crowdsourcing experiment. The annotations allow us to better understand the phenomenon of timed tagging. We describe the design and execution of the crowdsourcing experiment, and then present the results of our analysis, which reveal the properties of timed tags and the ways in which they differ from timeless tags. The results suggest that the two differ with respect to what the user is attempting to express about the video. We close with an outlook that lays the groundwork for further study of timed tags in social video within the research community.
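To make the distinction concrete, the following minimal Python sketch models the two tag types as they are defined in the abstract: a timed tag is a tag paired with a playback time point, while a timeless tag attaches to the video as a whole. The class and field names (Tag, video_id, time_sec) are purely illustrative assumptions, not the actual schema of the paper's data set.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class Tag:
        """A user-contributed tag on a social video.

        A timeless tag describes the video as a whole (time_sec is None);
        a timed tag is anchored to a specific playback moment.
        """
        video_id: str
        label: str
        time_sec: Optional[float] = None  # None means a timeless tag

        @property
        def is_timed(self) -> bool:
            return self.time_sec is not None

    # One timeless tag and one timed tag on the same (hypothetical) video.
    tags = [
        Tag(video_id="v42", label="skateboarding"),            # whole video
        Tag(video_id="v42", label="kickflip", time_sec=73.5),  # moment at 1:13.5
    ]

    timed = [t for t in tags if t.is_timed]
    print(f"{len(timed)} of {len(tags)} tags are timed")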


Published in

CrowdMM '14: Proceedings of the 2014 International ACM Workshop on Crowdsourcing for Multimedia
November 2014, 84 pages
ISBN: 9781450331289
DOI: 10.1145/2660114
General Chairs: Judith Redi, Mathias Lux

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States



Acceptance Rates

CrowdMM '14 Paper Acceptance Rate: 8 of 26 submissions, 31%
Overall Acceptance Rate: 16 of 42 submissions, 38%
