ABSTRACT
A timed tag is a tag that a user has assigned to a specific time point in a video. Although timed tags are supported by a growing number of social video platforms on the Internet, multimedia research remains focused on conventional tags, here called "timeless tags", which users assign to the video as a whole rather than to a specific moment. This paper presents a video data set consisting of social videos and user-contributed timed tags, annotated in a large crowdsourcing experiment. The annotations allow us to better understand the phenomenon of timed tagging. We describe the design and execution of the crowdsourcing experiment, and then present results of our analysis, which reveal the properties of timed tags and how they differ from timeless tags. The results suggest that the two differ with respect to what the user is attempting to express about the video. We close with an outlook that lays the groundwork for further study of timed tags in social video within the research community.
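To make the distinction concrete: a timed tag pairs a label with a timestamp inside a specific video, whereas a timeless tag attaches a label to the video as a whole. A minimal sketch of the two record types (the field names and example values are illustrative, not the data set's actual schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TimelessTag:
    video_id: str
    label: str  # describes the video as a whole

@dataclass(frozen=True)
class TimedTag:
    video_id: str
    label: str
    time_sec: float  # the time point in the video that the tag refers to

# The same video can carry both kinds of annotation:
whole = TimelessTag(video_id="v42", label="skateboarding")
moment = TimedTag(video_id="v42", label="kickflip", time_sec=73.5)
```

Under this sketch, analyses that treat tags as video-level descriptors can use `TimelessTag`, while moment-level phenomena (e.g., which instants users choose to mark) require the `time_sec` field of `TimedTag`.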
Index Terms
- Users Tagging Visual Moments: Timed Tags in Social Video