DOI: 10.1145/1180639.1180667
Article

To construct optimal training set for video annotation

Published: 23 October 2006

Abstract

This paper explores criteria for optimizing the construction of training sets for video annotation. Most existing learning-based semantic annotation approaches require a large training set to achieve good generalization capability, which entails a considerable amount of labor-intensive manual labeling. It is observed, however, that the generalization capability of a classifier depends more on the geometrical distribution of the training data than on its size. We argue that a training set that covers most of the temporal and spatial distribution of the whole data set can achieve satisfactory performance even when its size is limited. To capture the geometrical distribution characteristics of a given video collection, we propose four metrics for constructing an optimal training set: Salience, Time Dispersiveness, Spatial Dispersiveness, and Diversity. Based on these metrics, we further propose a set of optimization rules that capture as much of the distribution information of the whole data as possible within a training set of a given size. Experimental results demonstrate that these rules are effective for training set construction for video annotation and significantly outperform random training set selection.
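The paper's exact metrics (Salience, Time Dispersiveness, Spatial Dispersiveness, Diversity) and optimization rules are defined in the full text and are not reproduced here. As a rough illustration of the general idea, the sketch below selects a fixed-size training set by greedy farthest-point sampling over shot-level feature vectors, so that the chosen samples spread across the feature-space distribution, and contrasts it with the random-selection baseline mentioned in the abstract. The feature dimensionality, pool size, and the diverse_subset / random_subset helpers are assumptions for illustration only, not the authors' implementation.

```python
# Illustrative sketch (not the paper's formulation): distribution-aware
# training set selection via greedy farthest-point sampling, a rough
# stand-in for the Diversity criterion; the temporal and spatial
# dispersiveness measures from the paper are not modeled here.
import numpy as np


def diverse_subset(features: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Pick k sample indices that spread out over the feature space."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    chosen = [int(rng.integers(n))]  # arbitrary starting sample
    # Distance from every sample to the current selected set.
    dist = np.linalg.norm(features - features[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))   # farthest point from the selected set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return np.array(chosen)


def random_subset(n: int, k: int, seed: int = 0) -> np.ndarray:
    """Baseline: uniform random selection of k training samples."""
    return np.random.default_rng(seed).choice(n, size=k, replace=False)


if __name__ == "__main__":
    # Synthetic shot-level features (assumed 64-dimensional) standing in
    # for low-level visual features extracted from video shots.
    feats = np.random.default_rng(1).normal(size=(1000, 64))
    idx_diverse = diverse_subset(feats, k=50)
    idx_random = random_subset(len(feats), k=50)
    print("diverse picks:", idx_diverse[:5], "random picks:", idx_random[:5])
```

Farthest-point sampling is only one plausible proxy for distribution coverage; the paper combines several metrics and optimization rules that this sketch does not attempt to reproduce.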

Published In

MM '06: Proceedings of the 14th ACM International Conference on Multimedia
October 2006
1072 pages
ISBN: 1595934472
DOI: 10.1145/1180639
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. training set construction
  2. video annotation

Qualifiers

  • Article

Conference

MM '06: The 14th ACM International Conference on Multimedia
October 23 - 27, 2006
Santa Barbara, CA, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Cited By

  • (2018) Typicality ranking. Multimedia Tools and Applications, 70(2), 647-660. DOI: 10.1007/s11042-011-0892-0. Online publication date: 31-Dec-2018.
  • (2018) Facebook5k: A Novel Evaluation Resource Dataset for Cross-Media Search. Cloud Computing and Security, 512-524. DOI: 10.1007/978-3-030-00006-6_47. Online publication date: 1-Nov-2018.
  • (2011) A novel method for semantic video concept learning using web images. Proceedings of the 19th ACM International Conference on Multimedia, 1081-1084. DOI: 10.1145/2072298.2071943. Online publication date: 28-Nov-2011.
  • (2011) Determination of emotional content of video clips by low-level audiovisual features. Multimedia Tools and Applications, 61(1), 21-49. DOI: 10.1007/s11042-010-0702-0. Online publication date: 11-Jan-2011.
  • (2009) NUS-WIDE. Proceedings of the ACM International Conference on Image and Video Retrieval, 1-9. DOI: 10.1145/1646396.1646452. Online publication date: 8-Jul-2009.
  • (2008) Online multi-label active annotation. Proceedings of the 16th ACM International Conference on Multimedia, 141-150. DOI: 10.1145/1459359.1459379. Online publication date: 26-Oct-2008.
  • (2007) Beyond Accuracy: Typicality Ranking for Video Annotation. IEEE International Conference on Multimedia and Expo 2007, 647-650. DOI: 10.1109/ICME.2007.4284733. Online publication date: Jul-2007.
