Article

Automatic video annotation by semi-supervised learning with kernel density estimation

Authors:

Xian-Sheng Hua,

Hong-Jiang Zhang

MM '06: Proceedings of the 14th ACM international conference on Multimedia

Pages 967 - 976

https://doi.org/10.1145/1180639.1180855

Published: 23 October 2006

Abstract

Insufficiency of labeled training data is a major obstacle to automatically annotating large-scale video databases with semantic concepts. Existing semi-supervised learning algorithms based on parametric models try to tackle this issue by incorporating the information in a large amount of unlabeled data. However, they rely on a "model assumption": that the assumed generative model is correct. This assumption usually cannot be satisfied in automatic video annotation because video semantic concepts vary widely. In this paper, we propose a novel semi-supervised learning algorithm, named Semi-Supervised Learning by Kernel Density Estimation (SSLKDE). Because it is based on a non-parametric method, it avoids the "model assumption". While the classical Kernel Density Estimation (KDE) approach uses only labeled data, SSLKDE leverages both labeled and unlabeled data to estimate class-conditional probability densities based on an extended form of KDE. We also investigate the connection between SSLKDE and existing graph-based semi-supervised learning algorithms. Experiments show that SSLKDE significantly outperforms existing supervised methods for video annotation.
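
The classical KDE baseline that the abstract contrasts with SSLKDE can be sketched roughly as follows. This is only an illustration of labeled-data-only KDE classification, not the paper's SSLKDE algorithm: the abstract does not specify the extended form of KDE, so it is not reproduced here. The Gaussian kernel, the fixed bandwidth, the function names `gaussian_kde_density` and `kde_posterior`, and the toy data are all assumptions made for the example.

```python
import numpy as np


def gaussian_kde_density(x, samples, bandwidth):
    """Parzen-window density estimate at x with an isotropic Gaussian kernel.

    The kernel choice and the single fixed bandwidth are assumptions.
    """
    samples = np.atleast_2d(samples)
    dim = samples.shape[1]
    sq_dist = np.sum((samples - x) ** 2, axis=1)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (dim / 2.0)
    return np.mean(np.exp(-sq_dist / (2.0 * bandwidth ** 2)) / norm)


def kde_posterior(x, labeled_x, labeled_y, bandwidth=1.0):
    """Class posteriors at x from per-class KDE densities and empirical priors."""
    classes = np.unique(labeled_y)
    scores = np.empty(len(classes))
    for i, c in enumerate(classes):
        members = labeled_x[labeled_y == c]
        prior = len(members) / len(labeled_x)
        scores[i] = prior * gaussian_kde_density(x, members, bandwidth)
    total = scores.sum()
    if total == 0.0:
        # Query is so far from all samples that every density underflowed.
        return classes, np.full(len(classes), 1.0 / len(classes))
    return classes, scores / total


# Toy data (hypothetical): two well-separated clusters in 2-D.
labeled_x = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
labeled_y = np.array([0, 0, 1, 1])

classes, posterior = kde_posterior(np.array([0.1, 0.0]), labeled_x, labeled_y, bandwidth=0.5)
print(classes[np.argmax(posterior)], posterior)
```

Only the labeled samples contribute to each class density here, which is the limitation the abstract attributes to classical KDE when labeled data are scarce.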

Index Terms

Automatic video annotation by semi-supervised learning with kernel density estimation
1. Information systems
  1. Information retrieval
    1. Document representation

Published In

MM '06: Proceedings of the 14th ACM international conference on Multimedia

October 2006

1072 pages

ISBN:1595934472

DOI:10.1145/1180639

General Chairs: Klara Nahrstedt (UIUC), Matthew Turk (UCSB)

Program Chairs: Yong Rui (Microsoft Research), Wolfgang Klas (Universität Wien), Ketan Mayer-Patel (UNC)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

MM06: The 14th ACM International Conference on Multimedia 2006

October 23 - 27, 2006

Santa Barbara, CA, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Article Metrics

Total Citations: 34
Total Downloads: 749
Downloads (Last 12 months): 10
Downloads (Last 6 weeks): 0

Reflects downloads up to 15 Feb 2025

Cited By

Chen W, Lin C, Tseng Y (2024). Label Expansion through Walking Trajectories for Wi-Fi CSI-Based Indoor Localization. 2024 IEEE 35th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), (1-6). Online publication date: 2-Sep-2024
https://doi.org/10.1109/PIMRC59610.2024.10817203
Amrish Shwetank (2024). Comparative analysis of manual and annotations for crowd assessment and classification using artificial intelligence. Data Science and Management. Online publication date: Apr-2024
https://doi.org/10.1016/j.dsm.2024.04.001
Adnan M, Rahim M, Hasan ali M, Al-Jawaheri K, Neamah K (2021). A Review of Methods for The Image Automatic Annotation. Journal of Physics: Conference Series, 1892:1, (012002). Online publication date: 1-Apr-2021
https://doi.org/10.1088/1742-6596/1892/1/012002
Wang L, Li Q, Lu J (2019). Weakly supervised segment annotation via expectation kernel density estimation. IET Computer Vision, 13:4, (435-441). Online publication date: 8-May-2019
https://dl.acm.org/doi/10.1049/iet-cvi.2018.5325
Qian X, Liu X, Ma X, Lu D, Xu C (2016). What Is Happening in the Video? —Annotate Video by Sentence. IEEE Transactions on Circuits and Systems for Video Technology, 26:9, (1746-1757). Online publication date: 1-Sep-2016
https://dl.acm.org/doi/10.1109/TCSVT.2015.2475815
Chatterjee M, Leuski A, Zhou X, Smeaton A, Tian Q, Bulterman D, Shen H, Mayer-Patel K, Yan S (2015). A Novel Statistical Approach for Image and Video Retrieval and Its Adaption for Active Learning. Proceedings of the 23rd ACM international conference on Multimedia, (935-938). Online publication date: 13-Oct-2015
https://dl.acm.org/doi/10.1145/2733373.2806368
Habibian A, Snoek C, Kankanhalli M, Rueger S, Manmatha R, Jose J, van Rijsbergen K (2014). Stop-Frame Removal Improves Web Video Classification. Proceedings of International Conference on Multimedia Retrieval, (499-502). Online publication date: 1-Apr-2014
https://dl.acm.org/doi/10.1145/2578726.2578803
Ji P, Zhao N, Hao S, Jiang J (2014). Automatic image annotation by semi-supervised manifold kernel density estimation. Information Sciences: an International Journal, 281, (648-660). Online publication date: 1-Oct-2014
https://dl.acm.org/doi/10.1016/j.ins.2013.09.016
Tang J, Hua X (2014). Typicality ranking. Multimedia Tools and Applications, 70:2, (647-660). Online publication date: 1-May-2014
https://dl.acm.org/doi/10.1007/s11042-011-0892-0
Hong R, Zha Z, Gao Y, Chua T, Wu X (2013). Multimedia encyclopedia construction by mining web knowledge. Signal Processing, 93:8, (2361-2368). Online publication date: 1-Aug-2013
https://dl.acm.org/doi/10.1016/j.sigpro.2012.06.028
