ABSTRACT
Most current image retrieval systems and commercial search engines rely mainly on text annotations to index and retrieve WWW images. This research explores the use of machine learning approaches to automatically annotate WWW images against a predefined list of concepts by fusing evidence from image contents and their associated HTML text. One major practical limitation of supervised machine learning approaches is that effective learning requires a large set of labeled training samples. Producing such a set is tedious, and this severely impedes the practical development of effective search techniques for WWW images, which are dynamic and fast-changing. Because web-based images possess both intrinsic visual contents and text annotations, they provide a strong basis for bootstrapping the learning process with a co-training approach that involves classifiers based on two orthogonal sets of features: visual and text. The idea of co-training is to start from a small set of labeled training samples and successively annotate a larger set of unlabeled samples using the two orthogonal classifiers. We carry out experiments on a set of over 5,000 images acquired from the Web, and we explore different combinations of HTML text and visual representations. We find that our bootstrapping approach achieves performance comparable to that of the supervised learning approach, with an F1 measure of over 54%, while offering the added advantage of requiring only a small initial set of training samples.
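The co-training loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifiers or features used, so the nearest-centroid classifier, the two-dimensional synthetic "visual" and "text" vectors, and the names `NearestCentroid`, `train_views`, `co_train`, `make_sample`, `rounds`, and `per_round` are all assumptions made for illustration. In each round, one classifier per view is trained on the shared labeled set, each picks the unlabeled samples it is most confident about, and those samples join the labeled set with their predicted labels.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Toy single-view classifier; a stand-in, not the paper's classifier."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        """Return (label, confidence); confidence is the distance margin."""
        ranked = sorted((sq_dist(x, c), lab) for lab, c in self.centroids.items())
        return ranked[0][1], ranked[1][0] - ranked[0][0]


def train_views(labeled):
    """Fit one classifier per view (visual, text) on the shared labeled set."""
    y = [lab for _, _, lab in labeled]
    clf_visual = NearestCentroid().fit([v for v, _, _ in labeled], y)
    clf_text = NearestCentroid().fit([t for _, t, _ in labeled], y)
    return clf_visual, clf_text


def co_train(labeled, unlabeled, rounds=5, per_round=2):
    """Grow the labeled set with each view's most confident pseudo-labels."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        picks = {}  # pool index -> pseudo-label
        for view, clf in enumerate(train_views(labeled)):
            # Each classifier sees only its own view of every pooled sample.
            scored = [(clf.predict(sample[view]), i) for i, sample in enumerate(pool)]
            scored.sort(key=lambda item: item[0][1], reverse=True)
            for (label, _margin), i in scored[:per_round]:
                picks.setdefault(i, label)
        labeled += [(pool[i][0], pool[i][1], lab) for i, lab in picks.items()]
        pool = [s for i, s in enumerate(pool) if i not in picks]
    return train_views(labeled), labeled


def make_sample(rng, label):
    """Synthetic (visual, text) pair; both views cluster around 5 * label."""
    center = 5.0 * label
    visual = [rng.gauss(center, 0.5) for _ in range(2)]
    text = [rng.gauss(center, 0.5) for _ in range(2)]
    return visual, text


rng = random.Random(0)
seed_set = [make_sample(rng, lab) + (lab,) for lab in (0, 1)]  # one label per class
pool = [make_sample(rng, i % 2) for i in range(20)]            # unlabeled samples
(clf_visual, clf_text), grown = co_train(seed_set, pool)
print(len(seed_set), "seed labels grew to", len(grown))
```

The sketch keeps a single shared labeled set that both classifiers add to, which is one common way to realize co-training; other variants have each classifier label samples only for the other view.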
CITED BY 10
Dhiraj Joshi, Ritendra Datta, Ziming Zhuang, W. P. Weiss, Marc Friedenberg, Jia Li, James Z. Wang, PARAgrab: a comprehensive architecture for web image management and multimodal querying, Proceedings of the 32nd international conference on Very large data bases, September 12-15, 2006, Seoul, Korea
Ying Liu, Tao Qin, Tie-Yan Liu, Lei Zhang, Wei-Ying Ma, Similarity space projection for web image search and annotation, Proceedings of the 7th ACM SIGMM international workshop on Multimedia information retrieval, November 10-11, 2005, Hilton, Singapore
Hanghang Tong, Jingrui He, Mingjing Li, Changshui Zhang, Wei-Ying Ma, Graph based multi-modality learning, Proceedings of the 13th annual ACM international conference on Multimedia, November 06-11, 2005, Hilton, Singapore
Ritendra Datta, Dhiraj Joshi, Jia Li, James Z. Wang, Image retrieval: Ideas, influences, and trends of the new age, ACM Computing Surveys (CSUR), v.40 n.2, p.1-60, April 2008