ABSTRACT
Visual concept learning often requires a large set of training images. In practice, however, acquiring noise-free training labels with sufficient positive examples is expensive. A plausible solution for training data collection is to sample the widely available user-tagged images from social media websites. Under the general belief that correct tagging is more probable than incorrect tagging, such a solution often sounds feasible, though it is not without challenges. First, user tags can be subjective and, to a certain extent, ambiguous. For instance, an image tagged with "whales" may simply be a picture of an ocean museum; learning the concept "whales" from such training samples will not be effective. Second, user tags can be incomplete or indirect. For instance, an image depicting the concept "wedding" may be tagged only with "love" or the couple's names. As a result, crawling sufficient positive training examples is difficult. This paper empirically studies the impact of exploiting tagged images for concept learning, investigating how the quality of pseudo training images affects concept detection performance. In addition, we propose a simple approach, named semantic field, for predicting the relevance between a target concept and the tag list associated with an image. Specifically, the relevance is determined through concept-tag co-occurrence, estimated from external sources such as WordNet and Wikipedia. The proposed approach is shown to be effective in selecting pseudo training examples, exhibiting better performance in concept learning than approaches based on keyword sampling and tag voting.
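To make the co-occurrence idea concrete, the sketch below scores the relevance of a tag list to a target concept from co-occurrence statistics. This is an illustrative toy only: the function name `tag_relevance`, the Dice-style scoring formula, and the toy counts are assumptions for exposition, not the exact formulation used in the paper; in practice the counts would come from an external source such as Wikipedia.

```python
def tag_relevance(concept, tags, cooccur, freq):
    """Score how relevant a tag list is to a target concept.

    concept  -- the target concept (e.g. "whales")
    tags     -- list of user tags attached to an image
    cooccur  -- dict mapping (concept, tag) pairs to co-occurrence counts
    freq     -- dict mapping terms to their individual occurrence counts

    Returns the average per-tag score; each tag is scored with a
    Dice-style coefficient: 2 * co-occurrences / (freq sum).
    """
    scores = []
    for tag in tags:
        denom = freq.get(concept, 0) + freq.get(tag, 0)
        if denom == 0:
            continue  # unseen terms contribute no evidence
        scores.append(2 * cooccur.get((concept, tag), 0) / denom)
    return sum(scores) / len(scores) if scores else 0.0


# Toy counts (hypothetical): "whales" co-occurs often with "ocean",
# never with "museum", so the first tag list scores higher.
freq = {"whales": 10, "ocean": 8, "museum": 5}
cooccur = {("whales", "ocean"): 6}
print(tag_relevance("whales", ["ocean"], cooccur, freq))   # higher
print(tag_relevance("whales", ["museum"], cooccur, freq))  # lower (0.0)
```

Images whose tag lists score above a threshold would then be kept as pseudo positive training examples, which is the selection step the abstract describes.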