short-paper

Large Scale Image Annotation via Deep Representation Learning and Tag Embedding Learning

Authors:
Yonghao He

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
View Profile

,
Jian Wang

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
View Profile

,
Cuicui Kang

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
View Profile

,
Shiming Xiang

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
View Profile

,
Chunhong Pan

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
View Profile

ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia RetrievalJune 2015Pages 523–526https://doi.org/10.1145/2671188.2749330

Published:22 June 2015Publication History

ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval

Pages 523–526

ABSTRACT

In this paper, we focus on the issue of large scale image annotation, whereas most existing methods are devised for small datasets. A novel model based on deep representation learning and tag embedding learning is proposed. Specifically, the proposed model learns an unified latent space for image visual features and tag embeddings simultaneously. Furthermore, a metric matrix is introduced to estimate the relevance scores between images and tags. Finally, an objective function modeling triplet relationships (irrelevant tag, image, relevant tag) is proposed with maximum margin pursuit. The proposed model is easy to tackle new images and tags via online learning and has a relatively low test computation complexity. Experimental results on NUS-WIDE dataset demonstrate the effectiveness of the proposed model.

References

L. Ballan, T. Uricchio, L. Seidenari, and A. Del Bimbo. A cross-media model for automatic image annotation. In ACM International Conference on Multimedia Retrieval, pages 73--80, 2014. Google ScholarDigital Library
G. Carneiro, A. Chan, P. Moreno, and N. Vasconcelos. Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3):394--410, 2007. Google ScholarDigital Library
M. Chen, A. Zheng, and K. Weinberger. Fast image tagging. In International Conference on Machine Learning, pages 1274--1282, 2013.Google ScholarDigital Library
T. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y. Zheng. Nus-wide: a real-world web image database from national university of singapore. In ACM International Conference on Image and Video Retrieval, page 48, 2009. Google ScholarDigital Library
J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei. Imagenet: a large-scale hierarchical image database. In IEEE International Conference on Computer Vision and Pattern Recognition, pages 248--255, 2009.Google ScholarCross Ref
P. Duygulu, K. Barnard, J. de Freitas, and D. Forsyth. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In European Conference on Computer Vision, pages 97--112. 2006. Google ScholarDigital Library
R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv preprint arXiv:1311.2524, 2013. Google ScholarDigital Library
Y. Gong, Y. Jia, T. Leung, A. Toshev, and S. Ioffe. Deep convolutional ranking for multilabel image annotation. 2014.Google Scholar
M. Grubinger, P. Clough, H. Muller, and T. Deselaers. The iapr tc-12 benchmark: A new evaluation resource for visual information systems. In International Workshop OntoImage, pages 13--23, 2006.Google Scholar
M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid. Tagprop: discriminative metric learning in nearest neighbor models for image auto-annotation. In IEEE International Conference on Computer Vision, pages 309--316, 2009.Google ScholarCross Ref
Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia, pages 675--678, 2014. Google ScholarDigital Library
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278--2324, 1998.Google ScholarCross Ref
D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91--110, 2004. Google ScholarDigital Library
A. Makadia, V. Pavlovic, and S. Kumar. A new baseline for image annotation. In European Conference on Computer Vision, pages 316--329. 2008. Google ScholarDigital Library
V. Murthy, E. Can, and R. Manmatha. A hybrid model for automatic image annotation. In ACM International Conference on Multimedia Retrieval, pages 369--376, 2014. Google ScholarDigital Library
A. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson. Cnn features off-the-shelf: an astounding baseline for recognition. arXiv preprint arXiv:1403.6382, 2014.Google Scholar
V. Vapnik and V. Vapnik. Statistical learning theory, volume 2. Wiley New York, 1998.Google ScholarDigital Library
L. Von Ahn and L. Dabbish. Labeling images with a computer game. In ACM SIGCHI Conference on Human Factors in Computing Systems, pages 319--326, 2004. Google ScholarDigital Library
C. Wang, S. Yan, L. Zhang, and H. Zhang. Multi-label sparse coding for automatic image annotation. In IEEE International Conference on Computer Vision and Pattern Recognition, pages 1643--1650, 2009.Google ScholarCross Ref
J. Weston, S. Bengio, and N. Usunier. Large scale image annotation: learning to rank with joint word-image embeddings. Machine Learning, 81(1):21--35, 2010. Google ScholarDigital Library

Index Terms

Large Scale Image Annotation via Deep Representation Learning and Tag Embedding Learning
1. Applied computing
  1. Document management and text processing
2. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Recommendations

Joint multi-view representation learning and image tagging
AAAI'16: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence

Automatic image annotation is an important problem in several machine learning applications such as image search. Since there exists a semantic gap between low-level image features and high-level semantics, the description ability of image ...
Read More
Learning Social Image Embedding with Deep Multimodal Attention Networks
Thematic Workshops '17: Proceedings of the on Thematic Workshops of ACM Multimedia 2017

Learning social media data embedding by deep models has attracted extensive research interest as well as boomed a lot of applications, such as link prediction, classification, and cross-modal search. However, for social images which contain both link ...
Read More
A 3D-CAE-CNN model for Deep Representation Learning of 3D images
Abstract
Deep Representation Learning technologies based on supervised Convolutional Neural Networks (CNNs) have attained significant interest mainly due to their superior performance for learning abstract and robust features used in object ...
Highlights
- We propose the idea of combining 3D-CAE-UFL and 3D-CNN-SFL approaches in order to create efficient and high quality deep learning representations for 3D ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval
June 2015
700 pages
ISBN:9781450332743
DOI:10.1145/2671188
General Chairs:
Alex Hauptmann
Carnegie Mellon University, USA
,
Chong-Wah Ngo
City University of Hong Kong, China
,
Xiangyang Xue
Fudan University, China
,
Program Chairs:
Yu-Gang Jiang
Fudan University, China
,
Cees Snoek
University of Amsterdam and Qualcomm Research Netherlands
,
Nuno Vasconcelos
University of California, San Diego, USA
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 22 June 2015
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
deep representation learning
large scale image annotation
tag embedding learning
Qualifiers
- short-paper
Conference

Acceptance Rates
ICMR '15 Paper Acceptance Rate48of127submissions,38%Overall Acceptance Rate254of830submissions,31%
More
Upcoming Conference
ICMR '24

Sponsor:

sigmm

International Conference on Multimedia Retrieval

June 10 - 14, 2024

Phuket , Thailand
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 6
  Total Citations
  View Citations
- 249
  Total Downloads
- Downloads (Last 12 months)3
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Large Scale Image Annotation via Deep Representation Learning and Tag Embedding Learning

ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval

ABSTRACT

References

Cited By

Index Terms

Recommendations

Joint multi-view representation learning and image tagging

Learning Social Image Embedding with Deep Multimodal Attention Networks

A 3D-CAE-CNN model for Deep Representation Learning of 3D images