short-paper

Automatic Image Annotation using Deep Learning Representations

Authors:

Venkatesh N. Murthy,

Subhransu Maji,

R. Manmatha

ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval

Pages 603 - 606

https://doi.org/10.1145/2671188.2749391

Published: 22 June 2015

Abstract

We propose simple and effective models for image annotation that use Convolutional Neural Network (CNN) features extracted from an image and word embedding vectors to represent its associated tags. Our first set of models is based on the Canonical Correlation Analysis (CCA) framework, which models both views of the data: visual features (CNN features) and textual features (word embedding vectors). We report results for three variants of the CCA model: linear CCA, kernel CCA, and CCA with k-nearest neighbor (CCA-KNN) clustering. The best results are obtained with CCA-KNN, which outperforms previous results on the Corel-5k and ESP-Game datasets and achieves comparable results on the IAPRTC-12 dataset. In our experiments we also evaluate CNN features in existing models, which brings out their advantages over dozens of handcrafted features. We also demonstrate that word embedding vectors perform better than binary vectors as a representation of the tags associated with an image. In addition, we compare the CCA model to a simple CNN-based linear regression model, which allows the CNN layers to be trained using back-propagation.
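The two-view idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a regularized linear CCA solved by whitening and SVD, represents each image's tags by the average of their word vectors, and ranks tags by a distance-weighted vote over the k nearest training images in the CCA space. The helper names `fit_linear_cca` and `knn_annotate`, the regularizer `reg`, the voting weights, and the synthetic data standing in for CNN features and word embeddings are all hypothetical.

```python
import numpy as np

def fit_linear_cca(X, Y, dim, reg=1e-3):
    """Regularized linear CCA via whitening + SVD (illustrative sketch)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Regularized within-view covariances and the cross-view covariance.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        # Inverse square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :dim]       # projection for the visual view
    Wy = Ky @ Vt.T[:, :dim]    # projection for the textual view
    return mx, my, Wx, Wy, s[:dim]

def knn_annotate(x, train_X, train_tags, model, k=5, n_tags=3):
    """Rank tags by a weighted vote of the k nearest training images
    in the CCA space (assumed voting scheme, not taken from the paper)."""
    mx, _, Wx, _, _ = model
    train_proj = (train_X - mx) @ Wx
    query = (x - mx) @ Wx
    dists = np.linalg.norm(train_proj - query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (1.0 + dists[nearest])
    scores = weights @ train_tags[nearest]
    return np.argsort(-scores)[:n_tags]

# Synthetic stand-ins for CNN features (X) and tag word embeddings (Y).
rng = np.random.default_rng(0)
n_images, n_vocab, cnn_dim, emb_dim = 200, 10, 32, 8
tag_matrix = (rng.random((n_images, n_vocab)) < 0.3).astype(float)
# Guarantee at least one tag per image so the average below is defined.
tag_matrix[np.arange(n_images), rng.integers(0, n_vocab, n_images)] = 1.0
word_vecs = rng.normal(size=(n_vocab, emb_dim))
# Textual view: mean word vector of each image's tags.
Y = tag_matrix @ word_vecs / tag_matrix.sum(axis=1, keepdims=True)
# Visual view: a noisy linear function of the textual view.
X = Y @ rng.normal(size=(emb_dim, cnn_dim)) + 0.1 * rng.normal(size=(n_images, cnn_dim))

model = fit_linear_cca(X, Y, dim=4)
predicted = knn_annotate(X[0], X, tag_matrix, model, k=5, n_tags=3)
print("predicted tag indices:", predicted)
print("canonical correlations:", model[4])
```

In practice `X` would hold features from a pretrained CNN and `word_vecs` would come from a trained word embedding model; the kernel CCA variant mentioned in the abstract would replace the linear covariances with kernel matrices.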

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision tasks → Scene understanding

Published In

ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval

June 2015

700 pages

ISBN: 9781450332743

DOI: 10.1145/2671188

General Chairs: Alex Hauptmann (Carnegie Mellon University, USA), Chong-Wah Ngo (City University of Hong Kong, China), Xiangyang Xue (Fudan University, China)

Program Chairs: Yu-Gang Jiang (Fudan University, China), Cees Snoek (University of Amsterdam and Qualcomm Research Netherlands), Nuno Vasconcelos (University of California, San Diego, USA)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Conference

ICMR '15

Sponsor:

SIGMM

ICMR '15: International Conference on Multimedia Retrieval

June 23 - 26, 2015

Shanghai, China

Acceptance Rates

ICMR '15 Paper Acceptance Rate: 48 of 127 submissions, 38%

Overall Acceptance Rate: 254 of 830 submissions, 31%
