DOI: 10.1145/2671188.2749330
short-paper

Large Scale Image Annotation via Deep Representation Learning and Tag Embedding Learning

Published: 22 June 2015

ABSTRACT

In this paper, we focus on large-scale image annotation, whereas most existing methods are designed for small datasets. We propose a novel model based on deep representation learning and tag embedding learning. Specifically, the model simultaneously learns a unified latent space for image visual features and tag embeddings. Furthermore, a metric matrix is introduced to estimate the relevance scores between images and tags. Finally, a max-margin objective function is proposed that models triplet relationships (irrelevant tag, image, relevant tag). The proposed model readily handles new images and tags via online learning and has relatively low computational complexity at test time. Experimental results on the NUS-WIDE dataset demonstrate the effectiveness of the proposed model.
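The abstract does not give the scoring function or the objective explicitly, so the following is only a minimal sketch of how a metric matrix and a triplet max-margin loss are commonly combined. It assumes a bilinear relevance score x^T M t and a hinge loss with a fixed margin; the function names, the margin value, and the toy vectors are all hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relevance(image_feat: Vector, metric: Matrix, tag_emb: Vector) -> float:
    """Bilinear relevance score x^T M t (assumed form, not from the paper)."""
    # M t: one dot product per row of the metric matrix.
    projected = [sum(m * t for m, t in zip(row, tag_emb)) for row in metric]
    # x^T (M t)
    return sum(x * p for x, p in zip(image_feat, projected))


def triplet_hinge_loss(
    image_feat: Vector,
    metric: Matrix,
    relevant_tag: Vector,
    irrelevant_tag: Vector,
    margin: float = 1.0,  # hypothetical margin value
) -> float:
    """Max-margin loss for one (irrelevant tag, image, relevant tag) triplet.

    The loss is zero once the relevant tag outscores the irrelevant tag
    by at least `margin`.
    """
    pos = relevance(image_feat, metric, relevant_tag)
    neg = relevance(image_feat, metric, irrelevant_tag)
    return max(0.0, margin - pos + neg)


# Toy 2-D example with an identity metric (illustrative values only).
image = [1.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
tag_sky = [2.0, 0.0]  # aligned with the image feature
tag_car = [0.0, 2.0]  # orthogonal to the image feature

loss_ok = triplet_hinge_loss(image, identity, tag_sky, tag_car)
loss_bad = triplet_hinge_loss(image, identity, tag_car, tag_sky)
```

In this toy setup the correctly ordered triplet incurs no loss, while swapping the relevant and irrelevant tags produces a positive loss that a gradient step would reduce. Under this assumed form, scoring a new image against a tag costs one matrix-vector product and one dot product, which is in keeping with the low test-time cost the abstract mentions.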


      • Published in

        ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval
        June 2015
        700 pages
        ISBN:9781450332743
        DOI:10.1145/2671188

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        ICMR '15 Paper Acceptance Rate: 48 of 127 submissions, 38%
        Overall Acceptance Rate: 254 of 830 submissions, 31%
