research-article

Online Multimodal Co-indexing and Retrieval of Weakly Labeled Web Image Collections

Authors:
Lei Meng (Nanyang Technological University, Singapore, Singapore),
Ah-Hwee Tan (Nanyang Technological University, Singapore, Singapore),
Cyril Leung (Nanyang Technological University, Singapore, Singapore & The University of British Columbia, Canada),
Liqiang Nie (National University of Singapore, Singapore, Singapore),
Tat-Seng Chua (National University of Singapore, Singapore, Singapore),
Chunyan Miao (Nanyang Technological University, Singapore, Singapore)

ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, June 2015, Pages 219–226
https://doi.org/10.1145/2671188.2749362

Published: 22 June 2015

ABSTRACT

Weak supervisory information of web images, such as captions, tags, and descriptions, makes it possible to better understand images at the semantic level. In this paper, we propose a novel online multimodal co-indexing algorithm based on Adaptive Resonance Theory, named OMC-ART, for the automatic co-indexing and retrieval of images using their multimodal information. Compared with existing studies, OMC-ART has several distinct characteristics. First, OMC-ART is able to perform online learning of sequential data. Second, OMC-ART builds a two-layer indexing structure: the first layer co-indexes the images by their key visual and textual features, based on the generalized distributions of the clusters they belong to, while the second layer co-indexes the images by their own feature distributions. Third, OMC-ART enables flexible multimodal search using visual features, keywords, or a combination of both. Fourth, OMC-ART employs a ranking algorithm that does not need to traverse the whole indexing system when only a limited number of images need to be retrieved. Experiments on two published data sets demonstrate the efficiency and effectiveness of our proposed approach.
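
The abstract does not give the algorithmic details of OMC-ART, so the following is only a minimal sketch of the general idea, built on the standard Fuzzy ART rules (complement coding, choice function, vigilance test, and weight update) rather than on the authors' method. It shows one-pass online clustering of feature vectors, a two-level lookup (clusters first, then images inside a cluster), and a search that stops once `k` images have been collected. The class name `OnlineFuzzyART`, its parameters, the single concatenated feature vector, and the within-cluster similarity measure are all assumptions made for illustration.

```python
import numpy as np


def complement_code(features):
    """Fuzzy ART complement coding: map x in [0, 1]^d to [x, 1 - x]."""
    x = np.asarray(features, dtype=float)
    return np.concatenate([x, 1.0 - x])


class OnlineFuzzyART:
    """Hypothetical sketch: standard Fuzzy ART used as a two-level index."""

    def __init__(self, rho=0.7, alpha=0.01, beta=1.0):
        self.rho = rho        # vigilance: minimum match needed to join a cluster
        self.alpha = alpha    # choice parameter (small positive constant)
        self.beta = beta      # learning rate (1.0 means fast learning)
        self.weights = []     # first level: one weight vector per cluster
        self.members = []     # second level: (image_id, coded vector) per cluster

    def _choice(self, x):
        # Fuzzy ART choice function T_j = |x ^ w_j| / (alpha + |w_j|)
        return [np.minimum(x, w).sum() / (self.alpha + w.sum())
                for w in self.weights]

    def learn(self, image_id, features):
        """Process one image online; return the index of its cluster."""
        x = complement_code(features)
        for j in np.argsort(self._choice(x))[::-1]:
            w = self.weights[j]
            match = np.minimum(x, w).sum() / x.sum()
            if match >= self.rho:  # resonance: update the cluster weights
                self.weights[j] = (self.beta * np.minimum(x, w)
                                   + (1.0 - self.beta) * w)
                self.members[j].append((image_id, x))
                return int(j)
        # No existing cluster passes the vigilance test: create a new one.
        self.weights.append(x.copy())
        self.members.append([(image_id, x)])
        return len(self.weights) - 1

    def search(self, features, k):
        """Return up to k image ids, visiting the best-matching clusters first."""
        x = complement_code(features)
        hits = []
        for j in np.argsort(self._choice(x))[::-1]:
            # Rank images inside the cluster by their own feature overlap.
            ranked = sorted(self.members[j],
                            key=lambda m: np.minimum(x, m[1]).sum(),
                            reverse=True)
            hits.extend(image_id for image_id, _ in ranked)
            if len(hits) >= k:  # stop early: remaining clusters are skipped
                break
        return hits[:k]


art = OnlineFuzzyART(rho=0.8)
art.learn("img1", [0.9, 0.1])
art.learn("img2", [0.88, 0.12])
art.learn("img3", [0.1, 0.9])
print(art.search([0.9, 0.1], k=1))
```

In this toy run the first two images fall into one cluster and the third starts a new one, so a query for a single result is answered from the first cluster alone. A multimodal version would need separate visual and textual channels; the sketch uses one vector only to keep the example short.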

Index Terms

1. Information systems
  1. Information retrieval
  2. Information systems applications
    1. Data mining
      1. Clustering

Published in
ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval
June 2015
700 pages
ISBN: 9781450332743
DOI: 10.1145/2671188
General Chairs: Alex Hauptmann (Carnegie Mellon University, USA), Chong-Wah Ngo (City University of Hong Kong, China), Xiangyang Xue (Fudan University, China)
Program Chairs: Yu-Gang Jiang (Fudan University, China), Cees Snoek (University of Amsterdam and Qualcomm Research Netherlands), Nuno Vasconcelos (University of California, San Diego, USA)
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clustering
hierarchical image co-indexing
multimodal search
online learning
weakly supervised learning
Acceptance Rates
ICMR '15 Paper Acceptance Rate: 48 of 127 submissions, 38%
Overall Acceptance Rate: 254 of 830 submissions, 31%