ABSTRACT
Weak supervisory information of web images, such as captions, tags, and descriptions, makes it possible to better understand images at the semantic level. In this paper, we propose a novel online multimodal co-indexing algorithm based on Adaptive Resonance Theory, named OMC-ART, for the automatic co-indexing and retrieval of images using their multimodal information. Compared with existing studies, OMC-ART has several distinct characteristics. First, OMC-ART is able to perform online learning of sequential data. Second, OMC-ART builds a two-layer indexing structure: the first layer co-indexes images by their key visual and textual features, based on the generalized distributions of the clusters they belong to, while the second layer co-indexes images by their own feature distributions. Third, OMC-ART enables flexible multimodal search using visual features, keywords, or a combination of both. Fourth, OMC-ART employs a ranking algorithm that does not need to traverse the whole indexing system when only a limited number of images need to be retrieved. Experiments on two published data sets demonstrate the efficiency and effectiveness of our proposed approach.
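To make the online-learning property more concrete, the following is a minimal sketch of generic Fuzzy ART clustering, one common member of the Adaptive Resonance Theory family. It is not the OMC-ART algorithm itself, which the abstract does not specify in detail. The class name `FuzzyART`, the parameter names `rho` (vigilance), `alpha` (choice parameter), and `beta` (learning rate), and their default values are assumptions chosen for illustration. Each input is processed once, in arrival order, and either updates a matching cluster or creates a new one.

```python
from typing import List


def complement_code(x: List[float]) -> List[float]:
    """Append 1 - x_i for each feature; inputs are assumed to lie in [0, 1]."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a: List[float], b: List[float]) -> List[float]:
    """Element-wise minimum, the fuzzy AND used by Fuzzy ART."""
    return [min(p, q) for p, q in zip(a, b)]


class FuzzyART:
    """Generic Fuzzy ART clusterer (illustrative, not the paper's OMC-ART)."""

    def __init__(self, rho: float = 0.75, alpha: float = 0.01, beta: float = 1.0):
        self.rho = rho      # vigilance: minimum match required to join a cluster
        self.alpha = alpha  # choice parameter: small positive constant
        self.beta = beta    # learning rate: 1.0 means fast learning
        self.weights: List[List[float]] = []

    def _choice(self, i: List[float], w: List[float]) -> float:
        # Choice function T_j = |i AND w_j| / (alpha + |w_j|)
        return sum(fuzzy_and(i, w)) / (self.alpha + sum(w))

    def learn(self, x: List[float]) -> int:
        """Present one sample; return the index of the cluster it joins."""
        i = complement_code(x)
        norm_i = sum(i)  # equals len(x) under complement coding
        # Visit clusters from highest to lowest choice value.
        order = sorted(range(len(self.weights)),
                       key=lambda j: self._choice(i, self.weights[j]),
                       reverse=True)
        for j in order:
            overlap = fuzzy_and(i, self.weights[j])
            # Match function M_j = |i AND w_j| / |i| must pass vigilance.
            if sum(overlap) / norm_i >= self.rho:
                self.weights[j] = [self.beta * o + (1.0 - self.beta) * w
                                   for o, w in zip(overlap, self.weights[j])]
                return j
        # No cluster resonates: commit a new cluster initialised to the input.
        self.weights.append(i)
        return len(self.weights) - 1


if __name__ == "__main__":
    art = FuzzyART(rho=0.75)
    for sample in ([0.1, 0.1], [0.12, 0.1], [0.9, 0.9]):
        print(sample, "-> cluster", art.learn(sample))
```

Because a new cluster is created only when no existing one passes the vigilance test, the number of clusters does not have to be fixed in advance, which is what makes this family of methods suitable for sequential data.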
Index Terms
- Online Multimodal Co-indexing and Retrieval of Weakly Labeled Web Image Collections