Abstract
Product annotation in videos is important for video browsing, search, and advertising. However, most existing research on automatic video annotation focuses on high-level concepts such as events, scenes, and object categories. This article presents a novel solution for annotating specific products in videos by mining information from the Web. It collects a set of high-quality training data for each product by simultaneously leveraging Amazon and the Google image search engine. A visual signature for each product is then built from the bag-of-visual-words representation of the training images, and a correlative sparsification approach is employed to remove noisy bins from the visual signatures. These signatures are then used to annotate video frames. We conduct experiments on more than 1,000 videos, and the results demonstrate the feasibility and effectiveness of our approach.
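The signature pipeline outlined in the abstract can be illustrated with a minimal sketch. It assumes local descriptors and a visual-word codebook are already available as NumPy arrays. The function names (`bovw_histogram`, `product_signature`, `frame_score`) and the `threshold` parameter are hypothetical. Plain thresholding of weak bins stands in for the correlative sparsification step, whose formulation is not given in the abstract, and histogram intersection is an assumed matching score rather than the article's method.

```python
import numpy as np


def bovw_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest visual word; return an L1-normalized histogram."""
    # Squared Euclidean distance between every descriptor and every visual word.
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)


def product_signature(image_histograms, threshold=0.05):
    """Average the training histograms, then zero out weak bins.

    Assumption: simple thresholding is only a stand-in for the
    correlative sparsification mentioned in the abstract.
    """
    mean_hist = np.mean(image_histograms, axis=0)
    sparse = np.where(mean_hist >= threshold, mean_hist, 0.0)
    total = sparse.sum()
    return sparse / total if total > 0 else sparse


def frame_score(frame_hist, signature):
    """Histogram intersection between a frame histogram and a product signature (assumed metric)."""
    return float(np.minimum(frame_hist, signature).sum())
```

In such a sketch, a frame would be annotated with the product whose signature yields the highest score, typically subject to a minimum score so that frames showing no known product stay unlabeled.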
In-video product annotation with web information mining