In-video product annotation with web information mining

Published: 30 November 2012

Abstract

Product annotation in videos is of great importance for video browsing, search, and advertisement. However, most existing research on automatic video annotation focuses on high-level concepts such as events, scenes, and object categories. This article presents a novel solution for annotating specific products in videos by mining information from the Web. It collects a set of high-quality training data for each product by simultaneously leveraging Amazon and the Google image search engine. A visual signature for each product is then built from the bag-of-visual-words representation of the training images, and a correlative sparsification approach is employed to remove noisy bins from the signatures. These signatures are then used to annotate video frames. We conduct experiments on more than 1,000 videos, and the results demonstrate the feasibility and effectiveness of our approach.
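
The pipeline summarized in the abstract (signature building, sparsification, frame annotation) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the correlative sparsification step, so the sketch substitutes a simple top-k truncation of histogram bins, and it assumes cosine similarity with a fixed threshold for matching. All function names, the toy visual-word ids, and the `keep` and `threshold` parameters are hypothetical.

```python
import math

def bovw_histogram(word_ids, vocab_size):
    """Return an L1-normalized bag-of-visual-words histogram for one image."""
    hist = [0.0] * vocab_size
    for w in word_ids:
        hist[w] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def build_signature(training_images, vocab_size, keep):
    """Average the training histograms, then zero all but the `keep` largest bins.

    Top-k truncation is a stand-in assumption here; it is NOT the paper's
    correlative sparsification, which the abstract does not specify.
    """
    hists = [bovw_histogram(img, vocab_size) for img in training_images]
    mean = [sum(col) / len(hists) for col in zip(*hists)]
    top = set(sorted(range(vocab_size), key=lambda i: mean[i], reverse=True)[:keep])
    return [m if i in top else 0.0 for i, m in enumerate(mean)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def annotate(frame_word_ids, signatures, vocab_size, threshold=0.5):
    """Return the best-matching product name for a frame, or None below the threshold."""
    frame = bovw_histogram(frame_word_ids, vocab_size)
    scores = {name: cosine(frame, sig) for name, sig in signatures.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

# Toy data: each image or frame is a list of visual-word ids from a 6-word vocabulary.
VOCAB_SIZE = 6
signatures = {
    "phone": build_signature([[0, 0, 1, 2], [0, 1, 1, 5]], VOCAB_SIZE, keep=2),
    "shoe": build_signature([[3, 3, 4], [3, 4, 4, 2]], VOCAB_SIZE, keep=2),
}

print(annotate([0, 1, 1, 0, 2], signatures, VOCAB_SIZE))
print(annotate([5, 5, 5], signatures, VOCAB_SIZE))
```

In this toy setup the truncation drops the low-weight bins that appear in only one training image, which mirrors the stated goal of removing noisy bins. A real system would extract the visual words from local image descriptors quantized against a learned vocabulary.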

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 8, Issue 4
November 2012
139 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2379790

Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published: 30 November 2012
Revised: 1 August 2011
Accepted: 1 August 2011
Received: 1 February 2011
