Article

The feature and spatial covariant kernel: adding implicit spatial constraints to histogram

Published: 09 July 2007

Abstract

In this paper, we aim to augment the holistic histogram representation with implicit spatial constraints. More concretely, we seek a good match function for object/scene categorization that accounts for spatial constraints under heavy clutter and occlusion. Our solution is a partial match kernel under the histogram representation that varies simultaneously at both the feature and spatial resolutions, named the Feature and Spatial Covariant (FESCO) kernel. Both the FESCO kernel and its late fusion alternative achieve better match accuracy than Spatial Pyramid Match [13] and Pyramid Match [11]. We also apply the keypoint features to video indexing: on a large-scale TRECVID data set of over 300 hours of video, this approach achieves, to the best of our knowledge, the state-of-the-art result for a single feature.
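The abstract does not define the FESCO kernel, so the following is only a sketch of the baseline idea it builds on: a histogram intersection (a partial match score) accumulated over several spatial resolutions, in the style of Spatial Pyramid Match [13]. It is not the FESCO kernel, which per the abstract also varies the feature resolution. The function names, the normalised coordinates, and the choice of `max_level=2` are assumptions made for illustration.

```python
import numpy as np

def intersection(h1, h2):
    """Histogram intersection: sum of bin-wise minima (a partial match score)."""
    return float(np.minimum(h1, h2).sum())

def spatial_histograms(points, words, vocab_size, level):
    """Per-cell visual-word histograms on a 2^level x 2^level grid.

    points: (n, 2) array of keypoint coordinates, assumed normalised to [0, 1].
    words:  (n,) integer array of visual-word indices in [0, vocab_size).
    """
    cells = 2 ** level
    # Grid cell of each keypoint; clip so a coordinate of exactly 1.0 stays in range.
    ij = np.clip((points * cells).astype(int), 0, cells - 1)
    cell_index = ij[:, 0] * cells + ij[:, 1]
    hist = np.zeros((cells * cells, vocab_size))
    np.add.at(hist, (cell_index, words), 1.0)  # unbuffered add handles repeated bins
    return hist

def spatial_pyramid_kernel(pa, wa, pb, wb, vocab_size, max_level=2):
    """Weighted sum of intersections over levels 0..max_level (weights as in [13])."""
    score = 0.0
    for level in range(max_level + 1):
        ha = spatial_histograms(pa, wa, vocab_size, level)
        hb = spatial_histograms(pb, wb, vocab_size, level)
        if level == 0:
            weight = 1.0 / 2 ** max_level
        else:
            weight = 1.0 / 2 ** (max_level - level + 1)
        score += weight * intersection(ha, hb)
    return score

# Two toy "images" with the same visual words in opposite corners.
pa = np.array([[0.1, 0.1], [0.2, 0.2]])
pb = np.array([[0.9, 0.9], [0.8, 0.8]])
wa = np.array([0, 1])
wb = np.array([0, 1])
same = spatial_pyramid_kernel(pa, wa, pa, wa, vocab_size=2)
moved = spatial_pyramid_kernel(pa, wa, pb, wb, vocab_size=2)
```

In this toy case the two images agree only at the coarsest level, so `moved` is smaller than `same`: a plain bag-of-words histogram would score them identically, which is the gap that spatial constraints are meant to close.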

References

[1]
A. Amir, J. Argillander, M. Campbell, A. Haubold, G. Iyengar, S. Ebadollahi, F. Kang, M. R. Naphade, A. P. Natsev, J. R. Smith, J. Tešić, and T. Volkmer. IBM Research TRECVID-2005 video retrieval system. www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html.
[2]
H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. In Proc. of ECCV, 2006.
[3]
A. C. Berg, T. L. Berg, and J. Malik. Shape matching and object recognition using low distortion correspondence. In CVPR, 2005.
[4]
G. Brown, J. Wyatt, R. Harris, and X. Yao. Diversity creation methods: a survey and categorisation. Information Fusion, 6:5--20, 2005.
[5]
S.-F. Chang, W. Hsu, W. Jiang, L. Kennedy, D. Xu, A. Yanagawa, and E. Zavesky. Columbia University TRECVID-2006 video search and high-level feature extraction. www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html.
[6]
D. J. Crandall and D. P. Huttenlocher. Weakly supervised learning of part-based spatial models for visual object recognition. In Proc. of ECCV, 2006.
[7]
G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, at ECCV, 2004.
[8]
L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In Proceedings of the Workshop on Generative-Model Based Vision, Washington, DC, June 2004.
[9]
R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 264--271, Madison, Wisconsin, June 2003.
[10]
Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933--969, 2003.
[11]
K. Grauman and T. Darrell. Pyramid match kernels: Discriminative classification with sets of image features (version 2). Technical Report CSAIL-TR-2006-020, MIT, 2006.
[12]
K. Grauman and T. Darrell. Approximate correspondences in high dimensions. In NIPS 19, 2007.
[13]
S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. of CVPR, 2006.
[14]
S. Lazebnik, C. Schmid, and J. Ponce. A maximum entropy framework for part-based texture and object recognition. In Proc. of ICCV, 2005.
[15]
B. Leibe, A. Leonardis, and B. Schiele. Combined object categorization and segmentation with an implicit shape model. In Proceedings of the Workshop on Statistical Learning in Computer Vision, Prague, Czech Republic, May 2004.
[16]
D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91--110, 2004.
[17]
M. R. Naphade, L. Kennedy, J. R. Kender, S.-F. Chang, J. R. Smith, P. Over, and A. Hauptmann. A light scale concept ontology for multimedia understanding for TRECVID 2005. 2005. www-nlpir.nist.gov/projects/tv2005/LSCOMlite_NKKCSOH.pdf.
[18]
S. Petrov, A. Faria, P. Michaillat, A. Berg, A. Stolcke, D. Klein, and J. Malik. Detecting categories in news video using acoustic, speech, and image features. www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html.
[19]
J. Philbin, A. Bosch, O. Chum, and J.-M. Geusebroek. Oxford TRECVID 2006 - notebook paper. www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html.
[20]
C. G. Snoek, M. Worring, and A. W. Smeulders. Early versus late fusion in semantic video analysis. In Proc. of ACM Multimedia, 2005.
[21]
TRECVID. TRECVID home page. www-nlpir.nist.gov/projects/trecvid.
[22]
V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
[23]
D. Wang, J. Li, and B. Zhang. Relay boost fusion for learning rare concepts in multimedia. In Proc. of CIVR, 2006.
[24]
J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: A comprehensive study. In CVPR, 2006.

Published In

CIVR '07: Proceedings of the 6th ACM international conference on Image and video retrieval
July 2007
655 pages
ISBN:9781595937339
DOI:10.1145/1282280
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. feature and spatial covariant kernel
  2. histogram
  3. video indexing

Qualifiers

  • Article

Conference

CIVR07
Cited By

  • (2016) Simple Techniques Make Sense: Feature Pooling and Normalization for Image Classification. IEEE Transactions on Circuits and Systems for Video Technology 26:7 (1251-1264). DOI:10.1109/TCSVT.2015.2461978. Online publication date: 1-Jul-2016
  • (2015) Towards Effective Image Classification Using Class-Specific Codebooks and Distinctive Local Features. IEEE Transactions on Multimedia 17:3 (323-332). DOI:10.1109/TMM.2014.2388312. Online publication date: 1-Mar-2015
  • (2010) The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision 88:2 (303-338). DOI:10.1007/s11263-009-0275-4. Online publication date: 1-Jun-2010
  • (2009) Histogram matching for music repetition detection. Proceedings of the 2009 IEEE international conference on Multimedia and Expo (662-665). DOI:10.5555/1698924.1699087. Online publication date: 28-Jun-2009
  • (2009) Automatic and instant ring tone generation based on music structure analysis. Proceedings of the 17th ACM international conference on Multimedia (593-596). DOI:10.1145/1631272.1631364. Online publication date: 23-Oct-2009
  • (2009) Histogram matching for music repetition detection. 2009 IEEE International Conference on Multimedia and Expo (662-665). DOI:10.1109/ICME.2009.5202583. Online publication date: Jun-2009
  • (2009) Incorporating spatial correlogram into bag-of-features model for scene categorization. Proceedings of the 9th Asian conference on Computer Vision - Volume Part I (333-342). DOI:10.1007/978-3-642-12307-8_31. Online publication date: 23-Sep-2009
  • (2007) Video diver. Proceedings of the international workshop on Workshop on multimedia information retrieval (61-70). DOI:10.1145/1290082.1290094. Online publication date: 24-Sep-2007
  • (2007) Video retrieval with multi-modal features. Proceedings of the 6th ACM international conference on Image and video retrieval (652-652). DOI:10.1145/1282280.1282379. Online publication date: 9-Jul-2007
