The feature and spatial covariant kernel: adding implicit spatial constraints to histogram

Authors:

Bo Zhang

CIVR '07: Proceedings of the 6th ACM international conference on Image and video retrieval

Pages 565 - 572

https://doi.org/10.1145/1282280.1282361

Published: 09 July 2007

Abstract

In this paper, we augment the holistic histogram representation with implicit spatial constraints. More concretely, we aim to find a good match function for object/scene categorization that considers spatial constraints in the presence of heavy clutter and occlusion. Our solution is a partial match kernel under the histogram representation that varies simultaneously at both the feature and spatial resolutions, named the Feature and Spatial Covariant (FESCO) kernel. Both the FESCO kernel and its late fusion alternative achieve better match accuracy than Spatial Pyramid Match [13] and Pyramid Match [11]. We also apply the keypoint features to video indexing. On a large-scale TRECVID data set of over 300 hours of video, this approach achieves, to the best of our knowledge, the state-of-the-art result for a single feature.
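For context, the Spatial Pyramid Match baseline [13] named above can be sketched in a few lines. This is not the FESCO kernel, whose definition is not given in this abstract; it only illustrates matching histograms at several spatial resolutions with histogram intersection, using the level weights commonly given for spatial pyramid matching. The function names, the keypoint coordinates normalised to [0, 1), and the integer visual-word ids are assumptions made for illustration.

```python
import numpy as np


def grid_histogram(xy, words, vocab_size, cells):
    """Count visual words per cell on a cells x cells grid.

    xy:    (n, 2) keypoint coordinates, assumed normalised to [0, 1).
    words: (n,) integer visual-word ids in [0, vocab_size).
    """
    xy = np.asarray(xy, dtype=float)
    words = np.asarray(words, dtype=int)
    # Map each keypoint to a grid cell; clamp so x == 1.0 stays in range.
    col = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
    row = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
    hist = np.zeros((cells, cells, vocab_size))
    # np.add.at accumulates correctly when the same bin is hit repeatedly.
    np.add.at(hist, (row, col, words), 1.0)
    return hist


def spatial_pyramid_match(xy_a, words_a, xy_b, words_b, vocab_size, L=2):
    """Weighted sum of histogram intersections over pyramid levels 0..L."""
    score = 0.0
    for level in range(L + 1):
        cells = 2 ** level
        hist_a = grid_histogram(xy_a, words_a, vocab_size, cells)
        hist_b = grid_histogram(xy_b, words_b, vocab_size, cells)
        # Histogram intersection: mass shared by the two images per bin.
        inter = np.minimum(hist_a, hist_b).sum()
        # Finer levels get larger weights; the weights sum to 1.
        if level == 0:
            weight = 1.0 / 2 ** L
        else:
            weight = 1.0 / 2 ** (L - level + 1)
        score += weight * inter
    return float(score)
```

Under these assumptions, an image matched against itself scores its number of keypoints, while two images that share visual words but place them in different cells match only at the coarsest level.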

References

[1] A. Amir, J. Argillander, M. Campbell, A. Haubold, G. Iyengar, S. Ebadollahi, F. Kang, M. R. Naphade, A. P. Natsev, J. R. Smith, J. Tešić, and T. Volkmer. IBM Research TRECVID-2005 video retrieval system. www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html.

[2] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. In Proc. of ECCV, 2006.

[3] A. C. Berg, T. L. Berg, and J. Malik. Shape matching and object recognition using low distortion correspondence. In CVPR, 2005.

[4] G. Brown, J. Wyatt, R. Harris, and X. Yao. Diversity creation methods: a survey and categorisation. Information Fusion, 6:5-20, 2005.

[5] S.-F. Chang, W. Hsu, W. Jiang, L. Kennedy, D. Xu, A. Yanagawa, and E. Zavesky. Columbia University TRECVID-2006 video search and high-level feature extraction. www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html.

[6] D. J. Crandall and D. P. Huttenlocher. Weakly supervised learning of part-based spatial models for visual object recognition. In Proc. of ECCV, 2006.

[7] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, at ECCV, 2004.

[8] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In Proceedings of the Workshop on Generative-Model Based Vision, Washington, DC, June 2004.

[9] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 264-271, Madison, Wisconsin, June 2003.

[10] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933-969, 2003.

[11] K. Grauman and T. Darrell. Pyramid match kernels: Discriminative classification with sets of image features (version 2). Technical Report CSAIL-TR-2006-020, MIT, 2006.

[12] K. Grauman and T. Darrell. Approximate correspondences in high dimensions. In NIPS 19, 2007.

[13] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. of CVPR, 2006.

[14] S. Lazebnik, C. Schmid, and J. Ponce. A maximum entropy framework for part-based texture and object recognition. In Proc. of ICCV, 2005.

[15] B. Leibe, A. Leonardis, and B. Schiele. Combined object categorization and segmentation with an implicit shape model. In Proceedings of the Workshop on Statistical Learning in Computer Vision, Prague, Czech Republic, May 2004.

[16] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.

[17] M. R. Naphade, L. Kennedy, J. R. Kender, S.-F. Chang, J. R. Smith, P. Over, and A. Hauptmann. A light scale concept ontology for multimedia understanding for TRECVID 2005. 2005. www-nlpir.nist.gov/projects/tv2005/LSCOMlite_NKKCSOH.pdf.

[18] S. Petrov, A. Faria, P. Michaillat, A. Berg, A. Stolcke, D. Klein, and J. Malik. Detecting categories in news video using acoustic, speech, and image features. www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html.

[19] J. Philbin, A. Bosch, O. Chum, and J.-M. Geusebroek. Oxford TRECVID 2006 - notebook paper. www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html.

[20] C. G. Snoek, M. Worring, and A. W. Smeulders. Early versus late fusion in semantic video analysis. In Proc. of ACM Multimedia, 2005.

[21] TRECVID. TRECVID home page. www-nlpir.nist.gov/projects/trecvid.

[22] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.

[23] D. Wang, J. Li, and B. Zhang. Relay boost fusion for learning rare concepts in multimedia. In Proc. of CIVR, 2006.

[24] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: A comprehensive study. In CVPR, 2006.

Index Terms

The feature and spatial covariant kernel: adding implicit spatial constraints to histogram
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
      2. Computer vision tasks
        Scene understanding
2. Information systems
  1. Information retrieval

Information & Contributors

Information

Published In

CIVR '07: Proceedings of the 6th ACM international conference on Image and video retrieval

July 2007

655 pages

ISBN:9781595937339

DOI:10.1145/1282280

General Chairs:
Nicu Sebe (Univ. of Amsterdam, The Netherlands), Marcel Worring (Univ. of Amsterdam, The Netherlands)

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

In-Cooperation

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

CIVR07

Sponsor:

SIGMM

CIVR07: International Conference on Image and Video Retrieval 2007

July 9 - 11, 2007

Amsterdam, The Netherlands

Cited By

Xie L, Tian Q, Zhang B (2016). Simple Techniques Make Sense: Feature Pooling and Normalization for Image Classification. IEEE Transactions on Circuits and Systems for Video Technology, 26:7 (1251-1264). DOI: 10.1109/TCSVT.2015.2461978. Online publication date: 1-Jul-2016.
https://dl.acm.org/doi/10.1109/TCSVT.2015.2461978

Altintakan U, Yazici A (2015). Towards Effective Image Classification Using Class-Specific Codebooks and Distinctive Local Features. IEEE Transactions on Multimedia, 17:3 (323-332). DOI: 10.1109/TMM.2014.2388312. Online publication date: 1-Mar-2015.
https://dl.acm.org/doi/10.1109/TMM.2014.2388312

Everingham M, Van Gool L, Williams C, Winn J, Zisserman A (2010). The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 88:2 (303-338). DOI: 10.1007/s11263-009-0275-4. Online publication date: 1-Jun-2010.
https://dl.acm.org/doi/10.1007/s11263-009-0275-4

Tian A, Li W, Xiao L, Wang D, Zhou J, Zhang T (2009). Histogram matching for music repetition detection. Proceedings of the 2009 IEEE international conference on Multimedia and Expo (662-665). DOI: 10.5555/1698924.1699087. Online publication date: 28-Jun-2009.
https://dl.acm.org/doi/10.5555/1698924.1699087

Zhang T, Fong C, Xiao L, Zhou J, Gao W, Rui Y, Hanjalic A, Xu C, Steinbach E, El Saddik A, Zhou M (2009). Automatic and instant ring tone generation based on music structure analysis. Proceedings of the 17th ACM international conference on Multimedia (593-596). DOI: 10.1145/1631272.1631364. Online publication date: 23-Oct-2009.
https://dl.acm.org/doi/10.1145/1631272.1631364

Tian A, Li W, Xiao L, Wang D, Zhou J, Zhang T (2009). Histogram matching for music repetition detection. 2009 IEEE International Conference on Multimedia and Expo (662-665). DOI: 10.1109/ICME.2009.5202583. Online publication date: Jun-2009.
https://doi.org/10.1109/ICME.2009.5202583

Zheng Y, Lu H, Jin C, Xue X (2009). Incorporating spatial correlogram into bag-of-features model for scene categorization. Proceedings of the 9th Asian conference on Computer Vision - Volume Part I (333-342). DOI: 10.1007/978-3-642-12307-8_31. Online publication date: 23-Sep-2009.
https://dl.acm.org/doi/10.1007/978-3-642-12307-8_31

Wang D, Liu X, Luo L, Li J, Zhang B, Wang J, Boujemaa N, Del Bimbo A, Li J (2007). Video diver. Proceedings of the international workshop on Workshop on multimedia information retrieval (61-70). DOI: 10.1145/1290082.1290094. Online publication date: 24-Sep-2007.
https://dl.acm.org/doi/10.1145/1290082.1290094

Li J, Wang Z, Li X, Xiao T, Wang D, Zheng W, Zhang B, Sebe N, Worring M (2007). Video retrieval with multi-modal features. Proceedings of the 6th ACM international conference on Image and video retrieval (652-652). DOI: 10.1145/1282280.1282379. Online publication date: 9-Jul-2007.
https://dl.acm.org/doi/10.1145/1282280.1282379
