Abstract
We describe a state-of-the-art system for finding objects in cluttered images. Our system is based on deformable models that represent objects using local part templates and geometric constraints on the locations of parts. We reduce object detection to classification with latent variables. The latent variables introduce invariances that make it possible to detect objects with highly variable appearance. We use a generalization of support vector machines to incorporate latent information during training. This has led to a general framework for discriminative training of classifiers with latent variables. Discriminative training benefits from large training datasets. In practice we use an iterative algorithm that alternates between estimating latent values for positive examples and solving a large convex optimization problem. Practical optimization of this large convex problem can be done using active set techniques for adaptive subsampling of the training data.
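The alternation described above can be illustrated with a minimal sketch. It is not the system's implementation: it assumes each example is a small array of candidate feature vectors (one row per latent value), it starts from a zero weight vector in place of a real initialization, and it solves the convex step by plain subgradient descent over all examples rather than by active set subsampling. The names `best_latent` and `train_latent_svm` and all parameter defaults are hypothetical.

```python
import numpy as np

def best_latent(w, candidates):
    """Score one example as the max over its latent choices.

    candidates: array of shape (num_latent_values, dim), one feature
    vector per latent value. Returns (best score, best latent index).
    """
    scores = candidates @ w
    z = int(np.argmax(scores))
    return float(scores[z]), z

def train_latent_svm(pos, neg, dim, C=1.0, outer_iters=5,
                     inner_iters=300, lr=0.01, w0=None):
    """Toy alternating training loop (illustrative assumptions only)."""
    w = np.zeros(dim) if w0 is None else np.asarray(w0, dtype=float).copy()
    for _ in range(outer_iters):
        # Step 1: estimate latent values for the positive examples
        # under the current weights, then hold them fixed.
        pos_feats = np.stack([x[best_latent(w, x)[1]] for x in pos])

        # Step 2: with positive latent values fixed, the objective
        #   0.5*||w||^2 + C * sum(hinge losses)
        # is convex in w; minimize it approximately by subgradient steps.
        for _ in range(inner_iters):
            grad = w.copy()  # gradient of the regularizer
            # Positives: hinge loss is active when the margin is below 1.
            violated = pos_feats @ w < 1.0
            grad -= C * pos_feats[violated].sum(axis=0)
            # Negatives: score is still a max over latent values, so the
            # subgradient uses the currently highest-scoring candidate.
            for x in neg:
                s, z = best_latent(w, x)
                if s > -1.0:
                    grad += C * x[z]
            w -= lr * grad
    return w
```

Only the positive examples have their latent values frozen in this sketch. The negatives keep the maximum over latent values inside the hinge loss, which remains convex in the weights, so the second step is a convex problem as the abstract describes.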