Visual object detection with deformable part models

Published: 01 September 2013

Abstract

We describe a state-of-the-art system for finding objects in cluttered images. Our system is based on deformable models that represent objects using local part templates and geometric constraints on the locations of parts. We reduce object detection to classification with latent variables. The latent variables introduce invariances that make it possible to detect objects with highly variable appearance. We use a generalization of support vector machines to incorporate latent information during training. This has led to a general framework for discriminative training of classifiers with latent variables. Discriminative training benefits from large training datasets. In practice we use an iterative algorithm that alternates between estimating latent values for positive examples and solving a large convex optimization problem. Practical optimization of this large convex problem can be done using active set techniques for adaptive subsampling of the training data.
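
The alternating training procedure mentioned above can be illustrated with a small sketch. It is not the authors' implementation. It assumes a hinge-loss objective of the form 0.5*||w||^2 + C * sum of hinge losses, scores an example as the maximum of w . Phi(x, z) over latent values z, and swaps in plain subgradient descent for the active set optimizer described in the abstract. The names `best_latent` and `train_latent_svm`, the hyperparameters, and the toy data are all hypothetical.

```python
import numpy as np


def best_latent(w, candidates):
    """Score an example as max over latent values z of w . Phi(x, z).

    `candidates` has shape (num_latent_values, dim): one feature vector
    Phi(x, z) per latent value z (for instance, one per part placement).
    Returns (best score, index of the best latent value).
    """
    scores = candidates @ w
    z = int(np.argmax(scores))
    return float(scores[z]), z


def train_latent_svm(positives, negatives, dim, C=1.0,
                     outer_iters=5, inner_iters=200, lr=0.01):
    """Alternate between (1) fixing latent values for the positive
    examples and (2) minimizing the resulting convex objective.

    Assumption: step 2 uses fixed-step subgradient descent purely for
    illustration; it stands in for the active set method in the text.
    """
    w = np.zeros(dim)
    for _ in range(outer_iters):
        # Step 1: pick the best latent value for each positive under the
        # current w. With these fixed, the objective is convex in w.
        fixed_pos = [x[best_latent(w, x)[1]] for x in positives]

        # Step 2: subgradient descent on
        #   0.5*||w||^2 + C * sum_pos max(0, 1 - w.phi)
        #               + C * sum_neg max(0, 1 + max_z w.Phi(x, z))
        for _ in range(inner_iters):
            grad = w.copy()  # gradient of the regularizer
            for phi in fixed_pos:
                if phi @ w < 1.0:  # positive inside the margin
                    grad -= C * phi
            for x in negatives:
                score, z = best_latent(w, x)  # latent stays free here
                if score > -1.0:  # negative inside the margin
                    grad += C * x[z]
            w -= lr * grad
    return w


# Hypothetical toy data: each example offers two latent choices, and each
# feature vector is (feature_1, feature_2, constant bias term).
positives = [
    np.array([[2.0, 0.0, 1.0], [1.0, 0.5, 1.0]]),
    np.array([[1.5, 1.0, 1.0], [2.5, 0.2, 1.0]]),
]
negatives = [
    np.array([[-2.0, 0.0, 1.0], [-1.0, -0.5, 1.0]]),
    np.array([[-1.5, 0.3, 1.0], [-1.0, -1.0, 1.0]]),
]

w = train_latent_svm(positives, negatives, dim=3)
```

In this sketch the latent values of negative examples are never fixed: the maximum over latent values is taken inside the loss, which keeps the inner problem convex because a maximum of linear functions is convex. Only the positives need their latent values fixed before each round of optimization.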


            • Published in

              Communications of the ACM  Volume 56, Issue 9
              September 2013
              97 pages
              ISSN:0001-0782
              EISSN:1557-7317
              DOI:10.1145/2500468
              Copyright © 2013 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Qualifiers

              • research-article
              • Popular
              • Refereed
