DOI: 10.1145/1553374.1553453
research-article

Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations

Published: 14 June 2009

ABSTRACT

There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks. Scaling such models to full-sized, high-dimensional images remains a difficult problem. To address this problem, we present the convolutional deep belief network, a hierarchical generative model which scales to realistic image sizes. This model is translation-invariant and supports efficient bottom-up and top-down probabilistic inference. Key to our approach is probabilistic max-pooling, a novel technique which shrinks the representations of higher layers in a probabilistically sound way. Our experiments show that the algorithm learns useful high-level visual features, such as object parts, from unlabeled images of objects and natural scenes. We demonstrate excellent performance on several visual recognition tasks and show that our model can perform hierarchical (bottom-up and top-down) inference over full-sized images.
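Probabilistic max-pooling, the key step named in the abstract, can be sketched as follows. This is a minimal sketch under stated assumptions. The detection layer is split into non-overlapping C×C blocks. At most one detection unit per block may be on, and the block's pooling unit is on exactly when one of its detectors is. Each detector's bottom-up input I(h_ij) is a filter correlation with the image plus a shared bias. Under this formulation, a detector is on with probability exp(I(h_ij)) / (1 + Σ exp(I(h_i'j'))), where the sum runs over its block, and the pooling unit is off with probability 1 / (1 + Σ exp(I(h_i'j'))). Several things are illustrative assumptions and not code from the paper: the function names (`valid_correlate`, `prob_max_pool`, `sample_detectors`), the single filter, the square inputs, and the toy values.

```python
import math
import random


def valid_correlate(v, w):
    """2-D 'valid' cross-correlation of a square image v with a square filter w.

    Equivalent to convolving v with the flipped filter; output side is len(v) - len(w) + 1.
    """
    nv, nw = len(v), len(w)
    nh = nv - nw + 1
    return [[sum(v[i + r][j + c] * w[r][c] for r in range(nw) for c in range(nw))
             for j in range(nh)]
            for i in range(nh)]


def _block_cells(a, b, block):
    """Detection-layer coordinates that belong to pooling unit (a, b)."""
    return [(a * block + r, b * block + c) for r in range(block) for c in range(block)]


def prob_max_pool(energies, block):
    """Probabilistic max-pooling over non-overlapping block x block regions.

    energies[i][j] is the bottom-up input I(h_ij) of detection unit (i, j).
    Returns (det, pool): det[i][j] = P(h_ij = 1 | v), pool[a][b] = P(p_ab = 1 | v).
    """
    n = len(energies)
    assert n % block == 0, "detection layer side must be divisible by the block size"
    m_pool = n // block
    det = [[0.0] * n for _ in range(n)]
    pool = [[0.0] * m_pool for _ in range(m_pool)]
    for a in range(m_pool):
        for b in range(m_pool):
            cells = _block_cells(a, b, block)
            # Subtract the largest logit (the "all off" state has logit 0) to avoid overflow.
            shift = max(0.0, max(energies[i][j] for i, j in cells))
            off = math.exp(-shift)  # unnormalised weight of "every detector in the block is off"
            weights = [math.exp(energies[i][j] - shift) for i, j in cells]
            z = off + sum(weights)
            for (i, j), wgt in zip(cells, weights):
                det[i][j] = wgt / z
            pool[a][b] = 1.0 - off / z  # pooling unit is on iff some detector is on
    return det, pool


def sample_detectors(det, block, rng):
    """Draw one joint sample per block: at most one detector on, or all off."""
    n = len(det)
    h = [[0] * n for _ in range(n)]
    for a in range(n // block):
        for b in range(n // block):
            u, acc = rng.random(), 0.0
            for i, j in _block_cells(a, b, block):
                acc += det[i][j]
                if u < acc:
                    h[i][j] = 1
                    break  # leftover probability mass is the "all off" outcome
    return h


if __name__ == "__main__":
    rng = random.Random(0)
    v = [[rng.random() for _ in range(5)] for _ in range(5)]  # toy 5x5 "image"
    w = [[0.5, -0.5], [0.25, 0.1]]                             # toy 2x2 filter
    bias = -0.2                                                # toy shared bias
    energies = [[bias + x for x in row] for row in valid_correlate(v, w)]  # 4x4 detection layer
    det, pool = prob_max_pool(energies, block=2)               # 2x2 pooling layer
    h = sample_detectors(det, block=2, rng=rng)
    print(pool)
    print(h)
```

The sketch treats each block as a single (C²+1)-way softmax: C² outcomes for "this one detector is on" and one outcome for "all are off". This is how the pooling layer can shrink the representation by a factor of C in each dimension while staying a proper probabilistic model. Each pooling probability is simply the sum of its block's detector probabilities.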


  • Published in

    ACM Other conferences
    ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
    June 2009
    1331 pages
    ISBN: 9781605585161
    DOI: 10.1145/1553374

    Copyright © 2009 by the author(s)/owner(s).

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Acceptance Rates

    Overall Acceptance Rate: 140 of 548 submissions, 26%
