ABSTRACT
There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks. Scaling such models to full-sized, high-dimensional images remains a difficult problem. To address this problem, we present the convolutional deep belief network, a hierarchical generative model which scales to realistic image sizes. This model is translation-invariant and supports efficient bottom-up and top-down probabilistic inference. Key to our approach is probabilistic max-pooling, a novel technique which shrinks the representations of higher layers in a probabilistically sound way. Our experiments show that the algorithm learns useful high-level visual features, such as object parts, from unlabeled images of objects and natural scenes. We demonstrate excellent performance on several visual recognition tasks and show that our model can perform hierarchical (bottom-up and top-down) inference over full-sized images.
- Bell, A. J., & Sejnowski, T. J. (1997). The 'independent components' of natural scenes are edge filters. Vision Research, 37, 3327--3338.Google ScholarCross Ref
- Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2006). Greedy layer-wise training of deep networks. Adv. in Neural Information Processing Systems.Google Scholar
- Berg, A. C., Berg, T. L., & Malik, J. (2005). Shape matching and object recognition using low distortion correspondence. IEEE Conference on Computer Vision and Pattern Recognition (pp. 26--33). Google ScholarDigital Library
- Desjardins, G., & Bengio, Y. (2008). Empirical evaluation of convolutional RBMs for vision (Technical Report).Google Scholar
- Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. CVPR Workshop on Gen.-Model Based Vision. Google ScholarDigital Library
- Grosse, R., Raina, R., Kwong, H., & Ng, A. (2007). Shift-invariant sparse coding for audio classification. Proceedings of the Conference on Uncertainty in AI.Google Scholar
- Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14, 1771--1800. Google ScholarDigital Library
- Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527--1554. Google ScholarDigital Library
- Hinton, G. E., & Salakhutdinov, R. (2006). Reducing the dimensionality of data with neural networks. Science, 313, 504--507.Google ScholarCross Ref
- Ito, M., & Komatsu, H. (2004). Representation of angles embedded within contour stimuli in area V2 of macaque monkeys. J. Neurosci., 24, 3313--3324.Google ScholarCross Ref
- Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. IEEE Conference on Computer Vision and Pattern Recognition. Google ScholarDigital Library
- LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1, 541--551. Google ScholarDigital Library
- Lee, H., Ekanadham, C., & Ng, A. Y. (2008). Sparse deep belief network model for visual area V2. Advances in Neural Information Processing Systems.Google Scholar
- Lee, T. S., & Mumford, D. (2003). Hierarchical bayesian inference in the visual cortex. Journal of the Optical Society of America A, 20, 1434--1448.Google ScholarCross Ref
- Mutch, J., & Lowe, D. G. (2006). Multiclass object recognition with sparse, localized features. IEEE Conf. on Computer Vision and Pattern Recognition. Google ScholarDigital Library
- Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607--609.Google ScholarCross Ref
- Raina, R., Battle, A., Lee, H., Packer, B., & Ng, A. Y. (2007). Self-taught learning: Transfer learning from unlabeled data. International Conference on Machine Learning (pp. 759--766). Google ScholarDigital Library
- Raina, R., Madhavan, A., & Ng, A. Y. (2009). Large-scale deep unsupervised learning using graphics processors. International Conf. on Machine Learning. Google ScholarDigital Library
- Ranzato, M., Huang, F.-J., Boureau, Y.-L., & LeCun, Y. (2007). Unsupervised learning of invariant feature hierarchies with applications to object recognition. IEEE Conference on Computer Vision and Pattern Recognition.Google ScholarCross Ref
- Ranzato, M., Poultney, C., Chopra, S., & LeCun, Y. (2006). Efficient learning of sparse representations with an energy-based model. Advances in Neural Information Processing Systems (pp. 1137--1144).Google Scholar
- Taylor, G., Hinton, G. E., & Roweis, S. (2007). Modeling human motion using binary latent variables. Adv. in Neural Information Processing Systems.Google Scholar
- Varma, M., & Ray, D. (2007). Learning the discriminative power-invariance trade-off. International Conference on Computer Vision.Google ScholarCross Ref
- Weston, J., Ratle, F., & Collobert, R. (2008). Deep learning via semi-supervised embedding. International Conference on Machine Learning. Google ScholarDigital Library
- Yu, K., Xu, W., & Gong, Y. (2009). Deep learning with kernel regularization for visual recognition. Adv. Neural Information Processing Systems.Google Scholar
- Zhang, H., Berg, A. C., Maire, M., & Malik, J. (2006). SVM-KNN: Discriminative nearest neighbor classification for visual category recognition. IEEE Conference on Computer Vision and Pattern Recognition. Google ScholarDigital Library
Recommendations
Unsupervised learning of hierarchical representations with convolutional deep belief networks
There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks (DBNs); however, scaling such models to full-sized, high-dimensional images remains a difficult problem. To address this problem, we ...
ImageNet classification with deep convolutional neural networks
NIPS'12: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% ...
Generative adversarial networks
Generative adversarial networks are a kind of artificial intelligence algorithm designed to solve the generative modeling problem. The goal of a generative model is to study a collection of training examples and learn the probability distribution that ...
Comments