ABSTRACT
A new "herding" algorithm is proposed which directly converts observed moments into a sequence of pseudo-samples. The pseudo-samples respect the moment constraints and may be used to estimate (unobserved) quantities of interest. The procedure allows us to sidestep the usual approach of first learning a joint model (which is intractable) and then sampling from that model (which can easily get stuck in a local mode). Moreover, the algorithm is fully deterministic, avoiding random number generation) and does not need expensive operations such as exponentiation.