DOI: 10.1145/1553374.1553517

Herding dynamical weights to learn

Published: 14 June 2009

ABSTRACT

A new "herding" algorithm is proposed which directly converts observed moments into a sequence of pseudo-samples. The pseudo-samples respect the moment constraints and may be used to estimate (unobserved) quantities of interest. The procedure allows us to sidestep the usual approach of first learning a joint model (which is intractable) and then sampling from that model (which can easily get stuck in a local mode). Moreover, the algorithm is fully deterministic, avoiding random number generation, and does not require expensive operations such as exponentiation.
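The deterministic update the abstract alludes to can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: it uses binary variables with first-moment features only (phi(s) = s), and the names `herd`, `moments`, and `n_steps` are illustrative. Each step picks the state maximizing the current weights' inner product with the features, then nudges the weights by the difference between the observed moments and the chosen state — no random numbers and no exponentiation.

```python
import numpy as np

def herd(moments, n_steps):
    """Generate deterministic pseudo-samples whose empirical average
    tracks the observed `moments` (one value in [0, 1] per binary variable)."""
    moments = np.asarray(moments, dtype=float)
    w = moments.copy()             # weights initialized at the observed moments
    samples = []
    for _ in range(n_steps):
        s = (w > 0).astype(float)  # maximize w . s over s in {0, 1}^d
        w += moments - s           # herding weight update: purely additive
        samples.append(s)
    return np.array(samples)

pseudo = herd([0.8, 0.3, 0.5], 1000)
print(pseudo.mean(axis=0))         # close to [0.8, 0.3, 0.5]
```

Because the weights stay bounded, the running average of the pseudo-samples approaches the target moments at a fast O(1/T) rate, which is the sense in which the pseudo-samples "respect the moment constraints."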



Published in

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374

Copyright © 2009 by the author(s)/owner(s).

Publisher

Association for Computing Machinery, New York, NY, United States



Qualifiers

• research-article

Acceptance Rates

Overall Acceptance Rate: 140 of 548 submissions, 26%
