ABSTRACT
In this paper we introduce and investigate a mathematically rigorous theory of learning curves based on ideas from statistical mechanics. The advantage of our theory over the well-established Vapnik-Chervonenkis theory is that our bounds can be considerably tighter in many cases, and are also more reflective of the true behavior (functional form) of learning curves. This behavior can often exhibit dramatic properties such as phase transitions, as well as power-law asymptotics not explained by the VC theory. The disadvantages of our theory are that its application requires knowledge of the input distribution, and that it is so far limited to function classes of finite cardinality. We illustrate our results with many concrete examples of learning curve bounds derived from our theory.
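Although the paper's analysis is analytical, the setting it assumes (a finite function class and a known input distribution) is easy to probe numerically. The sketch below is a hypothetical illustration, not the paper's method: it estimates an empirical learning curve by Monte Carlo for a small finite class of threshold functions under the uniform distribution, using a Gibbs-style learner that draws a hypothesis uniformly at random from the version space. The class, domain size, and trial counts are all assumptions made for the example.

```python
# Minimal sketch (illustrative, not from the paper): empirical learning curve
# for a finite function class under a known (uniform) input distribution.
# Learner = zero-temperature Gibbs: pick uniformly among hypotheses consistent
# with the sample. All parameters below are assumptions for the example.
import random

DOMAIN = list(range(50))                                  # finite input space
CLASS = [lambda x, a=a: int(x >= a) for a in range(51)]   # |F| = 51 thresholds

def gen_error(h, target):
    """True error of h against the target under the uniform distribution."""
    return sum(h(x) != target(x) for x in DOMAIN) / len(DOMAIN)

def learning_curve(m_values, trials=2000, seed=0):
    """Average generalization error of the Gibbs learner vs. sample size m."""
    rng = random.Random(seed)
    curve = {}
    for m in m_values:
        total = 0.0
        for _ in range(trials):
            target = rng.choice(CLASS)
            sample = [rng.choice(DOMAIN) for _ in range(m)]
            labels = [target(x) for x in sample]
            # Version space: hypotheses consistent with the labeled sample.
            # It is never empty, since the target itself is always consistent.
            version_space = [h for h in CLASS
                             if all(h(x) == y for x, y in zip(sample, labels))]
            h = rng.choice(version_space)
            total += gen_error(h, target)
        curve[m] = total / trials
    return curve

if __name__ == "__main__":
    for m, err in learning_curve([1, 2, 5, 10, 20, 50, 100]).items():
        print(f"m = {m:4d}   avg generalization error = {err:.4f}")
```

For this toy class the averaged curve should decay roughly inversely in m; richer finite classes are needed to exhibit the phase transitions the abstract refers to.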