Solving large scale linear prediction problems using stochastic gradient descent algorithms
Source
ACM International Conference Proceeding Series; Vol. 69
Proceedings of the twenty-first international conference on Machine learning
Banff, Alberta, Canada
Page: 116
Year of Publication: 2004
ISBN: 1-58113-828-5
Author
Tong Zhang
IBM T. J. Watson Research Center, Yorktown Heights, NY
ABSTRACT
Linear prediction methods, such as least squares for regression, and logistic regression and support vector machines for classification, have been used extensively in statistics and machine learning. In this paper, we study stochastic gradient descent (SGD) algorithms on regularized forms of linear prediction methods. This class of methods, related to online algorithms such as the perceptron, is both efficient and very simple to implement. We obtain a numerical rate of convergence for such algorithms and discuss its implications. Experiments on text data are provided to demonstrate the numerical and statistical consequences of our theoretical findings.
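To make the setting concrete, the following is a minimal sketch of SGD applied to one regularized linear prediction method, L2-regularized logistic regression. It is not taken from the paper: the function names `sgd_logistic` and `predict`, the constant step size `eta`, the regularization weight `lam`, and the epoch count are all illustrative assumptions, and the paper's own update rule and step-size choices may differ.

```python
import math
import random


def sgd_logistic(examples, lam=0.01, eta=0.1, epochs=50, seed=0):
    """SGD sketch for (lam / 2) * ||w||^2 + mean log(1 + exp(-y * w.x)).

    examples: list of (x, y) pairs, x a list of floats, y in {-1, +1}.
    All hyperparameter defaults are assumptions chosen for a toy problem.
    """
    rng = random.Random(seed)
    data = list(examples)
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a fresh random order
        for x, y in data:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Derivative of log(1 + exp(-margin)) with respect to margin.
            g = -1.0 / (1.0 + math.exp(margin))
            # One stochastic step: regularizer gradient plus loss gradient.
            w = [wi - eta * (lam * wi + g * y * xi) for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return the sign of the linear score w.x as a label in {-1, +1}."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```

Each update touches a single example, which is what makes this family of methods cheap per step and close in spirit to the perceptron. Replacing the logistic derivative `g` with the derivative of the squared loss or the hinge loss would give the corresponding least squares or SVM-style variant under the same assumptions.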