ABSTRACT
We describe and analyze two stochastic methods for l1-regularized loss minimization problems, such as the Lasso. The first method updates the weight of a single feature at each iteration, while the second method updates the entire weight vector but uses only a single training example at each iteration. In both methods, the feature or example is chosen uniformly at random. Our theoretical runtime analysis suggests that the stochastic methods should outperform state-of-the-art deterministic approaches, including their deterministic counterparts, when the problem size is large. We demonstrate the advantage of the stochastic methods in experiments on synthetic and natural data sets.
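As a rough illustration of the first method (updating a single, uniformly chosen feature per iteration), the following is a minimal sketch of stochastic coordinate descent for the Lasso with squared loss. The function names, the unit step size, and the column-scaling assumption (||X[:, j]||^2 / m <= 1) are illustrative choices of ours, not the paper's exact algorithm or constants.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink x toward zero by t (the proximal step for the l1 penalty)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def stochastic_coordinate_descent_lasso(X, y, lam, n_iters=10000, seed=0):
    """Sketch: minimize (1/2m)||Xw - y||^2 + lam * ||w||_1 by updating one
    uniformly random coordinate per iteration.

    Assumes the columns of X are scaled so that ||X[:, j]||^2 / m <= 1,
    which makes a unit step size valid for each coordinate.
    """
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    residual = X @ w - y              # maintained incrementally
    for _ in range(n_iters):
        j = rng.integers(d)           # feature chosen uniformly at random
        g = X[:, j] @ residual / m    # partial derivative of the squared loss
        w_j_new = soft_threshold(w[j] - g, lam)
        residual += (w_j_new - w[j]) * X[:, j]
        w[j] = w_j_new
    return w
```

Maintaining the residual incrementally keeps the cost of each iteration at O(m), which is what makes per-coordinate updates cheap relative to a full (deterministic) pass over all features.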
- Beck, A., & Teboulle, M. (2003). Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31, 167--175. Google ScholarDigital Library
- Bottou, L. (Web Page). Stochastic gradient descent examples. http://leon.bottou.org/projects/sgd.Google Scholar
- Bottou, L., & Bousquet, O. (2008). The tradeoffs of large scale learning. Advances in Neural Information Processing Systems 20 (pp. 161--168).Google Scholar
- Bottou, L., & LeCunn, Y. (2005). On-line learning for very large datasets. Appl. Stoch. Model. Bus. and Ind., 21, 137--151. Google ScholarDigital Library
- Clarkson, K. (2008). Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm. Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms (pp. 922--931). Google ScholarDigital Library
- Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines. Cambridge University Press. Google ScholarDigital Library
- Duchi, J., Shalev-Shwartz, S., Singer, Y., & Chandra, T. (2008). Efficient projections onto the l 1-ball for learning in high dimensions. Proceedings of the 25th International Conference on Machine Learning (pp. 272--279). Google ScholarDigital Library
- Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. Ann. Statist., 32, 407--499.Google ScholarCross Ref
- Frank, M., & Wolfe, P. (1956). An algorithm for quadratic programming. Naval Res. Logist. Quart., 3, 95--110.Google ScholarCross Ref
- Friedman, J., Hastie, T., & Tibshirani, R. (2008). Regularized paths for generalized linear models via coordinate descent (Technical Report). Department of Statistics, Stanford University.Google Scholar
- Genkin, A., Lewis, D., & Madigan, D. (2007). Large-scale Bayesian logistic regression for text categorization. Technometrics, 49, 291--304.Google ScholarCross Ref
- Gentile, C. (2003). The robustness of the p-norm algorithms. Machine Learning, 53, 265--299. Google ScholarDigital Library
- Grove, A. J., Littlestone, N., & Schuurmans, D. (2001). General convergence results for linear discriminant updates. Machine Learning, 43, 173--210. Google ScholarDigital Library
- Kivinen, J., & Warmuth, M. (1997). Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132, 1--64. Google ScholarDigital Library
- Koh, K., Kim, S., & Boyd, S. (2007). An interior-point method for large-scale l 1-regularized logistic regression. Journal of Machine Learning Research, 8, 1519--1555. Google ScholarDigital Library
- Langford, J., Li, L., & Zhang, T. (2009). Sparse online learning via truncated gradient. Advances in Neural Information Processing Systems 21 (pp. 905--912).Google Scholar
- Littlestone, N. (1988). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2, 285--318. Google ScholarCross Ref
- Luo, Z., & Tseng, P. (1992). On the convergence of coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl., 72, 7--35. Google ScholarDigital Library
- Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. Proceedings of the 24th International Conference on Machine Learning (pp. 807--814). Google ScholarDigital Library
- Shalev-Shwartz, S., & Srebro, N. (2008). SVM optimization: Inverse dependence on training set size. Proceedings of the 25th International Conference on Machine Learning (pp. 928--935). Google ScholarDigital Library
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Royal. Statist. Soc B., 58, 267--288.Google ScholarCross Ref
- Tseng, P., & Yun, S. (2009). A block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization. J. Optim. Theory Appl., 140, 513--535.Google ScholarDigital Library
- Wu, T. T., & Lange, K. (2008). Coordinate descent algorithms for lasso penalized regression. Annals of Applied Statistics, 2, 224--244.Google ScholarCross Ref
- Zhang, T. (2003). Sequential greedy approximation for certain convex optimization problems. IEEE Transaction on Information Theory, 49, 682--691. Google ScholarDigital Library
- Zhang, T., & Oles, F. J. (2001). Text categorization based on regularized linear classification methods. Information Retrieval, 4, 5--31. Google ScholarDigital Library