DOI: 10.1145/1553374.1553493
research-article

Stochastic methods for l1 regularized loss minimization

Published: 14 June 2009

ABSTRACT

We describe and analyze two stochastic methods for l1 regularized loss minimization problems, such as the Lasso. The first method updates the weight of a single feature at each iteration, while the second method updates the entire weight vector but uses only a single training example at each iteration. In both methods, the feature or example is chosen uniformly at random. Our theoretical runtime analysis suggests that the stochastic methods should outperform state-of-the-art deterministic approaches, including their deterministic counterparts, when the size of the problem is large. We demonstrate the advantage of the stochastic methods in experiments on synthetic and natural data sets.
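To make the two update styles concrete, the following is a minimal sketch for the squared-loss (Lasso) case. It is not the paper's exact algorithms: the single-feature method is written as stochastic coordinate descent with a closed-form soft-threshold step, and the single-example method as a plain proximal stochastic gradient step rather than the mirror-descent update the paper analyzes; the function names, the step size eta, and the regularization weight lam are illustrative assumptions.

import numpy as np

def soft_threshold(z, tau):
    # Soft-thresholding operator: the proximal map of tau * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def stochastic_coordinate_descent(X, y, lam, n_iters=20000, seed=0):
    # Update style 1: each iteration picks ONE feature uniformly at random
    # and minimizes (1/(2m))||Xw - y||^2 + lam*||w||_1 exactly over that
    # single coordinate.
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    residual = X @ w - y                      # Xw - y, maintained incrementally
    col_sq = (X ** 2).sum(axis=0) / m         # per-coordinate curvature
    for _ in range(n_iters):
        j = rng.integers(d)                   # feature chosen uniformly at random
        if col_sq[j] == 0.0:
            continue
        g_j = X[:, j] @ residual / m          # partial derivative of the smooth part
        w_j_new = soft_threshold(w[j] - g_j / col_sq[j], lam / col_sq[j])
        residual += (w_j_new - w[j]) * X[:, j]
        w[j] = w_j_new
    return w

def proximal_sgd(X, y, lam, eta=0.01, n_iters=20000, seed=0):
    # Update style 2: each iteration uses ONE training example, chosen
    # uniformly at random, to update the whole weight vector, followed by
    # an l1 proximal (soft-threshold) step. (The paper analyzes a
    # mirror-descent variant; this plain proximal step is a simplification.)
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(m)                   # example chosen uniformly at random
        grad = (X[i] @ w - y[i]) * X[i]       # squared-loss gradient on that example
        w = soft_threshold(w - eta * grad, eta * lam)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 50))
    w_true = np.zeros(50)
    w_true[:5] = 1.0
    y = X @ w_true + 0.1 * rng.normal(size=200)
    print(np.round(stochastic_coordinate_descent(X, y, lam=0.1)[:8], 3))
    print(np.round(proximal_sgd(X, y, lam=0.1)[:8], 3))

With uniform random choices, each iteration of either routine touches only one column or one row of X, which is what keeps the per-iteration cost low for large problems.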

References

  1. Beck, A., & Teboulle, M. (2003). Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31, 167--175. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Bottou, L. (Web Page). Stochastic gradient descent examples. http://leon.bottou.org/projects/sgd.Google ScholarGoogle Scholar
  3. Bottou, L., & Bousquet, O. (2008). The tradeoffs of large scale learning. Advances in Neural Information Processing Systems 20 (pp. 161--168).Google ScholarGoogle Scholar
  4. Bottou, L., & LeCunn, Y. (2005). On-line learning for very large datasets. Appl. Stoch. Model. Bus. and Ind., 21, 137--151. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Clarkson, K. (2008). Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm. Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms (pp. 922--931). Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Duchi, J., Shalev-Shwartz, S., Singer, Y., & Chandra, T. (2008). Efficient projections onto the l 1-ball for learning in high dimensions. Proceedings of the 25th International Conference on Machine Learning (pp. 272--279). Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. Ann. Statist., 32, 407--499.Google ScholarGoogle ScholarCross RefCross Ref
  9. Frank, M., & Wolfe, P. (1956). An algorithm for quadratic programming. Naval Res. Logist. Quart., 3, 95--110.Google ScholarGoogle ScholarCross RefCross Ref
  10. Friedman, J., Hastie, T., & Tibshirani, R. (2008). Regularized paths for generalized linear models via coordinate descent (Technical Report). Department of Statistics, Stanford University.Google ScholarGoogle Scholar
  11. Genkin, A., Lewis, D., & Madigan, D. (2007). Large-scale Bayesian logistic regression for text categorization. Technometrics, 49, 291--304.Google ScholarGoogle ScholarCross RefCross Ref
  12. Gentile, C. (2003). The robustness of the p-norm algorithms. Machine Learning, 53, 265--299. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Grove, A. J., Littlestone, N., & Schuurmans, D. (2001). General convergence results for linear discriminant updates. Machine Learning, 43, 173--210. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Kivinen, J., & Warmuth, M. (1997). Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132, 1--64. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Koh, K., Kim, S., & Boyd, S. (2007). An interior-point method for large-scale l 1-regularized logistic regression. Journal of Machine Learning Research, 8, 1519--1555. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Langford, J., Li, L., & Zhang, T. (2009). Sparse online learning via truncated gradient. Advances in Neural Information Processing Systems 21 (pp. 905--912).Google ScholarGoogle Scholar
  17. Littlestone, N. (1988). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2, 285--318. Google ScholarGoogle ScholarCross RefCross Ref
  18. Luo, Z., & Tseng, P. (1992). On the convergence of coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl., 72, 7--35. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. Proceedings of the 24th International Conference on Machine Learning (pp. 807--814). Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Shalev-Shwartz, S., & Srebro, N. (2008). SVM optimization: Inverse dependence on training set size. Proceedings of the 25th International Conference on Machine Learning (pp. 928--935). Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Royal. Statist. Soc B., 58, 267--288.Google ScholarGoogle ScholarCross RefCross Ref
  22. Tseng, P., & Yun, S. (2009). A block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization. J. Optim. Theory Appl., 140, 513--535.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Wu, T. T., & Lange, K. (2008). Coordinate descent algorithms for lasso penalized regression. Annals of Applied Statistics, 2, 224--244.Google ScholarGoogle ScholarCross RefCross Ref
  24. Zhang, T. (2003). Sequential greedy approximation for certain convex optimization problems. IEEE Transaction on Information Theory, 49, 682--691. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Zhang, T., & Oles, F. J. (2001). Text categorization based on regularized linear classification methods. Information Retrieval, 4, 5--31. Google ScholarGoogle ScholarDigital LibraryDigital Library

Published in

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009, 1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374
Copyright © 2009 by the author(s)/owner(s).
Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%
