DOI: 10.1145/1390156.1390191

Efficient projections onto the l1-ball for learning in high dimensions

Published: 05 July 2008

ABSTRACT

We describe efficient algorithms for projecting a vector onto the l1-ball. We present two methods for projection. The first performs exact projection in O(n) expected time, where n is the dimension of the space. The second handles vectors in which only k elements are perturbed outside the l1-ball, projecting in O(k log(n)) time. This setting is especially useful for online learning in sparse feature spaces such as text categorization applications. We demonstrate the merits and effectiveness of our algorithms in numerous batch and online learning tasks. We show that variants of stochastic gradient projection methods augmented with our efficient projection procedures outperform interior-point methods, which are considered state-of-the-art optimization techniques. We also show that in online settings gradient updates with l1 projections outperform the exponentiated gradient algorithm while obtaining models with a high degree of sparsity.
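To make the projection concrete, here is a minimal sketch of the classic sort-based variant of Euclidean projection onto the l1-ball {w : ||w||_1 <= z}. It runs in O(n log n) time because of the sort, whereas the paper's first method reaches O(n) expected time by replacing the sort with a randomized, quickselect-style pivot search. The function name and the NumPy dependency are illustrative choices, not taken from the paper.

```python
import numpy as np

def project_onto_l1_ball(v, z=1.0):
    """Euclidean projection of v onto the l1-ball of radius z.

    Solves argmin_w 0.5 * ||w - v||^2 subject to ||w||_1 <= z.
    Sort-based O(n log n) sketch; the paper's exact method attains
    O(n) expected time via a randomized (quickselect-style) pivot.
    """
    v = np.asarray(v, dtype=float)
    if np.abs(v).sum() <= z:
        return v.copy()  # already inside the ball: the projection is v itself

    u = np.sort(np.abs(v))[::-1]        # magnitudes in descending order
    cssv = np.cumsum(u)                 # running sums of the magnitudes
    j = np.arange(1, len(u) + 1)
    # rho = number of coordinates left nonzero after soft-thresholding
    rho = np.nonzero(u * j > cssv - z)[0][-1] + 1
    theta = (cssv[rho - 1] - z) / rho   # the soft-threshold level
    # shrink every coordinate toward zero by theta, keeping its sign
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```

For example, project_onto_l1_ball([0.5, -1.2, 0.3], z=1.0) returns [0.15, -0.85, 0.0], whose absolute values sum to exactly 1. In the online setting the abstract describes, each projected (sub)gradient step would then be w = project_onto_l1_ball(w - eta * g, z).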



Published in

ICML '08: Proceedings of the 25th International Conference on Machine Learning
July 2008, 1310 pages
ISBN: 9781605582054
DOI: 10.1145/1390156
Copyright © 2008 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


                  Qualifiers

                  • research-article

Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%
