ABSTRACT
We describe efficient algorithms for projecting a vector onto the l1-ball, and present two methods for projection. The first performs exact projection in O(n) expected time, where n is the dimension of the space. The second works on vectors, k of whose elements are perturbed outside the l1-ball, projecting in O(k log(n)) time. This setting is especially useful for online learning in sparse feature spaces such as text categorization applications. We demonstrate the merits and effectiveness of our algorithms in numerous batch and online learning tasks. We show that variants of stochastic gradient projection methods augmented with our efficient projection procedures outperform interior point methods, which are considered state-of-the-art optimization techniques. We also show that in online settings gradient updates with l1 projections outperform the exponentiated gradient algorithm while obtaining models with high degrees of sparsity.
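To make the projection operation concrete, the following is a minimal sketch of the classical sort-based variant of l1-ball projection: sort the magnitudes, find the largest index whose soft-threshold remains positive, then shrink every coordinate toward zero by that threshold. This is the O(n log n) relative of the paper's O(n) expected-time method, which replaces the sort with randomized pivoting; the function name and list-based interface here are illustrative, not from the paper.

```python
import math

def project_l1_ball(v, z=1.0):
    """Project the vector v (a list of floats) onto the l1-ball of radius z.

    Sort-based O(n log n) sketch; the paper's exact method achieves O(n)
    expected time by replacing the full sort with randomized pivoting.
    """
    if sum(abs(x) for x in v) <= z:
        return list(v)  # already inside the ball; projection is the identity
    # Sort magnitudes in decreasing order and scan for the threshold theta.
    u = sorted((abs(x) for x in v), reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        if uj - (css - z) / j > 0:
            theta = (css - z) / j  # last positive index determines theta
    # Soft-threshold each coordinate, preserving its sign.
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]
```

For example, projecting [3.0, 1.0] onto the unit l1-ball yields [1.0, 0.0]: the threshold theta = 2 zeroes out the smaller coordinate, illustrating the sparsity-inducing effect of l1 projection that the abstract highlights.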
Index Terms
- Efficient projections onto the l1-ball for learning in high dimensions