ABSTRACT
We consider the adversarial convex bandit problem and we build the first poly(T)-time algorithm with poly(n)√T-regret for this problem. To do so we introduce three new ideas in the derivative-free optimization
literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves Õ(n^{9.5}√T)-regret, and we show that a simple variant of this algorithm can be run in poly(n log(T))-time per step at the cost of an additional poly(n) T^{o(1)} factor in the regret. These results improve upon the Õ(n^{11}√T)-regret and exp(poly(T))-time result of the first two authors, and the log(T)^{poly(n)}√T-regret and log(T)^{poly(n)}-time result of Hazan and Li. Furthermore we conjecture that another variant of the algorithm could achieve Õ(n^{1.5}√T)-regret, and moreover that this regret is unimprovable (the current best lower bound being Ω(n√T), achieved with linear functions). For the simpler situation of zeroth-order stochastic convex optimization this corresponds to the conjecture that the optimal query complexity is of order n^3/ε^2.
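As a loose illustration of one ingredient named above (and emphatically not the paper's algorithm), here is a minimal exponential-weights sketch with an increasing learning rate, run over a discretized action set with one-point (bandit) feedback. The function name, the schedule `eta0 * sqrt((t+1)/T)`, and the discretization are all illustrative assumptions.

```python
import numpy as np

def exp_weights_bandit(losses, eta0=0.5, seed=0):
    """Illustrative sketch: exponential weights over K discretized actions
    with importance-weighted one-point loss estimates and an INCREASING
    learning-rate schedule (one of the abstract's three ingredients).

    losses: (T, K) array of per-round losses in [0, 1].
    Returns (total loss incurred, total loss of the best fixed action)."""
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    log_w = np.zeros(K)                      # log-weights, for numerical stability
    incurred = 0.0
    for t in range(T):
        eta = eta0 * np.sqrt((t + 1) / T)    # learning rate grows over time
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                         # sampling distribution
        i = rng.choice(K, p=p)               # play one action, observe one loss
        incurred += losses[t, i]
        est = np.zeros(K)
        est[i] = losses[t, i] / p[i]         # unbiased importance-weighted estimate
        log_w -= eta * est                   # exponential-weights update
    best = losses.sum(axis=0).min()
    return incurred, best
```

In the paper's setting the action set is a convex body and the update involves a kernelized distribution rather than a finite grid; the sketch only conveys the increasing-learning-rate exponential-weights mechanic.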
References
- J. Abernethy, E. Hazan, and A. Rakhlin. 2008. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization. In Proceedings of the 21st Annual Conference on Learning Theory (COLT).
- A. Agarwal, O. Dekel, and L. Xiao. 2010. Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback. In Proceedings of the 23rd Annual Conference on Learning Theory (COLT).
- A. Agarwal, D.P. Foster, D. Hsu, S.M. Kakade, and A. Rakhlin. 2011. Stochastic convex optimization with bandit feedback. In Advances in Neural Information Processing Systems (NIPS).
- F. Bach and V. Perchet. 2016. Highly-Smooth Zero-th Order Online Optimization. In Proceedings of the 29th Annual Conference on Learning Theory (COLT).
- K. Ball, F. Barthe, and A. Naor. 2003. Entropy jumps in the presence of a spectral gap. Duke Mathematical Journal 119, 1 (2003), 41–63.
- A. Belloni, T. Liang, H. Narayanan, and A. Rakhlin. 2015. Escaping the Local Minima via Simulated Annealing: Optimization of Approximately Convex Functions. In Proceedings of the 28th Annual Conference on Learning Theory (COLT).
- S. Bubeck and N. Cesa-Bianchi. 2012. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends in Machine Learning 5, 1 (2012), 1–122.
- S. Bubeck, N. Cesa-Bianchi, and S.M. Kakade. 2012. Towards minimax policies for online linear optimization with bandit feedback. In Proceedings of the 25th Annual Conference on Learning Theory (COLT).
- S. Bubeck, O. Dekel, T. Koren, and Y. Peres. 2015. Bandit Convex Optimization: √T Regret in One Dimension. In Proceedings of the 28th Annual Conference on Learning Theory (COLT).
- S. Bubeck and R. Eldan. 2016. Multi-scale exploration of convex functions and bandit convex optimization. In Proceedings of the 29th Annual Conference on Learning Theory (COLT).
- S. Bubeck, Y.T. Lee, and R. Eldan. 2016. Kernel-based methods for bandit convex optimization. arXiv preprint arXiv:1607.03084 (2016).
- A. Conn, K. Scheinberg, and L. Vicente. 2009. Introduction to Derivative-Free Optimization. Society for Industrial and Applied Mathematics (SIAM).
- V. Dani, T. Hayes, and S. Kakade. 2008. The price of bandit information for online optimization. In Advances in Neural Information Processing Systems (NIPS).
- O. Dekel, R. Eldan, and T. Koren. 2015. Bandit Smooth Convex Optimization: Improving the Bias-Variance Tradeoff. In Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.). 2926–2934.
- P. Erdös. 1939. On a family of symmetric Bernoulli convolutions. American Journal of Mathematics 61, 4 (1939), 974–976.
- A. Flaxman, A. Kalai, and B. McMahan. 2005. Online Convex Optimization in the Bandit Setting: Gradient Descent Without a Gradient. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA).
- E. Hazan and K. Levy. 2014. Bandit Convex Optimization: Towards Tight Bounds. In Advances in Neural Information Processing Systems (NIPS).
- E. Hazan and Y. Li. 2016. An optimal algorithm for bandit convex optimization. arXiv preprint arXiv:1603.04350 (2016).
- O. Johnson and A. Barron. 2004. Fisher information inequalities and the central limit theorem. Probability Theory and Related Fields 129, 3 (2004), 391–409.
- R. Kleinberg. 2004. Nearly tight bounds for the continuum-armed bandit problem. In Advances in Neural Information Processing Systems (NIPS).
- A. Levin. 1965. On an algorithm for the minimization of convex functions. In Soviet Mathematics Doklady, Vol. 160. 1244–1247.
- V. D. Milman and A. Pajor. 1989. Isotropic position and inertia ellipsoids and zonoids of the unit ball of a normed n-dimensional space. In Geometric aspects of functional analysis (1987–88). Lecture Notes in Math., Vol. 1376. Springer, Berlin, 64–104.
- A. Nemirovski and D. Yudin. 1983. Problem Complexity and Method Efficiency in Optimization. Wiley Interscience.
- D. Newman. 1965. Location of the maximum on unimodal surfaces. Journal of the ACM (JACM) 12, 3 (1965), 395–398.
- Y. Peres, W. Schlag, and B. Solomyak. 2000. Sixty years of Bernoulli convolutions. In Fractal geometry and stochastics II. Springer, 39–65.
- V. Protasov. 1996. Algorithms for approximate calculation of the minimum of a convex function from its values. Mathematical Notes 59, 1 (1996), 69–74.
- H. Rosenbrock. 1960. An automatic method for finding the greatest or least value of a function. Comput. J. 3, 3 (1960), 175–184.
- A. Saha and A. Tewari. 2011. Improved regret guarantees for online smooth convex optimization with bandit feedback. In International Conference on Artificial Intelligence and Statistics (AISTATS). 636–642.
- O. Shamir. 2013. On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization. In Proceedings of the 26th Annual Conference on Learning Theory (COLT).
Index Terms
- Kernel-based methods for bandit convex optimization