ABSTRACT
We consider the adversarial convex bandit problem and we build the first poly(T)-time algorithm with poly(n)√T-regret for this problem. To do so we introduce three new ideas in the derivative-free optimization
literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves Õ(n^{9.5}√T)-regret, and we show that a simple variant of this algorithm can be run in poly(n log(T))-time per step at the cost of an additional poly(n) T^{o(1)} factor in the regret. These results improve upon the Õ(n^{11}√T)-regret and exp(poly(T))-time result of the first two authors, and the log(T)^{poly(n)}√T-regret and log(T)^{poly(n)}-time result of Hazan and Li. Furthermore we conjecture that another variant of the algorithm could achieve Õ(n^{1.5}√T)-regret, and moreover that this regret is unimprovable (the current best lower bound being Ω(n√T), achieved with linear functions). For the simpler situation of zeroth-order stochastic convex optimization this corresponds to the conjecture that the optimal query complexity is of order n^3/ε^2.
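As a loose illustration of one ingredient named above (and emphatically not the paper's algorithm), here is a minimal exponential-weights sketch with an increasing learning rate, run over a discretized action set with one-point (bandit) feedback. The function name, the schedule `eta0 * sqrt((t+1)/T)`, and the discretization are all illustrative assumptions.

```python
import numpy as np

def exp_weights_bandit(losses, eta0=0.5, seed=0):
    """Illustrative sketch: exponential weights over K discretized actions
    with importance-weighted one-point loss estimates and an INCREASING
    learning-rate schedule (one of the abstract's three ingredients).

    losses: (T, K) array of per-round losses in [0, 1].
    Returns (total loss incurred, total loss of the best fixed action)."""
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    log_w = np.zeros(K)                      # log-weights, for numerical stability
    incurred = 0.0
    for t in range(T):
        eta = eta0 * np.sqrt((t + 1) / T)    # learning rate grows over time
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                         # sampling distribution
        i = rng.choice(K, p=p)               # play one action, observe one loss
        incurred += losses[t, i]
        est = np.zeros(K)
        est[i] = losses[t, i] / p[i]         # unbiased importance-weighted estimate
        log_w -= eta * est                   # exponential-weights update
    best = losses.sum(axis=0).min()
    return incurred, best
```

In the paper's setting the action set is a convex body and the update involves a kernelized distribution rather than a finite grid; the sketch only conveys the increasing-learning-rate exponential-weights mechanic.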
References
- J. Abernethy, E. Hazan, and A. Rakhlin. 2008. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization. In Proceedings of the 21st Annual Conference on Learning Theory (COLT).
- A. Agarwal, O. Dekel, and L. Xiao. 2010. Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback. In Proceedings of the 23rd Annual Conference on Learning Theory (COLT).
- A. Agarwal, D.P. Foster, D. Hsu, S.M. Kakade, and A. Rakhlin. 2011. Stochastic convex optimization with bandit feedback. In Advances in Neural Information Processing Systems (NIPS).
- F. Bach and V. Perchet. 2016. Highly-Smooth Zero-th Order Online Optimization. In Proceedings of the 29th Annual Conference on Learning Theory (COLT).
- K. Ball, F. Barthe, and A. Naor. 2003. Entropy jumps in the presence of a spectral gap. Duke Mathematical Journal 119, 1 (2003), 41–63.
- A. Belloni, T. Liang, H. Narayanan, and A. Rakhlin. 2015. Escaping the Local Minima via Simulated Annealing: Optimization of Approximately Convex Functions. In Proceedings of the 28th Annual Conference on Learning Theory (COLT).
- S. Bubeck and N. Cesa-Bianchi. 2012. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends in Machine Learning 5, 1 (2012), 1–122.
- S. Bubeck, N. Cesa-Bianchi, and S.M. Kakade. 2012. Towards minimax policies for online linear optimization with bandit feedback. In Proceedings of the 25th Annual Conference on Learning Theory (COLT).
- S. Bubeck, O. Dekel, T. Koren, and Y. Peres. 2015. Bandit Convex Optimization: √T Regret in One Dimension. In Proceedings of the 28th Annual Conference on Learning Theory (COLT).
- S. Bubeck and R. Eldan. 2016. Multi-scale exploration of convex functions and bandit convex optimization. In Proceedings of the 29th Annual Conference on Learning Theory (COLT).
- S. Bubeck, Y.T. Lee, and R. Eldan. 2016. Kernel-based methods for bandit convex optimization. arXiv preprint arXiv:1607.03084 (2016).
- A. Conn, K. Scheinberg, and L. Vicente. 2009. Introduction to Derivative-Free Optimization. Society for Industrial and Applied Mathematics (SIAM).
- V. Dani, T. Hayes, and S. Kakade. 2008. The price of bandit information for online optimization. In Advances in Neural Information Processing Systems (NIPS).
- O. Dekel, R. Eldan, and T. Koren. 2015. Bandit Smooth Convex Optimization: Improving the Bias-Variance Tradeoff. In Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.). 2926–2934.
- P. Erdös. 1939. On a family of symmetric Bernoulli convolutions. American Journal of Mathematics 61, 4 (1939), 974–976.
- A. Flaxman, A. Kalai, and B. McMahan. 2005. Online Convex Optimization in the Bandit Setting: Gradient Descent Without a Gradient. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA).
- E. Hazan and K. Levy. 2014. Bandit Convex Optimization: Towards Tight Bounds. In Advances in Neural Information Processing Systems (NIPS).
- E. Hazan and Y. Li. 2016. An optimal algorithm for bandit convex optimization. arXiv preprint arXiv:1603.04350 (2016).
- O. Johnson and A. Barron. 2004. Fisher information inequalities and the central limit theorem. Probability Theory and Related Fields 129, 3 (2004), 391–409.
- R. Kleinberg. 2004. Nearly tight bounds for the continuum-armed bandit problem. In Advances in Neural Information Processing Systems (NIPS).
- A. Levin. 1965. On an algorithm for the minimization of convex functions. In Soviet Mathematics Doklady, Vol. 160. 1244–1247.
- V. D. Milman and A. Pajor. 1989. Isotropic position and inertia ellipsoids and zonoids of the unit ball of a normed n-dimensional space. In Geometric aspects of functional analysis (1987–88). Lecture Notes in Math., Vol. 1376. Springer, Berlin, 64–104.
- A. Nemirovski and D. Yudin. 1983. Problem Complexity and Method Efficiency in Optimization. Wiley Interscience.
- D. Newman. 1965. Location of the maximum on unimodal surfaces. Journal of the ACM (JACM) 12, 3 (1965), 395–398.
- Y. Peres, W. Schlag, and B. Solomyak. 2000. Sixty years of Bernoulli convolutions. In Fractal geometry and stochastics II. Springer, 39–65.
- V. Protasov. 1996. Algorithms for approximate calculation of the minimum of a convex function from its values. Mathematical Notes 59, 1 (1996), 69–74.
- H. Rosenbrock. 1960. An automatic method for finding the greatest or least value of a function. Comput. J. 3, 3 (1960), 175–184.
- A. Saha and A. Tewari. 2011. Improved regret guarantees for online smooth convex optimization with bandit feedback. In International Conference on Artificial Intelligence and Statistics (AISTATS). 636–642.
- O. Shamir. 2013. On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization. In Proceedings of the 26th Annual Conference on Learning Theory (COLT).
Index Terms
- Kernel-based methods for bandit convex optimization