ABSTRACT
Suppose there is a large collection of items, each with an associated cost and an inherent utility that is revealed only once we commit to selecting it. Given a budget on the cumulative cost of the selected items, how can we pick a subset of maximal value? This task generalizes several important problems, such as multi-armed bandits, active search, and the knapsack problem. We present an algorithm, GP-SELECT, which utilizes prior knowledge about similarity between items, expressed as a kernel function. GP-SELECT uses Gaussian process prediction to balance exploration (estimating the unknown value of items) and exploitation (selecting items of high value). We extend GP-SELECT to discover sets that simultaneously have high utility and are diverse. Our preference for diversity can be specified as an arbitrary monotone submodular function that quantifies the diminishing returns obtained when selecting similar items. Furthermore, we exploit the structure of the model updates to achieve an order-of-magnitude (up to 40×) speedup in our experiments without resorting to approximations. We provide strong guarantees on the performance of GP-SELECT and apply it to three real-world case studies of industrial relevance: (1) refreshing a repository of prices in a Global Distribution System for the travel industry, (2) identifying diverse, binding-affine peptides in a vaccine design task, and (3) maximizing clicks in a web-scale recommender system by recommending items to users.
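The exploration-exploitation loop described above can be sketched as budgeted greedy selection by a Gaussian process upper confidence bound: repeatedly pick the affordable item whose posterior mean plus scaled posterior standard deviation is highest, reveal its value, and update the posterior. This is a minimal illustration, not the paper's implementation; the squared-exponential kernel, the confidence parameter `beta`, the noise level, and the `reveal` callback are all assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """Squared-exponential kernel encoding similarity between items."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_select(X, costs, budget, reveal, beta=2.0, noise=1e-2):
    """Greedy budgeted selection: repeatedly choose the affordable item
    with the highest GP upper confidence bound, observe its value
    via reveal(i), and refit the posterior on the observations so far."""
    n = len(X)
    chosen, y, spent = [], [], 0.0
    while True:
        if chosen:
            K = rbf_kernel(X[chosen], X[chosen]) + noise * np.eye(len(chosen))
            Kinv = np.linalg.inv(K)
            k_star = rbf_kernel(X, X[chosen])
            mu = k_star @ Kinv @ np.array(y)
            var = 1.0 - np.einsum('ij,jk,ik->i', k_star, Kinv, k_star)
        else:  # prior: zero mean, unit variance for the RBF kernel
            mu, var = np.zeros(n), np.ones(n)
        ucb = mu + beta * np.sqrt(np.clip(var, 0.0, None))
        ucb[chosen] = -np.inf                   # no re-selection
        ucb[costs > budget - spent] = -np.inf   # enforce the budget
        i = int(np.argmax(ucb))
        if ucb[i] == -np.inf:                   # nothing affordable remains
            return chosen
        chosen.append(i)
        y.append(reveal(i))
        spent += costs[i]
```

The refit here is a full matrix inversion per step; the speedup the abstract mentions comes from exploiting the structure of these updates (e.g., incremental rather than from-scratch refits), which this sketch does not attempt.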
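For the diversity extension, the abstract permits any monotone submodular preference. One standard instance (an illustrative choice, not necessarily the one used in the paper's experiments) is the GP information gain F(S) = ½ log det(I + σ⁻² K_S), whose marginal gain for a candidate item reduces to a function of its posterior variance given the items already selected, so near-duplicates of chosen items earn little:

```python
import numpy as np

def info_gain(K, selected, i, noise_var=1e-2):
    """Marginal gain of item i under F(S) = 0.5*log det(I + K_S/noise_var),
    the GP information gain: a monotone submodular diversity score, so
    items similar to those already in `selected` yield diminishing returns."""
    if selected:
        Ks = K[np.ix_(selected, selected)] + noise_var * np.eye(len(selected))
        kis = K[np.array(selected), i]
        # posterior variance of item i given the selected items
        post_var = K[i, i] - kis @ np.linalg.solve(Ks, kis)
    else:
        post_var = K[i, i]
    return 0.5 * float(np.log1p(max(post_var, 0.0) / noise_var))
```

A diversity-seeking variant of the selection rule would then add λ·info_gain(K, chosen, i) (for a hypothetical trade-off weight λ) to each candidate's score before taking the argmax.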
Index Terms
- Discovering Valuable Items from Massive Data