DOI: 10.1145/1935826.1935878

Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms

Published: 09 February 2011

Abstract

Contextual bandit algorithms have become popular in online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences, but it is very challenging due to their "partial-label" nature. A common practice is to create a simulator of the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult, and modeling bias is usually unavoidably introduced. In this paper, we introduce a replay methodology for contextual bandit algorithm evaluation. Unlike simulator-based approaches, our method is completely data-driven and very easy to adapt to different applications. More importantly, our method can provide provably unbiased evaluations. Our empirical results on a large-scale news article recommendation dataset collected from the Yahoo! Front Page agree well with our theoretical results. Furthermore, comparisons between our offline replay and online bucket evaluation of several contextual bandit algorithms show the accuracy and effectiveness of our offline evaluation method.
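
To make the replay methodology concrete, the following is a minimal Python sketch of the idea under stated assumptions, not the paper's implementation. The interface (replay_evaluate, choose, update) is hypothetical; the logged events are assumed to have been collected by a uniformly random logging policy, which is the condition under which the matched-event average is an unbiased estimate of a policy's online per-trial reward.

    def replay_evaluate(policy, logged_events):
        """Estimate a policy's per-trial reward from logged bandit data.

        Assumptions (hypothetical interface, for illustration only):
          * logged_events yields (context, logged_arm, reward, arms) tuples
            collected under a uniformly random logging policy -- the
            unbiasedness guarantee relies on this;
          * policy exposes choose(context, arms) and
            update(context, arm, reward).
        """
        total_reward = 0.0
        matches = 0
        for context, logged_arm, reward, arms in logged_events:
            if policy.choose(context, arms) == logged_arm:
                # The policy's choice agrees with the logged action, so the
                # logged reward is exactly what the policy would have
                # observed online: count it and let the policy learn.
                total_reward += reward
                matches += 1
                policy.update(context, logged_arm, reward)
            # Otherwise the event is discarded. Under uniform logging every
            # event survives with equal probability, so the retained
            # subsequence is distributed as if the policy had run online.
        return total_reward / matches if matches else 0.0

Because no simulator is built, the only modeling assumption lives in the logging policy itself; this is the sense in which the evaluation is data-driven and easy to adapt across applications.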

Supplementary Material

JPG File (wsdm2011_li_uoe_01.jpg)
MP4 File (wsdm2011_li_uoe_01.mp4)




Published In

cover image ACM Conferences
WSDM '11: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining
February 2011
870 pages
ISBN: 9781450304931
DOI: 10.1145/1935826
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. benchmark dataset
  2. contextual bandit
  3. multi-armed bandit
  4. offline evaluation
  5. recommendation

Qualifiers

  • Research-article

Acceptance Rates

WSDM '11 Paper Acceptance Rate: 83 of 372 submissions, 22%.
Overall Acceptance Rate: 498 of 2,863 submissions, 17%.

Article Metrics

  • Downloads (last 12 months): 159
  • Downloads (last 6 weeks): 13
Reflects downloads up to 03 Jan 2025.

Cited By

  • (2024) "Graph feedback bandits with similar arms." In Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, 3021-3040. DOI: 10.5555/3702676.3702817. Online publication date: 15-Jul-2024.
  • (2024) "Off-policy evaluation beyond overlap." In Proceedings of the 41st International Conference on Machine Learning, 23734-23757. DOI: 10.5555/3692070.3693021. Online publication date: 21-Jul-2024.
  • (2024) "Policy-conditioned environment models are more generalizable." In Proceedings of the 41st International Conference on Machine Learning, 6539-6561. DOI: 10.5555/3692070.3692323. Online publication date: 21-Jul-2024.
  • (2024) "Recommending for a Multi-Sided Marketplace: A Multi-Objective Hierarchical Approach." Marketing Science. DOI: 10.1287/mksc.2022.0238. Online publication date: 14-Aug-2024.
  • (2024) "Dynamic Pricing in Securities Lending Market: Application in Revenue Optimization for an Agent Lender Portfolio." In Proceedings of the 5th ACM International Conference on AI in Finance, 513-520. DOI: 10.1145/3677052.3698611. Online publication date: 14-Nov-2024.
  • (2024) "AutoOffAB: Toward Automated Offline A/B Testing for Data-Driven Requirement Engineering." In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 472-476. DOI: 10.1145/3663529.3663780. Online publication date: 10-Jul-2024.
  • (2024) "Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation." In Proceedings of the 18th ACM Conference on Recommender Systems, 74-83. DOI: 10.1145/3640457.3688142. Online publication date: 8-Oct-2024.
  • (2024) "Handling Varied Objectives by Online Decision Making." In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2130-2140. DOI: 10.1145/3637528.3671812. Online publication date: 25-Aug-2024.
  • (2024) "Personalized Product Assortment with Real-time 3D Perception and Bayesian Payoff Estimation." In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5161-5171. DOI: 10.1145/3637528.3671518. Online publication date: 25-Aug-2024.
  • (2024) "Counterfactual Ranking Evaluation with Flexible Click Models." In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1200-1210. DOI: 10.1145/3626772.3657810. Online publication date: 10-Jul-2024.
