DOI: 10.1145/1935826.1935878

Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms

Published: 09 February 2011

Abstract

Contextual bandit algorithms have become popular in online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences, but it is very challenging due to their "partial-label" nature. A common practice is to create a simulator of the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult, and modeling bias is usually unavoidably introduced. In this paper, we introduce a replay methodology for contextual bandit algorithm evaluation. Unlike simulator-based approaches, our method is completely data-driven and very easy to adapt to different applications. More importantly, our method can provide provably unbiased evaluations. Our empirical results on a large-scale news article recommendation dataset collected from the Yahoo! Front Page agree well with our theoretical results. Furthermore, comparisons between our offline replay and online bucket evaluation of several contextual bandit algorithms show the accuracy and effectiveness of our offline evaluation method.
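
To make the replay methodology concrete, the following is a minimal Python sketch of the idea under stated assumptions, not the paper's implementation. The interface (replay_evaluate, choose, update) is hypothetical; the logged events are assumed to have been collected by a uniformly random logging policy, which is the condition under which the matched-event average is an unbiased estimate of a policy's online per-trial reward.

    def replay_evaluate(policy, logged_events):
        """Estimate a policy's per-trial reward from logged bandit data.

        Assumptions (hypothetical interface, for illustration only):
          * logged_events yields (context, logged_arm, reward, arms) tuples
            collected under a uniformly random logging policy -- the
            unbiasedness guarantee relies on this;
          * policy exposes choose(context, arms) and
            update(context, arm, reward).
        """
        total_reward = 0.0
        matches = 0
        for context, logged_arm, reward, arms in logged_events:
            if policy.choose(context, arms) == logged_arm:
                # The policy's choice agrees with the logged action, so the
                # logged reward is exactly what the policy would have
                # observed online: count it and let the policy learn.
                total_reward += reward
                matches += 1
                policy.update(context, logged_arm, reward)
            # Otherwise the event is discarded. Under uniform logging every
            # event survives with equal probability, so the retained
            # subsequence is distributed as if the policy had run online.
        return total_reward / matches if matches else 0.0

Because no simulator is built, the only modeling assumption lives in the logging policy itself; this is the sense in which the evaluation is data-driven and easy to adapt across applications.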

Supplementary Material

JPG File (wsdm2011_li_uoe_01.jpg)
MP4 File (wsdm2011_li_uoe_01.mp4)




Published In

cover image ACM Conferences
WSDM '11: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining
February 2011
870 pages
ISBN: 9781450304931
DOI: 10.1145/1935826
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. benchmark dataset
  2. contextual bandit
  3. multi-armed bandit
  4. offline evaluation
  5. recommendation

Qualifiers

  • Research-article

Acceptance Rates

WSDM '11 Paper Acceptance Rate: 83 of 372 submissions, 22%.
Overall Acceptance Rate: 498 of 2,863 submissions, 17%.

Article Metrics

  • Downloads (last 12 months): 159
  • Downloads (last 6 weeks): 13
Reflects downloads up to 03 Jan 2025.

Cited By

  • (2024) "Graph feedback bandits with similar arms." In Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, 3021-3040. DOI: 10.5555/3702676.3702817. Online publication date: 15-Jul-2024.
  • (2024) "Off-policy evaluation beyond overlap." In Proceedings of the 41st International Conference on Machine Learning, 23734-23757. DOI: 10.5555/3692070.3693021. Online publication date: 21-Jul-2024.
  • (2024) "Policy-conditioned environment models are more generalizable." In Proceedings of the 41st International Conference on Machine Learning, 6539-6561. DOI: 10.5555/3692070.3692323. Online publication date: 21-Jul-2024.
  • (2024) "Recommending for a Multi-Sided Marketplace: A Multi-Objective Hierarchical Approach." Marketing Science. DOI: 10.1287/mksc.2022.0238. Online publication date: 14-Aug-2024.
  • (2024) "Dynamic Pricing in Securities Lending Market: Application in Revenue Optimization for an Agent Lender Portfolio." In Proceedings of the 5th ACM International Conference on AI in Finance, 513-520. DOI: 10.1145/3677052.3698611. Online publication date: 14-Nov-2024.
  • (2024) "AutoOffAB: Toward Automated Offline A/B Testing for Data-Driven Requirement Engineering." In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 472-476. DOI: 10.1145/3663529.3663780. Online publication date: 10-Jul-2024.
  • (2024) "Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation." In Proceedings of the 18th ACM Conference on Recommender Systems, 74-83. DOI: 10.1145/3640457.3688142. Online publication date: 8-Oct-2024.
  • (2024) "Handling Varied Objectives by Online Decision Making." In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2130-2140. DOI: 10.1145/3637528.3671812. Online publication date: 25-Aug-2024.
  • (2024) "Personalized Product Assortment with Real-time 3D Perception and Bayesian Payoff Estimation." In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5161-5171. DOI: 10.1145/3637528.3671518. Online publication date: 25-Aug-2024.
  • (2024) "Counterfactual Ranking Evaluation with Flexible Click Models." In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1200-1210. DOI: 10.1145/3626772.3657810. Online publication date: 10-Jul-2024.
