DOI: 10.1145/1772690.1772758

A contextual-bandit approach to personalized news article recommendation

Published: 26 April 2010

Abstract

Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web services feature dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are fast both in learning and in computation.
In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.
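The article-selection loop described above can be made concrete. The algorithm the paper proposes is LinUCB, which fits a ridge-regression model per article and adds an upper-confidence bonus to each predicted click-through rate; the sketch below is a minimal disjoint-model variant, and the class and parameter names are illustrative, not taken from the paper.

```python
import numpy as np

class LinUCB:
    """Disjoint linear UCB: one ridge-regression model per article (arm).

    For each arm a it keeps A_a (regularized Gram matrix, initialized to the
    identity) and b_a (feature-weighted reward sum), and scores an arm by its
    predicted click probability plus an exploration bonus
    alpha * sqrt(x^T A_a^{-1} x).
    """

    def __init__(self, n_arms, n_features, alpha=1.0):
        self.alpha = alpha
        self.A = np.stack([np.eye(n_features) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, n_features))

    def select(self, x):
        """Pick the arm with the highest upper confidence bound for context x."""
        scores = []
        for A_a, b_a in zip(self.A, self.b):
            A_inv = np.linalg.inv(A_a)
            theta = A_inv @ b_a                        # ridge estimate of arm weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # confidence width at x
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold the observed click (1) / no-click (0) into the chosen arm's model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

With user/article features as the context vector `x`, the loop is: `select` an article, display it, observe the click, then `update` only the displayed arm.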
The contributions of this work are three-fold. First, we propose a new, general contextual bandit algorithm that is computationally efficient and well motivated from learning theory. Second, we argue that any bandit algorithm can be reliably evaluated offline using previously recorded random traffic. Finally, using this offline evaluation method, we successfully applied our new algorithm to a Yahoo! Front Page Today Module dataset containing over 33 million events. Results showed a 12.5% click lift compared to a standard context-free bandit algorithm, and the advantage grows even larger as data becomes scarcer.
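The offline evaluation idea mentioned above (replaying logged events in which articles were displayed uniformly at random, and scoring a candidate policy only on events where its choice matches the logged one) can be sketched as follows; the function name and the `select`/`update` policy interface are assumptions for illustration, not from the paper.

```python
def replay_evaluate(policy, logged_events):
    """Offline replay evaluation on uniformly-random logged traffic.

    logged_events: iterable of (context, displayed_arm, reward) triples, where
    displayed_arm was chosen uniformly at random at logging time. An event
    counts only when the policy's choice matches the logged arm; the average
    reward over matched events estimates the policy's online click-through rate.
    """
    total_reward, matched = 0.0, 0
    for x, logged_arm, reward in logged_events:
        if policy.select(x) == logged_arm:
            total_reward += reward
            matched += 1
            policy.update(logged_arm, x, reward)  # learn only from matched events
    return total_reward / matched if matched else 0.0
```

Because the logging policy is uniform, discarding unmatched events introduces no selection bias, which is what makes the replayed average a reliable stand-in for an online bucket test.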



Published In

cover image ACM Other conferences
WWW '10: Proceedings of the 19th International Conference on World Wide Web
April 2010
1407 pages
ISBN:9781605587998
DOI:10.1145/1772690

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. contextual bandit
  2. exploration/exploitation dilemma
  3. personalization
  4. recommender systems
  5. web service

Qualifiers

  • Research-article

Conference

WWW '10
WWW '10: The 19th International World Wide Web Conference
April 26 - 30, 2010
Raleigh, North Carolina, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

  • Downloads (Last 12 months)941
  • Downloads (Last 6 weeks)79
Reflects downloads up to 17 Feb 2025

Cited By

  • Introducing Modern Education Curriculum into Corporate Training for Real Estate Agency Employees. Journal of Educational Research and Policies, 7(1):9-15, Jan 2025. DOI: 10.53469/jerp.2025.07(01).03
  • Use of Reinforcement Learning Algorithms in the Development of Pedagogical Methodologies. In Revolutionizing Pedagogy Through Smart Education, pages 181-204, Feb 2025. DOI: 10.4018/979-8-3693-7793-2.ch010
  • Selective Reviews of Bandit Problems in AI via a Statistical View. Mathematics, 13(4):665, Feb 2025. DOI: 10.3390/math13040665
  • Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees. Mathematics, 13(1):149, Jan 2025. DOI: 10.3390/math13010149
  • Chasing Common Knowledge: Joint Large Model Selection and Pulling in MEC With Parameter Sharing. IEEE Transactions on Parallel and Distributed Systems, 36(3):437-454, Mar 2025. DOI: 10.1109/TPDS.2025.3527649
  • A Differentially Private Approach for Budgeted Combinatorial Multi-Armed Bandits. IEEE Transactions on Dependable and Secure Computing, 22(1):424-439, Jan 2025. DOI: 10.1109/TDSC.2024.3401836
  • Efficient Learning for Selecting Top-m Context-Dependent Designs. IEEE Transactions on Automation Science and Engineering, 22:3210-3225, 2025. DOI: 10.1109/TASE.2024.3391020
  • From Data to Decisions: The Power of Machine Learning in Business Recommendations. IEEE Access, 13:17354-17397, 2025. DOI: 10.1109/ACCESS.2025.3532697
  • Reinforcement Learning-Based Optimal Control of Wearable Alarms for Consistent Roadway Workers' Reactions to Traffic Hazards. Journal of Transportation Safety & Security, pages 1-25, Jan 2025. DOI: 10.1080/19439962.2024.2449119
  • Contextual Bandits to Increase User Prediction Accuracy in Movie Recommendation System. ITM Web of Conferences, 73:01018, Feb 2025. DOI: 10.1051/itmconf/20257301018
