DOI: 10.1145/3121050.3121108

Bandit Algorithms in Interactive Information Retrieval

Published: 01 October 2017

Abstract

The multi-armed bandit problem models an agent that simultaneously attempts to acquire new knowledge (exploration) and to optimize its decisions based on existing knowledge (exploitation). The agent attempts to balance these competing tasks in order to maximize its total value over the period of time considered. There are many practical applications of the bandit model, such as clinical trials, adaptive routing or portfolio design. Over the last decade there has been increased interest in developing bandit algorithms for specific problems in information retrieval, such as diverse document ranking, news recommendation or ranker evaluation. The aim of this tutorial is to provide an overview of the various applications of bandit algorithms in information retrieval, as well as issues related to their practical deployment and performance in real-life systems and applications.
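As a concrete illustration of the exploration/exploitation balance described above, the following is a minimal Python sketch of the classic UCB1 algorithm, framed as repeatedly choosing which item (e.g. a document or news article) to show and learning from click feedback. The function name, click-rate parameters and the simulated Bernoulli rewards are illustrative assumptions, not material from the tutorial itself.

```python
# Minimal sketch of the UCB1 bandit algorithm, illustrating the
# exploration/exploitation trade-off. The reward simulation and all
# names here are illustrative assumptions.
import math
import random


def ucb1(true_click_rates, n_rounds=10_000):
    """Repeatedly pick an arm and update its estimate from binary reward feedback."""
    n_arms = len(true_click_rates)
    counts = [0] * n_arms      # how often each arm has been played
    values = [0.0] * n_arms    # empirical mean reward of each arm

    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1        # play every arm once to initialise its estimate
        else:
            # exploitation term (empirical mean) plus an exploration bonus
            # that shrinks as an arm is played more often
            arm = max(
                range(n_arms),
                key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        # simulated user feedback: a click with the arm's true click rate
        reward = 1.0 if random.random() < true_click_rates[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean
    return counts, values


if __name__ == "__main__":
    counts, values = ucb1([0.05, 0.10, 0.02])
    print(counts)   # the arm with the highest click rate receives most plays
    print(values)
```

The exploration bonus is what distinguishes this from a greedy policy: rarely played arms keep a wide confidence interval and are occasionally revisited, so the agent does not lock onto an early, possibly misleading, estimate.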


Published In

ICTIR '17: Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval
October 2017
348 pages
ISBN:9781450344906
DOI:10.1145/3121050
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. bandit algorithms
  2. exploration-exploitation trade-off
  3. information retrieval
  4. interactive search
  5. personalization
  6. recommender systems
  7. system optimization

Qualifiers

  • Tutorial

Conference

ICTIR '17

Acceptance Rates

ICTIR '17 Paper Acceptance Rate: 27 of 54 submissions, 50%
Overall Acceptance Rate: 235 of 527 submissions, 45%

