DOI: 10.1145/3121050.3121108

Bandit Algorithms in Interactive Information Retrieval

Published: 01 October 2017

Abstract

The multi-armed bandit problem models an agent that simultaneously attempts to acquire new knowledge (exploration) and to optimize its decisions based on existing knowledge (exploitation). The agent attempts to balance these competing tasks in order to maximize its total value over the period of time considered. There are many practical applications of the bandit model, such as clinical trials, adaptive routing or portfolio design. Over the last decade there has been increased interest in developing bandit algorithms for specific problems in information retrieval, such as diverse document ranking, news recommendation or ranker evaluation. The aim of this tutorial is to provide an overview of the various applications of bandit algorithms in information retrieval, as well as issues related to their practical deployment and performance in real-life systems and applications.
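As a concrete illustration of the exploration/exploitation balance described above, the following is a minimal Python sketch of the classic UCB1 algorithm, framed as repeatedly choosing which item (e.g. a document or news article) to show and learning from click feedback. The function name, click-rate parameters and the simulated Bernoulli rewards are illustrative assumptions, not material from the tutorial itself.

```python
# Minimal sketch of the UCB1 bandit algorithm, illustrating the
# exploration/exploitation trade-off. The reward simulation and all
# names here are illustrative assumptions.
import math
import random


def ucb1(true_click_rates, n_rounds=10_000):
    """Repeatedly pick an arm and update its estimate from binary reward feedback."""
    n_arms = len(true_click_rates)
    counts = [0] * n_arms      # how often each arm has been played
    values = [0.0] * n_arms    # empirical mean reward of each arm

    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1        # play every arm once to initialise its estimate
        else:
            # exploitation term (empirical mean) plus an exploration bonus
            # that shrinks as an arm is played more often
            arm = max(
                range(n_arms),
                key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        # simulated user feedback: a click with the arm's true click rate
        reward = 1.0 if random.random() < true_click_rates[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean
    return counts, values


if __name__ == "__main__":
    counts, values = ucb1([0.05, 0.10, 0.02])
    print(counts)   # the arm with the highest click rate receives most plays
    print(values)
```

The exploration bonus is what distinguishes this from a greedy policy: rarely played arms keep a wide confidence interval and are occasionally revisited, so the agent does not lock onto an early, possibly misleading, estimate.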


Published In

ICTIR '17: Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval
October 2017
348 pages
ISBN:9781450344906
DOI:10.1145/3121050
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. bandit algorithms
  2. exploration-exploitation trade-off
  3. information retrieval
  4. interactive search
  5. personalization
  6. recommender systems
  7. system optimization

Qualifiers

  • Tutorial

Conference

ICTIR '17

Acceptance Rates

ICTIR '17 Paper Acceptance Rate: 27 of 54 submissions, 50%
Overall Acceptance Rate: 235 of 527 submissions, 45%

