DOI: 10.1145/1329125.1329173
research-article

Theoretical advantages of lenient Q-learners: an evolutionary game theoretic perspective

Published: 14 May 2007

Abstract

This paper presents the dynamics of multiple reinforcement learning agents from an Evolutionary Game Theoretic (EGT) perspective. We provide a Replicator Dynamics model for traditional multiagent Q-learning, and we extend these differential equations to account for lenient learners: agents that forgive possible mistakes of their teammates that resulted in lower rewards. We use this extended formal model to visualize the basins of attraction of both traditional and lenient multiagent Q-learners in two benchmark coordination problems. The results indicate that lenience provides learners with more accurate estimates for the utility of their actions, resulting in higher likelihood of convergence to the globally optimal solution. In addition, our research supports the strength of EGT as a backbone for multiagent reinforcement learning.
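
The lenience idea in the abstract can be illustrated with a minimal sketch: each agent collects several rewards for the action it chose and updates its utility estimate with only the highest one, so that low rewards caused by a teammate's poor choice are forgiven. This is an illustration under stated assumptions, not the paper's formal model. The payoff matrix, the parameter names (`kappa`, `alpha`, `tau`), the Boltzmann action selection, and all function names are hypothetical.

```python
import math
import random

# Hypothetical 3x3 cooperative payoff matrix (both agents receive the same reward).
# Joint action (0, 0) pays the most, but miscoordination around it is penalized.
PAYOFF = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]

def boltzmann(q, tau):
    """Return action probabilities proportional to exp(q / tau)."""
    m = max(q)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / tau) for v in q]
    total = sum(weights)
    return [w / total for w in weights]

def sample(probs, rng):
    """Draw an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def lenient_update(q, action, rewards, alpha):
    """Stateless Q-update that uses only the best of the collected rewards."""
    q[action] += alpha * (max(rewards) - q[action])

def run(kappa, episodes=2000, alpha=0.1, tau=1.0, seed=0):
    """Train two independent learners; kappa=1 gives the non-lenient update."""
    rng = random.Random(seed)
    q1, q2 = [0.0] * 3, [0.0] * 3
    for _ in range(episodes):
        a1 = sample(boltzmann(q1, tau), rng)
        a2 = sample(boltzmann(q2, tau), rng)
        # Each agent observes kappa rewards for its chosen action, each one
        # against a fresh sample from the teammate's current policy.
        r1 = [PAYOFF[a1][sample(boltzmann(q2, tau), rng)] for _ in range(kappa)]
        r2 = [PAYOFF[sample(boltzmann(q1, tau), rng)][a2] for _ in range(kappa)]
        lenient_update(q1, a1, r1, alpha)
        lenient_update(q2, a2, r2, alpha)
    return q1, q2
```

Running `run(kappa=1)` and `run(kappa=5)` over several seeds is one way to compare how the two variants value action 0. No particular outcome is claimed here.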

    Published In

    AAMAS '07: Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems
    May 2007
    1585 pages
    ISBN:9788190426275
    DOI:10.1145/1329125
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    • IFAAMAS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article

    Conference

    AAMAS07
    Acceptance Rates

    Overall Acceptance Rate 1,155 of 5,036 submissions, 23%

    Article Metrics

    • Downloads (last 12 months): 24
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 19 Feb 2025

    Cited By

    • (2024) Hypergraph-Based Model for Modeling Multi-Agent Q-Learning Dynamics in Public Goods Games. IEEE Transactions on Network Science and Engineering 11:6 (6169-6179). DOI: 10.1109/TNSE.2024.3473941. Online publication date: Nov-2024
    • (2024) The Evolutionary Dynamics of Soft-Max Policy Gradient in Multi-Agent Settings. Theoretical Computer Science (115011). DOI: 10.1016/j.tcs.2024.115011. Online publication date: Dec-2024
    • (2024) Cooperative coevolution for non-separable large-scale black-box optimization: Convergence analyses and distributed accelerations. Applied Soft Computing 166 (112232). DOI: 10.1016/j.asoc.2024.112232. Online publication date: Nov-2024
    • (2023) Double cyclic dominance promotes cooperation in spatial social dilemmas. Chaos, Solitons & Fractals 173 (113649). DOI: 10.1016/j.chaos.2023.113649. Online publication date: Aug-2023
    • (2022) The Evolutionary Dynamics of Soft-Max Policy Gradient in Multi-Agent Settings. Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems (1545-1547). DOI: 10.5555/3535850.3536029. Online publication date: 9-May-2022
    • (2014) The Potential Impact of Intelligent Systems for Mobile Health Self-Management Support: Monte Carlo Simulations of Text Message Support for Medication Adherence. Annals of Behavioral Medicine 49:1 (84-94). DOI: 10.1007/s12160-014-9634-7. Online publication date: 1-Aug-2014
    • (2009) A proximate dynamics model for data mining. Expert Systems with Applications: An International Journal 36:6 (9819-9833). DOI: 10.1016/j.eswa.2009.02.033. Online publication date: 1-Aug-2009
    • (2008) Switching dynamics of multi-agent learning. Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 1 (307-313). DOI: 10.5555/1402383.1402430. Online publication date: 12-May-2008
    • (2007) Multi-agent Learning Dynamics. Proceedings of the 11th international workshop on Cooperative Information Agents XI (36-56). DOI: 10.1007/978-3-540-75119-9_4. Online publication date: 19-Sep-2007
    • (n.d.) The Evolutionary Dynamics of Soft-Max Policy Gradient in Multi-Agent Setting. SSRN Electronic Journal. DOI: 10.2139/ssrn.4159835
