DOI: 10.1145/1102256.1102278
Article

Counter example for Q-bucket-brigade under prediction problem

Published: 25 June 2005

Abstract

Aiming to clarify the conditions under which the reinforcement process of a Learning Classifier System (LCS) converges or diverges, this paper explores: (1) an extreme condition under which the reinforcement process of an LCS diverges; and (2) methods to avoid such divergence. Building on our previous work, which showed the equivalence between the reinforcement process of an LCS and reinforcement learning (RL) with function approximation (FA), we present a counter-example for an LCS with Q-bucket-brigade based on the 11-state star problem, a counter-example originally proposed to show the divergence of Q-learning with linear FA. Empirical results from applying the counter-example to an LCS confirmed the theoretical predictions: (1) an LCS with Q-bucket-brigade diverged under the prediction problem, where the action-selection policy is fixed; and (2) such divergence was avoided by using the implicit-bucket-brigade or by applying the residual gradient algorithm to Q-bucket-brigade.
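
The effect the abstract describes can be illustrated with a minimal numerical sketch. The code below is not the paper's implementation: it uses a state-value TD(0) simplification of an 11-state Baird-style star problem (ten outer states plus one hub) rather than an actual LCS with Q-bucket-brigade, and all names, constants, and the synchronous-sweep scheme are assumptions of the sketch. It contrasts the semi-gradient bootstrapping update, which diverges on this problem under a fixed policy, with Baird's residual gradient update, which stays bounded.

```python
import numpy as np

# Sketch of a Baird-style star problem: 10 outer states plus one hub,
# 11 states in total. All rewards are zero and the fixed policy always
# moves to the hub, so the true value of every state is 0. Feature
# layout and initial weights follow the classic Baird construction;
# the discount factor and step size are assumed, not taken from the paper.

N_OUTER = 10                  # outer states 0..9; state 10 is the hub
N_STATES = N_OUTER + 1
N_WEIGHTS = N_STATES + 1      # overcomplete: one weight per state + one shared
GAMMA = 0.99                  # discount factor (assumed)
ALPHA = 0.01                  # step size (assumed)
HUB = N_STATES - 1

def features(s: int) -> np.ndarray:
    """Overparameterized linear features:
    V(outer i) = 2*w[i] + w[-1],  V(hub) = w[HUB] + 2*w[-1]."""
    f = np.zeros(N_WEIGHTS)
    if s < N_OUTER:
        f[s], f[-1] = 2.0, 1.0
    else:
        f[HUB], f[-1] = 1.0, 2.0
    return f

def run(residual: bool, sweeps: int = 1000) -> float:
    """Synchronous prediction sweeps under the fixed go-to-hub policy.
    Returns the largest weight magnitude after `sweeps` updates."""
    w = np.ones(N_WEIGHTS)
    w[HUB] = 10.0             # classic initialization that exposes divergence
    for _ in range(sweeps):
        dw = np.zeros(N_WEIGHTS)
        for s in range(N_STATES):
            # TD error with zero reward; the next state is always the hub
            delta = GAMMA * (features(HUB) @ w) - features(s) @ w
            if residual:
                # residual gradient (Baird, 1995): true gradient descent
                # on the squared Bellman residual; stays bounded
                dw += ALPHA * delta * (features(s) - GAMMA * features(HUB))
            else:
                # semi-gradient bootstrapping update, the linear-FA
                # analogue of Q-bucket-brigade under the paper's equivalence
                dw += ALPHA * delta * features(s)
        w += dw
    return float(np.abs(w).max())

print(f"semi-gradient:     max|w| = {run(residual=False):.3e}")  # blows up
print(f"residual gradient: max|w| = {run(residual=True):.3e}")   # bounded
```

Run as written, the semi-gradient weights grow by many orders of magnitude while the residual-gradient weights remain bounded, mirroring predictions (1) and (2) above; the implicit-bucket-brigade remedy reported in the paper is not sketched here.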



    Published In

    GECCO '05: Proceedings of the 7th annual workshop on Genetic and evolutionary computation
    June 2005
    431 pages
    ISBN:9781450378000
    DOI:10.1145/1102256


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. convergence
    2. function approximation
    3. genetic-based machine learning
    4. learning classifier systems
    5. reinforcement learning


    Acceptance Rates

    Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
