DOI: 10.1145/1102256.1102278
Article

Counter example for Q-bucket-brigade under prediction problem

Published: 25 June 2005

Abstract

Aiming to clarify the conditions under which the reinforcement process of a Learning Classifier System (LCS) converges or diverges, this paper explores: (1) an extreme condition under which the reinforcement process of an LCS diverges; and (2) methods to avoid such divergence. Building on our previous work, which showed the equivalence between the reinforcement process of an LCS and reinforcement learning (RL) with function approximation (FA), we present a counter-example for an LCS with Q-bucket-brigade based on the 11-state star problem, a counter-example originally proposed to show the divergence of Q-learning with linear FA. Empirical results from applying the counter-example to an LCS confirmed the theoretical predictions: (1) an LCS with Q-bucket-brigade diverged under the prediction problem, where the action-selection policy is fixed; and (2) such divergence was avoided by using the implicit-bucket-brigade or by applying the residual gradient algorithm to Q-bucket-brigade.
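
The effect the abstract describes can be illustrated with a minimal numerical sketch. The code below is not the paper's implementation: it uses a state-value TD(0) simplification of an 11-state Baird-style star problem (ten outer states plus one hub) rather than an actual LCS with Q-bucket-brigade, and all names, constants, and the synchronous-sweep scheme are assumptions of the sketch. It contrasts the semi-gradient bootstrapping update, which diverges on this problem under a fixed policy, with Baird's residual gradient update, which stays bounded.

```python
import numpy as np

# Sketch of a Baird-style star problem: 10 outer states plus one hub,
# 11 states in total. All rewards are zero and the fixed policy always
# moves to the hub, so the true value of every state is 0. Feature
# layout and initial weights follow the classic Baird construction;
# the discount factor and step size are assumed, not taken from the paper.

N_OUTER = 10                  # outer states 0..9; state 10 is the hub
N_STATES = N_OUTER + 1
N_WEIGHTS = N_STATES + 1      # overcomplete: one weight per state + one shared
GAMMA = 0.99                  # discount factor (assumed)
ALPHA = 0.01                  # step size (assumed)
HUB = N_STATES - 1

def features(s: int) -> np.ndarray:
    """Overparameterized linear features:
    V(outer i) = 2*w[i] + w[-1],  V(hub) = w[HUB] + 2*w[-1]."""
    f = np.zeros(N_WEIGHTS)
    if s < N_OUTER:
        f[s], f[-1] = 2.0, 1.0
    else:
        f[HUB], f[-1] = 1.0, 2.0
    return f

def run(residual: bool, sweeps: int = 1000) -> float:
    """Synchronous prediction sweeps under the fixed go-to-hub policy.
    Returns the largest weight magnitude after `sweeps` updates."""
    w = np.ones(N_WEIGHTS)
    w[HUB] = 10.0             # classic initialization that exposes divergence
    for _ in range(sweeps):
        dw = np.zeros(N_WEIGHTS)
        for s in range(N_STATES):
            # TD error with zero reward; the next state is always the hub
            delta = GAMMA * (features(HUB) @ w) - features(s) @ w
            if residual:
                # residual gradient (Baird, 1995): true gradient descent
                # on the squared Bellman residual; stays bounded
                dw += ALPHA * delta * (features(s) - GAMMA * features(HUB))
            else:
                # semi-gradient bootstrapping update, the linear-FA
                # analogue of Q-bucket-brigade under the paper's equivalence
                dw += ALPHA * delta * features(s)
        w += dw
    return float(np.abs(w).max())

print(f"semi-gradient:     max|w| = {run(residual=False):.3e}")  # blows up
print(f"residual gradient: max|w| = {run(residual=True):.3e}")   # bounded
```

Run as written, the semi-gradient weights grow by many orders of magnitude while the residual-gradient weights remain bounded, mirroring predictions (1) and (2) above; the implicit-bucket-brigade remedy reported in the paper is not sketched here.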



    Published In

    GECCO '05: Proceedings of the 7th annual workshop on Genetic and evolutionary computation
    June 2005
    431 pages
    ISBN:9781450378000
    DOI:10.1145/1102256


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. convergence
    2. function approximation
    3. genetic-based machine learning
    4. learning classifier systems
    5. reinforcement learning


    Acceptance Rates

    Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
