ACM Home Page
Please provide us with feedback. Feedback
Comparing evolutionary and temporal difference methods in a reinforcement learning domain
Full text PdfPdf (231 KB)
Source Genetic And Evolutionary Computation Conference archive
Proceedings of the 8th annual conference on Genetic and evolutionary computation table of contents
Seattle, Washington, USA
SESSION: Genetic algorithms: papers table of contents
Pages: 1321 - 1328  
Year of Publication: 2006
ISBN:1-59593-186-4
Authors
Matthew E. Taylor  The University of Texas at Austin, Austin, Texas
Shimon Whiteson  The University of Texas at Austin, Austin, Texas
Peter Stone  The University of Texas at Austin, Austin, Texas
Sponsors
SIGEVO: ACM Special Interest Group on Genetic and Evolutionary Computation
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 96,   Citation Count: 5
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1143997.1144202
What is a DOI?

ABSTRACT

Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving reinforcement learning (RL) problems. However, since few rigorous empirical comparisons have been conducted, there are no general guidelines describing the methods' relative strengths and weaknesses. This paper presents the results of a detailed empirical comparison between a GA and a TD method in Keepaway, a standard RL benchmark domain based on robot soccer. In particular, we compare the performance of NEAT [19], a GA that evolves neural networks, with Sarsa [16, 17], a popular TD method. The results demonstrate that NEAT can learn better policies in this task, though it requires more evaluations to do so. Additional experiments in two variations of Keepaway demonstrate that Sarsa learns better policies when the task is fully observable and NEAT learns faster when the task is deterministic. Together, these results help isolate the factors critical to the performance of each method and yield insights into their general strengths and weaknesses.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
T. Beielstein and S. Markon. Threshold selection, hypothesis tests and DOE methods. In 2002 Congresss on Evolutionary Computation, pages 777--782, 2002.
 
2
S. J. Bradtke and M. O. Duff. Reinforcement learning methods for continuous-time Markov decision problems. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems 7, pages 393--400, San Mateo, CA, 1995. Morgan Kaufmann.
 
3
 
4
F. Gomez and R. Miikkulainen. Robust nonlinear control through neuroevolution. Technical Report AI02-292, Department of Computer Sciences, University of Texas at Austin, 2002.
5
 
6
F. Gruau, D. Whitley, and L. Pyeatt. A comparison between cellular encoding and direct encoding for genetic neural networks. In J. R. Koza, D. E. Goldberg, D. B. Fogel, and R. L. Riolo, editors, Genetic Programming 1996: Proceedings of the First Annual Conference, pages 81--89. MIT Press, 1996.
 
7
 
8
K. Kostiadis and H. Hu. KaBaGe-RL: Kanerva-based generalisation and reinforcement learning for possession football. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2001), October 2001.
 
9
 
10
I. Noda, H. Matsubara, K. Hiraki, and I. Frank. Soccer server: A tool for research on multiagent systems. Applied Artificial Intelligence, 12:233--250, 1998.
 
11
J. B. Pollack, A. D. Blair, and M. Land. Coevolution of a backgammon player. In C. G. Langton and K. Shimohara, editors, Artificial Life V: Proc. of the Fifth Int. Workshop on the Synthesis and Simulation of Living Systems, pages 92--98, Cambridge, MA, 1997. The MIT Press.
 
12
 
13
 
14
 
15
N. J. Radcliffe. Genetic set recombination and its application to neural network topology optimization. Neural computing and applications, 1(1):67--90, 1993.
 
16
G. A. Rummery and M. Niranjan. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG-RT 116, Engineering Department, Cambridge University, 1994.
 
17
 
18
 
19
 
20
K. O. Stanley and R. Miikkulainen. Competitive coevolution through evolutionary complexification. Journal of Artificial Intelligence Research, 21:63--100, 2004.
 
21
K. O. Stanley and R. Miikkulainen. Evolving a roving eye for go. In Proceedinngs of the Genetic and Evolutionary Computation Conference, 2004.
 
22
P. Stone, G. Kuhlmann, M. E. Taylor, and Y. Liu. Keepaway soccer: From machine learning testbed to benchmark. In I. Noda, A. Jacoff, A. Bredenfeld, and Y. Takahashi, editors, RoboCup-2005: Robot Soccer World Cup IX. Springer Verlag, Berlin, 2006. To appear.
 
23
P. Stone, R. S. Sutton, and G. Kuhlmann. Reinforcement learning for RoboCup-soccer keepaway. Adaptive Behavior, 13(3):165--188, 2005.
 
24
R. S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems 8, pages 1038--1044, 1996.
 
25
 
26
 
27
S. Whiteson and P. Stone. Evolutionary function approximation for reinforcement learning. Journal of Machine Learning Research, 2006. To appear.
 
28
X. Yao. Evolving artificial neural networks. Proceedings of the IEEE, 87(9):1423--1447, 1999.


Collaborative Colleagues:
Matthew E. Taylor: colleagues
Shimon Whiteson: colleagues
Peter Stone: colleagues