|
ABSTRACT
Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving reinforcement learning (RL) problems. However, since few rigorous empirical comparisons have been conducted, there are no general guidelines describing the methods' relative strengths and weaknesses. This paper presents the results of a detailed empirical comparison between a GA and a TD method in Keepaway, a standard RL benchmark domain based on robot soccer. In particular, we compare the performance of NEAT [19], a GA that evolves neural networks, with Sarsa [16, 17], a popular TD method. The results demonstrate that NEAT can learn better policies in this task, though it requires more evaluations to do so. Additional experiments in two variations of Keepaway demonstrate that Sarsa learns better policies when the task is fully observable and NEAT learns faster when the task is deterministic. Together, these results help isolate the factors critical to the performance of each method and yield insights into their general strengths and weaknesses.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
T. Beielstein and S. Markon. Threshold selection, hypothesis tests and DOE methods. In 2002 Congresss on Evolutionary Computation, pages 777--782, 2002.
|
| |
2
|
S. J. Bradtke and M. O. Duff. Reinforcement learning methods for continuous-time Markov decision problems. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems 7, pages 393--400, San Mateo, CA, 1995. Morgan Kaufmann.
|
| |
3
|
|
| |
4
|
F. Gomez and R. Miikkulainen. Robust nonlinear control through neuroevolution. Technical Report AI02-292, Department of Computer Sciences, University of Texas at Austin, 2002.
|
 |
5
|
|
| |
6
|
F. Gruau, D. Whitley, and L. Pyeatt. A comparison between cellular encoding and direct encoding for genetic neural networks. In J. R. Koza, D. E. Goldberg, D. B. Fogel, and R. L. Riolo, editors, Genetic Programming 1996: Proceedings of the First Annual Conference, pages 81--89. MIT Press, 1996.
|
| |
7
|
|
| |
8
|
K. Kostiadis and H. Hu. KaBaGe-RL: Kanerva-based generalisation and reinforcement learning for possession football. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2001), October 2001.
|
| |
9
|
|
| |
10
|
I. Noda, H. Matsubara, K. Hiraki, and I. Frank. Soccer server: A tool for research on multiagent systems. Applied Artificial Intelligence, 12:233--250, 1998.
|
| |
11
|
J. B. Pollack, A. D. Blair, and M. Land. Coevolution of a backgammon player. In C. G. Langton and K. Shimohara, editors, Artificial Life V: Proc. of the Fifth Int. Workshop on the Synthesis and Simulation of Living Systems, pages 92--98, Cambridge, MA, 1997. The MIT Press.
|
| |
12
|
|
| |
13
|
|
| |
14
|
|
| |
15
|
N. J. Radcliffe. Genetic set recombination and its application to neural network topology optimization. Neural computing and applications, 1(1):67--90, 1993.
|
| |
16
|
G. A. Rummery and M. Niranjan. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG-RT 116, Engineering Department, Cambridge University, 1994.
|
| |
17
|
|
| |
18
|
|
| |
19
|
|
| |
20
|
K. O. Stanley and R. Miikkulainen. Competitive coevolution through evolutionary complexification. Journal of Artificial Intelligence Research, 21:63--100, 2004.
|
| |
21
|
K. O. Stanley and R. Miikkulainen. Evolving a roving eye for go. In Proceedinngs of the Genetic and Evolutionary Computation Conference, 2004.
|
| |
22
|
P. Stone, G. Kuhlmann, M. E. Taylor, and Y. Liu. Keepaway soccer: From machine learning testbed to benchmark. In I. Noda, A. Jacoff, A. Bredenfeld, and Y. Takahashi, editors, RoboCup-2005: Robot Soccer World Cup IX. Springer Verlag, Berlin, 2006. To appear.
|
| |
23
|
P. Stone, R. S. Sutton, and G. Kuhlmann. Reinforcement learning for RoboCup-soccer keepaway. Adaptive Behavior, 13(3):165--188, 2005.
|
| |
24
|
R. S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems 8, pages 1038--1044, 1996.
|
| |
25
|
|
| |
26
|
|
| |
27
|
S. Whiteson and P. Stone. Evolutionary function approximation for reinforcement learning. Journal of Machine Learning Research, 2006. To appear.
|
| |
28
|
X. Yao. Evolving artificial neural networks. Proceedings of the IEEE, 87(9):1423--1447, 1999.
|
CITED BY 5
|
Yohannes Kassahun , Mark Edgington , Jan Hendrik Metzen , Gerald Sommer , Frank Kirchner, A common genetic encoding for both direct and indirect encodings of networks, Proceedings of the 9th annual conference on Genetic and evolutionary computation, July 07-11, 2007, London, England
|
|
|
|
|
|
|
Shimon Whiteson , Matthew E. Taylor , Peter Stone, Empirical Studies in Action Selection with Reinforcement Learning, Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems, v.15 n.1, p.33-50, March 2007
|
|
|
|