ABSTRACT
Reinforcement learning problems are commonly tackled with temporal difference methods, which estimate the long-term value of taking each action in each state. In most problems of real-world interest, learning this value function requires a function approximator. However, the feasibility of using function approximators depends on the ability of the human designer to select an appropriate representation for the value function. My thesis presents a new approach to function approximation that automates some of these difficult design choices by coupling temporal difference methods with policy search methods such as evolutionary computation. It also presents a particular implementation which combines NEAT, a neuroevolutionary policy search method, and Q-learning, a popular temporal difference method, to yield a new method called NEAT+Q that automatically learns effective representations for neural network function approximators. Empirical results in a server job scheduling task demonstrate that NEAT+Q can outperform both NEAT and Q-learning with manually designed neural networks.
- K. O. Stanley and R. Miikkulainen. Evolving neural networks through augmenting topologies. Evolutionary Computation, 10(2):99--127, 2002. Google ScholarDigital Library
- R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998. Google ScholarDigital Library
Index Terms
- Improving reinforcement learning function approximators via neuroevolution
Recommendations
Improving reinforcement learning function approximators via neuroevolution
AAAI'05: Proceedings of the 20th national conference on Artificial intelligence - Volume 4Reinforcement learning problems are commonly tackled with temporal difference methods, which use dynamic programming and statistical sampling to estimate the long-term value of taking each action in each state. In most problems of real-world interest, ...
Genetic Reinforcement Learning for Neurocontrol Problems
Special issue on genetic algorithmsEmpirical tests indicate that at least one class of genetic algorithms yields good performance for neural network weight optimization in terms of learning rates and scalability. The successful application of these genetic algorithms to supervised ...
Reward Shaping in Episodic Reinforcement Learning
AAMAS '17: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent SystemsRecent advancements in reinforcement learning confirm that reinforcement learning techniques can solve large scale problems leading to high quality autonomous decision making. It is a matter of time until we will see large scale applications of ...
Comments