Using inaccurate models in reinforcement learning
Source
ACM International Conference Proceeding Series; Vol. 148
Proceedings of the 23rd international conference on Machine learning
Pittsburgh, Pennsylvania
Pages: 1 - 8
Year of Publication: 2006
ISBN: 1-59593-383-2
Authors
Abbeel, P., Quigley, M., & Ng, A. Y.
Bibliometrics
Downloads (6 Weeks): 9, Downloads (12 Months): 50, Citation Count: 1
ABSTRACT
In the model-based policy search approach to reinforcement learning (RL), policies are found using a model (or "simulator") of the Markov decision process. However, for high-dimensional continuous-state tasks, it can be extremely difficult to build an accurate model, and thus the algorithm often returns a policy that works in simulation but not in real life. The other extreme, model-free RL, tends to require infeasibly large numbers of real-life trials. In this paper, we present a hybrid algorithm that requires only an approximate model and only a small number of real-life trials. The key idea is to successively "ground" the policy evaluations using real-life trials, but to rely on the approximate model to suggest local changes. Our theoretical results show that this algorithm achieves near-optimal performance in the real system, even when the model is only approximate. Empirical results also demonstrate that, when given only a crude model and a small number of real-life trials, our algorithm can obtain near-optimal performance in the real system.
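The key idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm, only a simplified illustration under stated assumptions: a hypothetical one-dimensional linear system stands in for the "real system", a deliberately wrong linear system stands in for the approximate model, and the policy is a single gain `theta`. All names and constants (`REAL`, `MODEL`, `rollout_cost`, `model_gradient`, `hybrid_search`) are invented for this example. The model is used only to suggest a search direction, while every candidate policy is scored by a trial on the real system.

```python
# Toy illustration: the model suggests local changes, real trials ground the evaluation.
# All systems and constants below are hypothetical, chosen only for this sketch.

REAL = (1.2, 1.0)   # assumed "real" dynamics: x' = 1.2*x + 1.0*u
MODEL = (1.0, 0.7)  # assumed crude model:     x' = 1.0*x + 0.7*u


def rollout_cost(theta, a, b, x0=1.0, horizon=20):
    """Cost of the linear policy u = -theta*x on the system x' = a*x + b*u."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -theta * x
        cost += x * x + 0.1 * u * u
        x = a * x + b * u
    return cost


def model_gradient(theta, eps=1e-4):
    """Central finite-difference gradient of the cost, computed in the model only."""
    a, b = MODEL
    return (rollout_cost(theta + eps, a, b) - rollout_cost(theta - eps, a, b)) / (2 * eps)


def hybrid_search(theta=0.0, iterations=30, max_halvings=20):
    """Follow model-suggested directions; accept a step only if a real trial improves."""
    real_cost = rollout_cost(theta, *REAL)  # grounded evaluation of the start policy
    history = [real_cost]
    for _ in range(iterations):
        direction = -model_gradient(theta)  # local change suggested by the model
        step, improved = 1.0, False
        for _ in range(max_halvings):
            candidate = theta + step * direction
            candidate_cost = rollout_cost(candidate, *REAL)  # one real-system trial
            if candidate_cost < real_cost:
                theta, real_cost, improved = candidate, candidate_cost, True
                break
            step *= 0.5  # backtrack: the model's suggested step was too large
        if not improved:
            break  # no step along the model's direction helps on the real system
        history.append(real_cost)
    return theta, history


if __name__ == "__main__":
    theta, history = hybrid_search()
    print("final gain:", theta)
    print("real cost per accepted step:", history)
```

Because a step is accepted only when the real-system cost drops, the recorded costs never increase even though the model's gradient is computed on the wrong dynamics. The backtracking line search is one simple way to spend real trials sparingly; it is a design choice of this sketch rather than something stated in the abstract.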
CITED BY
Julian Togelius, Renzo De Nardi, Hugo Marques, Richard Newcombe, Simon M. Lucas, Owen Holland, Nonlinear dynamics modelling for controller evolution, Proceedings of the 9th annual conference on Genetic and evolutionary computation, July 07-11, 2007, London, England