Using inaccurate models in reinforcement learning
Source: ACM International Conference Proceeding Series; Vol. 148
Proceedings of the 23rd International Conference on Machine Learning
Pittsburgh, Pennsylvania
Pages: 1-8
Year of Publication: 2006
ISBN: 1-59593-383-2
Authors
Pieter Abbeel, Stanford University, Stanford, CA
Morgan Quigley, Stanford University, Stanford, CA
Andrew Y. Ng, Stanford University, Stanford, CA
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1143844.1143845

ABSTRACT

In the model-based policy search approach to reinforcement learning (RL), policies are found using a model (or "simulator") of the Markov decision process. However, for high-dimensional continuous-state tasks, it can be extremely difficult to build an accurate model, and thus the algorithm often returns a policy that works in simulation but not in real life. The other extreme, model-free RL, tends to require infeasibly large numbers of real-life trials. In this paper, we present a hybrid algorithm that requires only an approximate model and only a small number of real-life trials. The key idea is to successively "ground" the policy evaluations using real-life trials, but to rely on the approximate model to suggest local changes. Our theoretical results show that this algorithm achieves near-optimal performance in the real system, even when the model is only approximate. Empirical results also demonstrate that, when given only a crude model and a small number of real-life trials, our algorithm can obtain near-optimal performance in the real system.
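
The hybrid idea in the abstract can be illustrated with a small toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the one-dimensional dynamics in `real_step` and `model_step`, the linear policy, the quadratic cost, the finite-difference gradient, and the backtracking line search. The sketch takes one plausible reading of "grounding": after each real trial, a time-indexed correction is added to the approximate model so that it reproduces the observed trajectory under the current policy; the corrected model then suggests a local change, and that change is accepted only if a real trial confirms the improvement.

```python
def real_step(x, u):
    # Hypothetical stand-in for the real system: weaker actuator plus unmodeled drift.
    return x + 0.8 * u + 0.2

def model_step(x, u):
    # Hypothetical crude model: wrong gain, no drift term.
    return x + 1.0 * u

def policy(theta, x):
    # Linear feedback policy with parameters theta = [gain, offset] (an assumption).
    k, b = theta
    return -k * x + b

def rollout(step, theta, x0, horizon, bias=None):
    # Simulate `step` under the policy; `bias[t]` is an optional additive correction.
    xs, us = [x0], []
    for t in range(horizon):
        u = policy(theta, xs[-1])
        x_next = step(xs[-1], u)
        if bias is not None:
            x_next += bias[t]
        us.append(u)
        xs.append(x_next)
    return xs, us

def cost(xs, us):
    # Quadratic cost on visited states and controls (illustrative choice).
    return sum(x * x for x in xs[1:]) + 0.01 * sum(u * u for u in us)

def ground_model(xs_real, us_real):
    # Per-time-step correction that makes the model match the real trajectory.
    return [xs_real[t + 1] - model_step(xs_real[t], us_real[t])
            for t in range(len(us_real))]

def model_gradient(theta, x0, horizon, bias, eps=1e-5):
    # Central finite differences of the cost in the grounded model.
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        c_up = cost(*rollout(model_step, up, x0, horizon, bias))
        c_down = cost(*rollout(model_step, down, x0, horizon, bias))
        grad.append((c_up - c_down) / (2 * eps))
    return grad

def hybrid_policy_search(theta, x0=1.0, horizon=10, iterations=20):
    history = []  # real-system cost of each accepted policy
    for _ in range(iterations):
        xs, us = rollout(real_step, theta, x0, horizon)   # real trial
        current = cost(xs, us)
        history.append(current)
        bias = ground_model(xs, us)                       # ground the model
        grad = model_gradient(theta, x0, horizon, bias)   # model suggests a change
        step, improved = 1.0, False
        for _ in range(20):                               # each try is a real trial
            candidate = [p - step * g for p, g in zip(theta, grad)]
            if cost(*rollout(real_step, candidate, x0, horizon)) < current:
                theta, improved = candidate, True
                break
            step *= 0.5
        if not improved:
            break
    history.append(cost(*rollout(real_step, theta, x0, horizon)))
    return theta, history

if __name__ == "__main__":
    theta, history = hybrid_policy_search([0.2, 0.0])
    print("initial real cost:", history[0])
    print("final real cost:", history[-1])
```

Because a candidate policy is accepted only when it lowers the cost measured on the real system, the recorded real cost never increases in this sketch, even though every search direction comes from the inaccurate model. Note that each line-search attempt counts as a real trial, so a practical version would budget those attempts carefully.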


