ACM Home Page
Please provide us with feedback. Feedback
Automatic basis function construction for approximate dynamic programming and reinforcement learning
Full text PdfPdf (748 KB)
Source ACM International Conference Proceeding Series; Vol. 148 archive
Proceedings of the 23rd international conference on Machine learning table of contents
Pittsburgh, Pennsylvania
Pages: 449 - 456  
Year of Publication: 2006
ISBN:1-59593-383-2
Authors
Philipp W. Keller  McGill University, Montreal, QC, Canada
Shie Mannor  McGill University, Montreal, QC, Canada
Doina Precup  McGill University, Montreal, QC, Canada
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 68,   Citation Count: 5
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1143844.1143901
What is a DOI?

ABSTRACT

We address the problem of automatically constructing basis functions for linear approximation of the value function of a Markov Decision Process (MDP). Our work builds on results by Bertsekas and Castañon (1989) who proposed a method for automatically aggregating states to speed up value iteration. We propose to use neighborhood component analysis (Goldberger et al., 2005), a dimensionality reduction technique created for supervised learning, in order to map a high-dimensional state space to a low-dimensional space, based on the Bellman error, or on the temporal difference (TD) error. We then place basis function in the lower-dimensional space. These are added as new features for the linear function approximator. This approach is applied to a high-dimensional inventory control problem.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Bertsekas, D. (2005). Dynamic programming and optimal control vol. 1 third edition. Athena Scientific.
 
2
Bertsekas, D., & Castaññon, D. (1989). Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Transactions on Automatic Control, 34, 589--598.
 
3
 
4
5
 
6
Goldberger, J., Roweis, S., Hinton, G., & Salakhutdinov, R. (2005). Neighbourhood components analysis. In NIPS 17, 513--520.
 
7
Jacobs, R. A. (1988). Increased rates of convergence through learning rate adaptation. Neural Networks, 1, 295--307.
 
8
Mahadevan, S. (2005). Samuel meets Amarel: Automating value function approximation using global state space analysis. In Proceedings of AAAI.
 
9
Mannor, S., Menache, I., & Shimkin, N. (2005). Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research, 134, 215--238.
 
10
 
11
Ratitch, B., & Precup, D. (2004). Sparse distributed memories for on-line value-based reinforcement learning. In Proceedings of ECML.
 
12
Smart, W. (2004). Explicit manifold representations for value-functions in reinforcement learning. In Proceedings of the 8th int. symp. on ai and mathematics.
 
13
 
14
 
15
 
16
Tsitsiklis, J., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674--690.
 
17
Ziv, O., & Shimkin, N. (2005). Multigrid algorithms for temporal difference reinforcement learning. In Proc. icml workshop on rich representations for rl.


Collaborative Colleagues:
Philipp W. Keller: colleagues
Shie Mannor: colleagues
Doina Precup: colleagues