| Automatic basis function construction for approximate dynamic programming and reinforcement learning |
| Full text |
Pdf
(748 KB)
|
| Source
|
ACM International Conference Proceeding Series; Vol. 148
archive
Proceedings of the 23rd international conference on Machine learning
table of contents
Pittsburgh, Pennsylvania
Pages: 449 - 456
Year of Publication: 2006
ISBN:1-59593-383-2
|
|
Authors
|
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 12, Downloads (12 Months): 68, Citation Count: 5
|
|
|
ABSTRACT
We address the problem of automatically constructing basis functions for linear approximation of the value function of a Markov Decision Process (MDP). Our work builds on results by Bertsekas and Castañon (1989) who proposed a method for automatically aggregating states to speed up value iteration. We propose to use neighborhood component analysis (Goldberger et al., 2005), a dimensionality reduction technique created for supervised learning, in order to map a high-dimensional state space to a low-dimensional space, based on the Bellman error, or on the temporal difference (TD) error. We then place basis function in the lower-dimensional space. These are added as new features for the linear function approximator. This approach is applied to a high-dimensional inventory control problem.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Bertsekas, D. (2005). Dynamic programming and optimal control vol. 1 third edition. Athena Scientific.
|
| |
2
|
Bertsekas, D., & Castaññon, D. (1989). Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Transactions on Automatic Control, 34, 589--598.
|
| |
3
|
|
| |
4
|
|
 |
5
|
|
| |
6
|
Goldberger, J., Roweis, S., Hinton, G., & Salakhutdinov, R. (2005). Neighbourhood components analysis. In NIPS 17, 513--520.
|
| |
7
|
Jacobs, R. A. (1988). Increased rates of convergence through learning rate adaptation. Neural Networks, 1, 295--307.
|
| |
8
|
Mahadevan, S. (2005). Samuel meets Amarel: Automating value function approximation using global state space analysis. In Proceedings of AAAI.
|
| |
9
|
Mannor, S., Menache, I., & Shimkin, N. (2005). Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research, 134, 215--238.
|
| |
10
|
|
| |
11
|
Ratitch, B., & Precup, D. (2004). Sparse distributed memories for on-line value-based reinforcement learning. In Proceedings of ECML.
|
| |
12
|
Smart, W. (2004). Explicit manifold representations for value-functions in reinforcement learning. In Proceedings of the 8th int. symp. on ai and mathematics.
|
| |
13
|
|
| |
14
|
|
| |
15
|
|
| |
16
|
Tsitsiklis, J., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674--690.
|
| |
17
|
Ziv, O., & Shimkin, N. (2005). Multigrid algorithms for temporal difference reinforcement learning. In Proc. icml workshop on rich representations for rl.
|
CITED BY 5
|
Ronald Parr , Christopher Painter-Wakefield , Lihong Li , Michael Littman, Analyzing feature generation for value-function approximation, Proceedings of the 24th international conference on Machine learning, p.737-744, June 20-24, 2007, Corvalis, Oregon
|
|
|
|
|
|
|
|
|
|
|
|