ABSTRACT
Reinforcement learning (RL) was originally proposed as a framework that allows agents to learn online as they interact with their environment. Existing RL algorithms fall short of this goal because the amount of exploration they require is often too costly and/or too time-consuming for online learning. As a result, RL is mostly used for offline learning in simulated environments. We propose a new algorithm, called BEETLE, for effective online learning that is computationally efficient while minimizing the amount of exploration. We take a Bayesian model-based approach, framing RL as a partially observable Markov decision process. Our two main contributions are the analytical derivation that the optimal value function is the upper envelope of a set of multivariate polynomials, and an efficient point-based value iteration algorithm that exploits this simple parameterization.
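The polynomial characterization of the value function has a concrete computational payoff: under a Dirichlet belief over unknown transition probabilities, the expectation of any monomial in those probabilities has a closed form (a ratio of rising factorials of the Dirichlet counts), so each polynomial — and hence the upper envelope — can be evaluated exactly at any belief. The sketch below illustrates this idea in Python; the function names and the single flat Dirichlet belief are illustrative assumptions, not the paper's actual BEETLE implementation.

```python
from math import prod

def rising(a, k):
    """Rising factorial a * (a+1) * ... * (a+k-1)."""
    r = 1.0
    for i in range(k):
        r *= a + i
    return r

def monomial_expectation(alpha, powers):
    """Closed-form E[prod_i theta_i ** powers[i]] for theta ~ Dirichlet(alpha)."""
    return (prod(rising(a, p) for a, p in zip(alpha, powers))
            / rising(sum(alpha), sum(powers)))

def poly_value(alpha, poly):
    """Expected value under Dirichlet(alpha) of a polynomial in theta,
    represented as a dict {exponent_tuple: coefficient}."""
    return sum(c * monomial_expectation(alpha, p) for p, c in poly.items())

def upper_envelope_value(alpha, polys):
    """Value at belief alpha: the max over a set of polynomials (upper envelope)."""
    return max(poly_value(alpha, poly) for poly in polys)

def belief_update(alpha, outcome):
    """Bayesian belief update: observing an outcome increments its Dirichlet count."""
    new = list(alpha)
    new[outcome] += 1
    return new
```

For example, with a uniform belief `alpha = [1, 1]` over two outcomes, `monomial_expectation` returns 1/2 for the monomial theta_0 and 1/3 for theta_0 squared, matching the known Dirichlet moments; point-based value iteration would repeatedly back up such polynomial sets at sampled beliefs rather than evaluate a fixed set as done here.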