ABSTRACT
Reinforcement learning (RL) was originally proposed as a framework that allows agents to learn online as they interact with their environment. Existing RL algorithms fall short of this goal because the amount of exploration they require is often too costly and/or too time-consuming for online learning. As a result, RL is mostly used for offline learning in simulated environments. We propose a new algorithm, called BEETLE, for effective online learning that is computationally efficient while minimizing the amount of exploration. We take a Bayesian model-based approach, framing RL as a partially observable Markov decision process. Our two main contributions are the analytical derivation that the optimal value function is the upper envelope of a set of multivariate polynomials, and an efficient point-based value iteration algorithm that exploits this simple parameterization.
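The polynomial characterization of the value function has a concrete computational payoff: under a Dirichlet belief over unknown transition probabilities, the expectation of any monomial in those probabilities has a closed form (a ratio of rising factorials of the Dirichlet counts), so each polynomial — and hence the upper envelope — can be evaluated exactly at any belief. The sketch below illustrates this idea in Python; the function names and the single flat Dirichlet belief are illustrative assumptions, not the paper's actual BEETLE implementation.

```python
from math import prod

def rising(a, k):
    """Rising factorial a * (a+1) * ... * (a+k-1)."""
    r = 1.0
    for i in range(k):
        r *= a + i
    return r

def monomial_expectation(alpha, powers):
    """Closed-form E[prod_i theta_i ** powers[i]] for theta ~ Dirichlet(alpha)."""
    return (prod(rising(a, p) for a, p in zip(alpha, powers))
            / rising(sum(alpha), sum(powers)))

def poly_value(alpha, poly):
    """Expected value under Dirichlet(alpha) of a polynomial in theta,
    represented as a dict {exponent_tuple: coefficient}."""
    return sum(c * monomial_expectation(alpha, p) for p, c in poly.items())

def upper_envelope_value(alpha, polys):
    """Value at belief alpha: the max over a set of polynomials (upper envelope)."""
    return max(poly_value(alpha, poly) for poly in polys)

def belief_update(alpha, outcome):
    """Bayesian belief update: observing an outcome increments its Dirichlet count."""
    new = list(alpha)
    new[outcome] += 1
    return new
```

For example, with a uniform belief `alpha = [1, 1]` over two outcomes, `monomial_expectation` returns 1/2 for the monomial theta_0 and 1/3 for theta_0 squared, matching the known Dirichlet moments; point-based value iteration would repeatedly back up such polynomial sets at sampled beliefs rather than evaluate a fixed set as done here.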