Value Function Based Reinforcement Learning in Changing Markovian Environments

Published: 01 June 2008

Abstract

The paper investigates the possibility of applying value function based reinforcement learning (RL) methods in cases where the environment may change over time. First, theorems are presented which show that the optimal value function of a discounted Markov decision process (MDP) depends Lipschitz continuously on the immediate-cost function and the transition-probability function. Dependence on the discount factor is also analyzed and shown to be non-Lipschitz. Afterwards, the concept of (ε,δ)-MDPs is introduced as a generalization of MDPs and ε-MDPs. In this model the environment may change over time; more precisely, the transition function and the cost function may vary from time to time, but the changes must be bounded in the limit. Then, learning algorithms in changing environments are analyzed. A general relaxed convergence theorem for stochastic iterative algorithms is presented. The results are also demonstrated through three classical RL methods: asynchronous value iteration, Q-learning, and temporal difference learning. Finally, some numerical experiments concerning changing environments are presented.
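As a rough, illustrative sketch only (not the paper's algorithm, proofs, or experiments), the following Python snippet runs tabular, cost-minimizing Q-learning on a tiny two-state, two-action MDP whose transition probabilities drift slowly within a bounded band, loosely mimicking the bounded changes allowed in (ε,δ)-MDPs; every name, parameter value, and the drift model here are assumptions chosen for this example.

import numpy as np

# Illustrative sketch only: tabular Q-learning (cost-minimizing) in a small
# MDP whose transition probabilities drift slowly over time. All quantities
# below are assumptions for the example, not taken from the paper.

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
gamma = 0.9                                  # discount factor
Q = np.zeros((n_states, n_actions))          # state-action value estimates

def p_stay(t):
    # Probability of staying in the current state; drifts within [0.6, 0.8],
    # i.e., the changes of the transition function remain bounded.
    return 0.7 + 0.1 * np.sin(2 * np.pi * t / 5000.0)

def cost(s, a):
    # Immediate cost: in state 0 action 1 is cheap, in state 1 action 0 is cheap.
    return 1.0 if a == s else 0.2

s = 0
for t in range(1, 50001):
    # epsilon-greedy action selection (exploration rate 0.1)
    a = int(rng.integers(n_actions)) if rng.random() < 0.1 else int(np.argmin(Q[s]))
    # environment step with the time-varying transition function
    s_next = s if rng.random() < p_stay(t) else 1 - s
    # Q-learning update with a slowly decaying step size
    alpha = 1.0 / (1.0 + 0.001 * t)
    Q[s, a] += alpha * (cost(s, a) + gamma * np.min(Q[s_next]) - Q[s, a])
    s = s_next

print("Learned Q values:", Q)

The decaying step size follows the usual stochastic-approximation prescription; in a persistently changing environment one might instead keep the step size bounded away from zero so that the estimates can track the drift.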


Published in

The Journal of Machine Learning Research, Volume 9 (June 2008), 1964 pages
Publisher: JMLR.org
ISSN: 1532-4435
EISSN: 1533-7928
