DOI: 10.1145/1390156.1390162

Learning all optimal policies with multiple criteria

Published: 05 July 2008

Abstract

We describe an algorithm for learning in the presence of multiple criteria. Our technique generalizes previous approaches in that it can learn optimal policies for all linear preference assignments over the multiple reward criteria at once. The algorithm can be viewed as an extension to standard reinforcement learning for MDPs where instead of repeatedly backing up maximal expected rewards, we back up the set of expected rewards that are maximal for some set of linear preferences (given by a weight vector, w). We present the algorithm along with a proof of correctness showing that our solution gives the optimal policy for any linear preference function. The solution reduces to the standard value iteration algorithm for a specific weight vector, w.
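The backup described above, replacing the scalar max with the set of expected reward vectors that are maximal under some linear preference w, can be sketched as follows. This is an illustrative approximation, not the authors' implementation: the names `chvi` and `prune` are mine, the MDP is deterministic, and the "maximal for some w" test is approximated by sampling weight vectors on the simplex rather than maintaining the exact set of undominated vectors.

```python
import numpy as np

def prune(vectors, weights):
    """Keep only reward vectors that are maximal for at least one
    sampled preference vector w (an approximation of the exact
    'optimal for some w' test)."""
    vectors = np.unique(np.asarray(vectors, dtype=float), axis=0)
    keep = set()
    for w in weights:
        keep.add(int(np.argmax(vectors @ w)))
    return vectors[sorted(keep)]

def chvi(S, A, next_state, reward, gamma=0.9, sweeps=50, n_weights=101):
    """Multi-objective value iteration on a deterministic MDP: each
    state carries a *set* of expected reward vectors instead of a
    single scalar value (illustrative sketch with two criteria)."""
    # sampled linear preferences over two criteria: w = (a, 1 - a)
    weights = [np.array([a, 1.0 - a]) for a in np.linspace(0.0, 1.0, n_weights)]
    V = {s: np.zeros((1, 2)) for s in S}
    for _ in range(sweeps):
        V_new = {}
        for s in S:
            candidates = []
            for a in A:
                r = np.asarray(reward[s, a], dtype=float)
                # back up every vector in the successor's set
                for v in V[next_state[s, a]]:
                    candidates.append(r + gamma * v)
            V_new[s] = prune(candidates, weights)
        V = V_new
    return V
```

With a single sampled weight vector, `prune` keeps exactly one vector per state and each sweep reduces to standard value iteration for that w, mirroring the reduction noted in the abstract.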





Published In

ICML '08: Proceedings of the 25th international conference on Machine learning
July 2008
1310 pages
ISBN:9781605582054
DOI:10.1145/1390156
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • Pascal
  • University of Helsinki
  • Xerox
  • Federation of Finnish Learned Societies
  • Google Inc.
  • NSF
  • Machine Learning Journal/Springer
  • Microsoft Research
  • Intel
  • Yahoo!
  • Helsinki Institute for Information Technology
  • IBM

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

  • Research-article

Conference

ICML '08
Sponsor:
  • Microsoft Research
  • Intel
  • IBM

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%


Article Metrics

  • Downloads (Last 12 months)116
  • Downloads (Last 6 weeks)12
Reflects downloads up to 07 Jan 2025


Cited By

  • (2024) Finite-time convergence and sample complexity of actor-critic multi-objective reinforcement learning. Proceedings of the 41st International Conference on Machine Learning, 61913–61933. DOI: 10.5555/3692070.3694632. Published: 21 Jul 2024.
  • (2024) Rewards-in-context. Proceedings of the 41st International Conference on Machine Learning, 56276–56297. DOI: 10.5555/3692070.3694392. Published: 21 Jul 2024.
  • (2024) Estimating Objective Weights of Pareto-Optimal Policies for Multi-Objective Sequential Decision-Making. Journal of Advanced Computational Intelligence and Intelligent Informatics, 28(2), 393–402. DOI: 10.20965/jaciii.2024.p0393. Published: 20 Mar 2024.
  • (2024) Graph Convolutional Network Based Multi-Objective Meta-Deep Q-Learning for Eco-Routing. IEEE Transactions on Intelligent Transportation Systems, 25(7), 7323–7338. DOI: 10.1109/TITS.2023.3348034. Published: Jul 2024.
  • (2024) On Minimizing Total Discounted Cost in MDPs Subject to Reachability Constraints. IEEE Transactions on Automatic Control, 69(9), 6466–6473. DOI: 10.1109/TAC.2024.3384834. Published: Sep 2024.
  • (2024) Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care. IEEE Journal of Biomedical and Health Informatics, 28(10), 6268–6279. DOI: 10.1109/JBHI.2024.3415115. Published: Oct 2024.
  • (2024) Interactive Reward Tuning: Interactive Visualization for Preference Elicitation. 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 9254–9261. DOI: 10.1109/IROS58592.2024.10801540. Published: 14 Oct 2024.
  • (2024) Generalizing knowledge enabled fast-adaptive optimization for advanced machining systems. 2024 IEEE 20th International Conference on Automation Science and Engineering (CASE), 1650–1655. DOI: 10.1109/CASE59546.2024.10711708. Published: 28 Aug 2024.
  • (2024) Reward Shaping in Reinforcement Learning of Multi-Objective Safety Critical Systems. 2024 20th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP), 1–6. DOI: 10.1109/AISP61396.2024.10475272. Published: 21 Feb 2024.
  • (2024) Multiobjective tree-based reinforcement learning for estimating tolerant dynamic treatment regimes. Biometrics, 80(1). DOI: 10.1093/biomtc/ujad017. Published: 14 Feb 2024.
