DOI: 10.1145/1160633.1160818

On the relationship between MDPs and the BDI architecture

Published: 08 May 2006

Abstract

In this paper we describe the initial results of an investigation into the relationship between Markov Decision Processes (MDPs) and Belief-Desire-Intention (BDI) architectures. While these approaches look rather different, and have at times been seen as alternatives, we show that they can be related to one another quite easily. In particular, we show how to map intentions in the BDI architecture to policies in an MDP, and vice versa. In both cases, we derive both theoretical and related algorithmic mappings. While the mappings that we obtain are of theoretical rather than practical value, we describe how they can be extended to provide mappings that are useful in practice.
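The abstract's central idea, reading a BDI-style intention off an MDP policy, can be illustrated with a toy sketch. Everything below (the deterministic MDP, the value-iteration solver, and the rule that unrolls a greedy policy into a linear action sequence) is an assumption of this example, not the construction actually given in the paper:

```python
# Illustrative sketch only: a toy deterministic MDP is solved by value
# iteration, and the greedy policy is then "unrolled" from a start state
# into a linear action sequence, loosely analogous to extracting a
# BDI-style intention (a plan) from an MDP policy. All state and action
# names here are invented for this example.

TRANSITIONS = {  # deterministic transitions: (state, action) -> next state
    ("s0", "right"): "s1",
    ("s1", "right"): "s2",
    ("s1", "up"): "s3",
    ("s3", "right"): "goal",
    ("s2", "up"): "goal",
}
STATES = ["s0", "s1", "s2", "s3", "goal"]
GAMMA = 0.9  # discount factor

def reward(next_state):
    # Reward 1 for reaching the goal, 0 otherwise.
    return 1.0 if next_state == "goal" else 0.0

def value_iteration(n_iters=50):
    V = {s: 0.0 for s in STATES}
    for _ in range(n_iters):
        for s in STATES:
            qs = [reward(s2) + GAMMA * V[s2]
                  for (st, a), s2 in TRANSITIONS.items() if st == s]
            if qs:  # the goal state has no outgoing transitions
                V[s] = max(qs)
    # Greedy policy: keep the action with the highest Q-value in each state.
    best = {}
    for (st, a), s2 in TRANSITIONS.items():
        q = reward(s2) + GAMMA * V[s2]
        if st not in best or q > best[st][0]:
            best[st] = (q, a)
    return {st: a for st, (q, a) in best.items()}

def unroll(policy, start, goal="goal"):
    """Follow the policy from `start`, recording actions: an intention-like plan."""
    plan, s = [], start
    while s != goal:
        a = policy[s]
        plan.append(a)
        s = TRANSITIONS[(s, a)]
    return plan

policy = value_iteration()
print(unroll(policy, "s0"))  # a linear plan from s0 to the goal
```

The unrolled plan is a fixed action sequence the agent commits to, which is the intention-like object; the reverse direction sketched in the paper, turning intentions into policies, would instead constrain which actions the policy may select in each state.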




    Published In

    AAMAS '06: Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems
    May 2006
    1631 pages
    ISBN:1595933034
    DOI:10.1145/1160633

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. intention
2. Markov decision process
3. policy

Qualifiers

• Article

Conference

AAMAS06

    Acceptance Rates

    Overall Acceptance Rate 1,155 of 5,036 submissions, 23%


Cited By

• (2022) A Reinforcement Learning Integrating Distributed Caches for Contextual Road Navigation. International Journal of Ambient Computing and Intelligence 13(1), 1-19. DOI: 10.4018/IJACI.300792. Online publication date: 24 May 2022.
• (2022) Reinforcement Learning at the Cognitive Level in a Belief, Desire, Intention UAS Agent. Intelligent Autonomous Systems 16, 458-470. DOI: 10.1007/978-3-030-95892-3_35. Online publication date: 8 Apr 2022.
• (2018) A Formal Approach to Embedding First-Principles Planning in BDI Agent Systems. Scalable Uncertainty Management, 333-347. DOI: 10.1007/978-3-030-00461-3_23. Online publication date: 11 Sep 2018.
• (2017) Improving plan execution robustness through capability aware maintenance of plans by BDI agents. International Journal of Agent-Oriented Software Engineering 5(4), 306-335. DOI: 10.1504/IJAOSE.2017.087679. Online publication date: 1 Jan 2017.
• (2017) A hybrid POMDP-BDI agent architecture with online stochastic planning and plan caching. Cognitive Systems Research 43(C), 1-20. DOI: 10.1016/j.cogsys.2016.12.002. Online publication date: 1 Jun 2017.
• (2016) Learning from situated experiences for a contextual planning guidance. Journal of Ambient Intelligence and Humanized Computing 7(4), 555-566. DOI: 10.1007/s12652-016-0342-y. Online publication date: 25 Jan 2016.
• (2016) Probabilistic Planning in AgentSpeak Using the POMDP Framework. Combinations of Intelligent Methods and Applications, 19-37. DOI: 10.1007/978-3-319-26860-6_2. Online publication date: 28 Jan 2016.
• (2015) A Hybrid POMDP-BDI Agent Architecture with Online Stochastic Planning and Desires with Changing Intensity Levels. Agents and Artificial Intelligence, 3-19. DOI: 10.1007/978-3-319-27947-3_1. Online publication date: 19 Dec 2015.
• (2015) CAMP-BDI: A Pre-emptive Approach for Plan Execution Robustness in Multiagent Systems. PRIMA 2015: Principles and Practice of Multi-Agent Systems, 65-84. DOI: 10.1007/978-3-319-25524-8_5. Online publication date: 28 Nov 2015.
• (2013) Planning in BDI agents: a survey of the integration of planning algorithms and agent reasoning. The Knowledge Engineering Review 30(1), 1-44. DOI: 10.1017/S0269888913000337. Online publication date: 4 Sep 2013.
