ABSTRACT
We present a method for transforming the infinite interactive state space of interactive POMDPs (I-POMDPs) into a finite one, thereby enabling the computation of exact solutions. I-POMDPs support sequential decision making in multi-agent environments by modeling other agents' beliefs, capabilities, and preferences as part of the interactive state space. Because these beliefs may be arbitrarily nested and are continuous, optimal solutions cannot be computed with value iteration as in ordinary POMDPs. Our method obtains a finite state space by grouping the other agents' behaviorally equivalent models into equivalence classes, which makes it possible to compute the complete optimal solution for the I-POMDP, representable as a policy graph. We illustrate the method on the multi-agent tiger problem and discuss features of the resulting solution.
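To make the equivalence-class idea concrete, the following is a minimal Python sketch, not the paper's actual construction. It assumes a toy setting in which the other agent's model reduces to a single belief p = Pr(tiger behind the left door) in the multi-agent tiger problem; the threshold policy `act`, the Bayes update `update`, and the 0.85 observation accuracy are illustrative stand-ins. The sketch keys each model by the complete conditional plan it induces over a short horizon, so a continuum of beliefs collapses into finitely many behavioral equivalence classes.

```python
from collections import defaultdict

OBS = ["growl-left", "growl-right"]

def act(p, eps=0.1):
    """Hypothetical threshold policy: open a door only when sufficiently certain."""
    if p < eps:
        return "open-left"      # tiger is almost surely behind the right door
    if p > 1.0 - eps:
        return "open-right"     # tiger is almost surely behind the left door
    return "listen"

def update(p, obs, acc=0.85):
    """Bayes update of Pr(tiger-left) after hearing a growl with accuracy acc."""
    like_left = acc if obs == "growl-left" else 1.0 - acc
    like_right = 1.0 - acc if obs == "growl-left" else acc
    z = like_left * p + like_right * (1.0 - p)
    return like_left * p / z

def behavior(p, horizon=2):
    """The complete conditional plan a model induces: the action now, plus
    the plan that follows each possible observation (ignoring resets)."""
    if horizon == 0:
        return ()
    branches = tuple(behavior(update(p, o), horizon - 1) for o in OBS)
    return (act(p), branches)

# Sample many candidate models of the other agent. Infinitely many distinct
# beliefs collapse into a handful of classes, which is what lets the
# interactive state space be replaced by a finite one.
classes = defaultdict(list)
for i in range(1001):
    p = i / 1000.0
    classes[behavior(p)].append(p)

for beh, members in classes.items():
    print(f"class with {len(members):4d} beliefs, e.g. p = {members[0]:.3f}")
```

Keying classes by the full conditional plan, rather than by the immediate action alone, matters: two beliefs that prescribe the same first action can still diverge after an observation, and only models with identical behavior at every reachable branch may be merged without changing the planning agent's optimal value.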