ABSTRACT
We present a method for transforming the infinite interactive state space of interactive POMDPs (I-POMDPs) into a finite one, thereby enabling the computation of exact solutions. I-POMDPs support sequential decision making in multi-agent environments by modeling other agents' beliefs, capabilities, and preferences as part of the interactive state space. Because these beliefs may be arbitrarily nested and are continuous, optimal solutions cannot be computed with value iteration as in ordinary POMDPs. Our method obtains a finite state space by grouping the other agents' behaviorally equivalent models into equivalence classes, which makes it possible to compute the complete optimal solution for the I-POMDP, representable as a policy graph. We illustrate the method on the multi-agent tiger problem and discuss features of the resulting solution.
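To make the equivalence-class idea concrete, the following is a minimal Python sketch, not the paper's actual construction. It assumes a toy setting in which the other agent's model reduces to a single belief p = Pr(tiger behind the left door) in the multi-agent tiger problem; the threshold policy `act`, the Bayes update `update`, and the 0.85 observation accuracy are illustrative stand-ins. The sketch keys each model by the complete conditional plan it induces over a short horizon, so a continuum of beliefs collapses into finitely many behavioral equivalence classes.

```python
from collections import defaultdict

OBS = ["growl-left", "growl-right"]

def act(p, eps=0.1):
    """Hypothetical threshold policy: open a door only when sufficiently certain."""
    if p < eps:
        return "open-left"      # tiger is almost surely behind the right door
    if p > 1.0 - eps:
        return "open-right"     # tiger is almost surely behind the left door
    return "listen"

def update(p, obs, acc=0.85):
    """Bayes update of Pr(tiger-left) after hearing a growl with accuracy acc."""
    like_left = acc if obs == "growl-left" else 1.0 - acc
    like_right = 1.0 - acc if obs == "growl-left" else acc
    z = like_left * p + like_right * (1.0 - p)
    return like_left * p / z

def behavior(p, horizon=2):
    """The complete conditional plan a model induces: the action now, plus
    the plan that follows each possible observation (ignoring resets)."""
    if horizon == 0:
        return ()
    branches = tuple(behavior(update(p, o), horizon - 1) for o in OBS)
    return (act(p), branches)

# Sample many candidate models of the other agent. Infinitely many distinct
# beliefs collapse into a handful of classes, which is what lets the
# interactive state space be replaced by a finite one.
classes = defaultdict(list)
for i in range(1001):
    p = i / 1000.0
    classes[behavior(p)].append(p)

for beh, members in classes.items():
    print(f"class with {len(members):4d} beliefs, e.g. p = {members[0]:.3f}")
```

Keying classes by the full conditional plan, rather than by the immediate action alone, matters: two beliefs that prescribe the same first action can still diverge after an observation, and only models with identical behavior at every reachable branch may be merged without changing the planning agent's optimal value.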