| Learning algorithms for online principal-agent problems (and selling goods online) |
| Full text |
Pdf
(172 KB)
|
| Source
|
ACM International Conference Proceeding Series; Vol. 148
archive
Proceedings of the 23rd international conference on Machine learning
table of contents
Pittsburgh, Pennsylvania
Pages: 209 - 216
Year of Publication: 2006
ISBN:1-59593-383-2
|
|
Authors
|
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 1, Downloads (12 Months): 23, Citation Count: 1
|
|
|
ABSTRACT
In a principal-agent problem, a principal seeks to motivate an agent to take a certain action beneficial to the principal, while spending as little as possible on the reward. This is complicated by the fact that the principal does not know the agent's utility function (or type). We study the online setting where at each round, the principal encounters a new agent, and the principal sets the rewards anew. At the end of each round, the principal only finds out the action that the agent took, but not his type. The principal must learn how to set the rewards optimally. We show that this setting generalizes the setting of selling a digital good online.We study and experimentally compare three main approaches to this problem. First, we show how to apply a standard bandit algorithm to this setting. Second, for the case where the distribution of agent types is fixed (but unknown to the principal), we introduce a new gradient ascent algorithm. Third, for the case where the distribution of agents' types is fixed, and the principal has a prior belief (distribution) over a limited class of type distributions, we study a Bayesian approach.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Babaioff, M., Lavi, R., & Pavlov, E. (2005). Mechanism design for single-value domains. AAAI (pp. 241--247).
|
| |
3
|
Bahar, G., & Tennenholtz, M. (2005). Sequential-simultaneous information elicitation in multi-agent systems. IJCAI (pp. 923--928).
|
| |
4
|
|
 |
5
|
|
| |
6
|
|
| |
7
|
Blumberg, A., & Shelat, A. (2004). Searching for stable mechanisms: Automated design for imperfect players. AAAI (pp. 8--13).
|
 |
8
|
Nicolò Cesa-Bianchi , Yoav Freund , David Haussler , David P. Helmbold , Robert E. Schapire , Manfred K. Warmuth, How to use expert advice, Journal of the ACM (JACM), v.44 n.3, p.427-485, May 1997
[doi> 10.1145/258128.258179]
|
 |
9
|
|
| |
10
|
de Farias, D. P., & Megiddo, N. (2003). How to combine expert (or novice) advice when actions impact the environment? NIPS.
|
| |
11
|
|
| |
12
|
Mas-Colell, A., Whinston, M., & Green, J. R. (1995). Microeconomic theory. Oxford University Press.
|
| |
13
|
Parkes, D., & Schoenebeck, G. (2004). GROWRANGE: Anytime VCG-based mechanisms. AAAI (pp. 34--41).
|
 |
14
|
|
| |
15
|
|
| |
16
|
Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. ICML (pp. 928--936).
|
|