DOI: 10.1145/2897518.2897579
Research Article · Public Access

Watch and learn: optimizing from revealed preferences feedback

Published: 19 June 2016

Abstract

A Stackelberg game is played between a leader and a follower. The leader first chooses an action, then the follower plays his best response. The goal of the leader is to pick the action that maximizes his payoff given the follower's best response. In this paper we present an approach to solving for the leader's optimal strategy in certain Stackelberg games where the follower's utility function (and thus the follower's best response) is unknown. Stackelberg games capture, for example, the following interaction between a producer and a consumer. The producer chooses the prices of the goods he produces, and then the consumer buys a utility-maximizing bundle of goods. The seller's goal is to set prices that maximize his profit: his revenue minus the production cost of the purchased bundle. It is natural that the seller in this example should not know the buyer's utility function. However, he does have access to revealed preference feedback: he can set prices, and then observe the purchased bundle and his own profit. We give algorithms for efficiently solving, in terms of both computational and query complexity, a broad class of Stackelberg games in which the follower's utility function is unknown, using only "revealed preference" access to it. This class includes the profit maximization problem above, as well as the optimal tolling problem in nonatomic congestion games when the latency functions are unknown. Surprisingly, we are able to solve these problems even though the optimization problems are non-convex in the leader's actions.
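The pricing interaction described above can be made concrete with a toy simulation. The sketch below is illustrative only: the buyer's utility u(x) = Σᵢ aᵢ√(xᵢ) − spending, the hidden parameters `a`, and the naive grid search are all assumptions for demonstration, not the paper's algorithm (which achieves polynomial query complexity; a grid search merely exhibits the revealed-preference query model, in which the seller observes only the purchased bundle and his profit).

```python
import itertools

# Hypothetical buyer: utility u(x) = sum_i a_i * sqrt(x_i) minus spending.
# The seller never sees `a`; it only observes the purchased bundle.
a = [4.0, 6.0]          # buyer's hidden valuation parameters (assumption)
cost = [0.5, 1.0]       # seller's per-unit production costs

def buyer_best_response(prices):
    # Closed-form maximizer of sum_i (a_i*sqrt(x_i) - p_i*x_i):
    # setting the derivative to zero gives x_i = (a_i / (2 p_i))^2.
    return [(ai / (2.0 * pi)) ** 2 for ai, pi in zip(a, prices)]

def observed_profit(prices):
    # Revealed-preference feedback: set prices, observe bundle and profit.
    bundle = buyer_best_response(prices)
    profit = sum((p - c) * x for p, c, x in zip(prices, cost, bundle))
    return bundle, profit

# Naive grid search over prices, using only query access to the follower.
grid = [0.75, 1.0, 1.5, 2.0, 3.0]
best = max(itertools.product(grid, repeat=2),
           key=lambda prices: observed_profit(list(prices))[1])
print("best grid prices:", best)
```

For this particular utility family the per-good profit (p − c)·(a/(2p))² is maximized at p = 2c, so the grid search recovers prices (1.0, 2.0); the paper's point is that in general the leader's objective is non-convex in prices, yet remains efficiently optimizable from this feedback alone.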



    Published In

    STOC '16: Proceedings of the forty-eighth annual ACM symposium on Theory of Computing
    June 2016
    1141 pages
    ISBN:9781450341325
    DOI:10.1145/2897518
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. Game Theory
    2. Learning
    3. Optimization
    4. Revealed Preferences

    Qualifiers

    • Research-article

    Funding Sources

    • Google
    • Simons
    • NSF

    Conference

    STOC '16
    Sponsor:
    STOC '16: Symposium on Theory of Computing
    June 19 - 21, 2016
    Cambridge, MA, USA

    Acceptance Rates

    Overall acceptance rate: 1,469 of 4,586 submissions (32%)


    Article Metrics

    • Downloads (last 12 months): 156
    • Downloads (last 6 weeks): 23
    Reflects downloads up to 20 Feb 2025

    Cited By
    • (2024) Decentralized online learning in general-sum Stackelberg games. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, pages 4056–4077. DOI: 10.5555/3702676.3702866
    • (2024) No-regret learning of Nash equilibrium for black-box games via Gaussian processes. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, pages 1541–1557. DOI: 10.5555/3702676.3702748
    • (2024) Performative prediction with bandit feedback. Proceedings of the 41st International Conference on Machine Learning, pages 7298–7324. DOI: 10.5555/3692070.3692354
    • (2024) Signaling Security Games with Attack Planner Deception. Mathematics, 12(16):2532. DOI: 10.3390/math12162532
    • (2024) Efficient Prior-Free Mechanisms for No-Regret Agents. Proceedings of the 25th ACM Conference on Economics and Computation, pages 511–541. DOI: 10.1145/3670865.3673460
    • (2024) On the Dual Gradient Descent Method for the Resource Allocation Problem in Multiagent Systems. Journal of Applied and Industrial Mathematics, 18(2):316–332. DOI: 10.1134/S1990478924020133
    • (2024) Optimal Private Payoff Manipulation against Commitment in Extensive-form Games. Games and Economic Behavior. DOI: 10.1016/j.geb.2024.11.008
    • (2023) Nonverbal Human Signals Can Help Autonomous Agents Infer Human Preferences for Their Behavior. Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, pages 307–316. DOI: 10.5555/3545946.3598652
    • (2023) Metacognitive Radar: Masking Cognition From an Inverse Reinforcement Learner. IEEE Transactions on Aerospace and Electronic Systems, 59(6):8826–8844. DOI: 10.1109/TAES.2023.3314406
    • (2023) Online Learning for Equilibrium Pricing in Markets under Incomplete Information. 2023 62nd IEEE Conference on Decision and Control (CDC), pages 4996–5001. DOI: 10.1109/CDC49753.2023.10383748
