DOI: 10.1145/2897518.2897579
Research Article · Public Access

Watch and learn: optimizing from revealed preferences feedback

Published: 19 June 2016

Abstract

A Stackelberg game is played between a leader and a follower. The leader first chooses an action, then the follower plays his best response. The goal of the leader is to pick the action that maximizes his payoff given the follower's best response. In this paper we present an approach to solving for the leader's optimal strategy in certain Stackelberg games where the follower's utility function (and thus the follower's best response) is unknown. Stackelberg games capture, for example, the following interaction between a producer and a consumer. The producer chooses the prices of the goods he produces, and then the consumer buys a utility-maximizing bundle of goods. The seller's goal is to set prices that maximize his profit: his revenue minus the production cost of the purchased bundle. It is natural that the seller in this example should not know the buyer's utility function. However, he does have access to revealed preference feedback: he can set prices, and then observe the purchased bundle and his own profit. We give algorithms for efficiently solving, in terms of both computational and query complexity, a broad class of Stackelberg games in which the follower's utility function is unknown, using only "revealed preference" access to it. This class includes the profit maximization problem above, as well as the optimal tolling problem in nonatomic congestion games when the latency functions are unknown. Surprisingly, we are able to solve these problems even though the optimization problems are non-convex in the leader's actions.
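The pricing interaction described above can be made concrete with a toy simulation. The sketch below is illustrative only: the buyer's utility u(x) = Σᵢ aᵢ√(xᵢ) − spending, the hidden parameters `a`, and the naive grid search are all assumptions for demonstration, not the paper's algorithm (which achieves polynomial query complexity; a grid search merely exhibits the revealed-preference query model, in which the seller observes only the purchased bundle and his profit).

```python
import itertools

# Hypothetical buyer: utility u(x) = sum_i a_i * sqrt(x_i) minus spending.
# The seller never sees `a`; it only observes the purchased bundle.
a = [4.0, 6.0]          # buyer's hidden valuation parameters (assumption)
cost = [0.5, 1.0]       # seller's per-unit production costs

def buyer_best_response(prices):
    # Closed-form maximizer of sum_i (a_i*sqrt(x_i) - p_i*x_i):
    # setting the derivative to zero gives x_i = (a_i / (2 p_i))^2.
    return [(ai / (2.0 * pi)) ** 2 for ai, pi in zip(a, prices)]

def observed_profit(prices):
    # Revealed-preference feedback: set prices, observe bundle and profit.
    bundle = buyer_best_response(prices)
    profit = sum((p - c) * x for p, c, x in zip(prices, cost, bundle))
    return bundle, profit

# Naive grid search over prices, using only query access to the follower.
grid = [0.75, 1.0, 1.5, 2.0, 3.0]
best = max(itertools.product(grid, repeat=2),
           key=lambda prices: observed_profit(list(prices))[1])
print("best grid prices:", best)
```

For this particular utility family the per-good profit (p − c)·(a/(2p))² is maximized at p = 2c, so the grid search recovers prices (1.0, 2.0); the paper's point is that in general the leader's objective is non-convex in prices, yet remains efficiently optimizable from this feedback alone.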



    Published In

    STOC '16: Proceedings of the forty-eighth annual ACM symposium on Theory of Computing
    June 2016
    1141 pages
    ISBN:9781450341325
    DOI:10.1145/2897518
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. Game Theory
    2. Learning
    3. Optimization
    4. Revealed Preferences

    Qualifiers

    • Research-article

    Funding Sources

    • Google
    • Simons
    • NSF

    Conference

    STOC '16
    Sponsor:
    STOC '16: Symposium on Theory of Computing
    June 19 - 21, 2016
    Cambridge, MA, USA

    Acceptance Rates

    Overall acceptance rate: 1,469 of 4,586 submissions (32%)


    Article Metrics

    • Downloads (last 12 months): 156
    • Downloads (last 6 weeks): 23
    Reflects downloads up to 20 Feb 2025

    Cited By
    • (2024) Decentralized online learning in general-sum Stackelberg games. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, pages 4056–4077. DOI: 10.5555/3702676.3702866
    • (2024) No-regret learning of Nash equilibrium for black-box games via Gaussian processes. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, pages 1541–1557. DOI: 10.5555/3702676.3702748
    • (2024) Performative prediction with bandit feedback. Proceedings of the 41st International Conference on Machine Learning, pages 7298–7324. DOI: 10.5555/3692070.3692354
    • (2024) Signaling Security Games with Attack Planner Deception. Mathematics, 12(16):2532. DOI: 10.3390/math12162532
    • (2024) Efficient Prior-Free Mechanisms for No-Regret Agents. Proceedings of the 25th ACM Conference on Economics and Computation, pages 511–541. DOI: 10.1145/3670865.3673460
    • (2024) On the Dual Gradient Descent Method for the Resource Allocation Problem in Multiagent Systems. Journal of Applied and Industrial Mathematics, 18(2):316–332. DOI: 10.1134/S1990478924020133
    • (2024) Optimal Private Payoff Manipulation against Commitment in Extensive-form Games. Games and Economic Behavior. DOI: 10.1016/j.geb.2024.11.008
    • (2023) Nonverbal Human Signals Can Help Autonomous Agents Infer Human Preferences for Their Behavior. Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, pages 307–316. DOI: 10.5555/3545946.3598652
    • (2023) Metacognitive Radar: Masking Cognition From an Inverse Reinforcement Learner. IEEE Transactions on Aerospace and Electronic Systems, 59(6):8826–8844. DOI: 10.1109/TAES.2023.3314406
    • (2023) Online Learning for Equilibrium Pricing in Markets under Incomplete Information. 2023 62nd IEEE Conference on Decision and Control (CDC), pages 4996–5001. DOI: 10.1109/CDC49753.2023.10383748
