Differentiable Economics - David C. Parkes, Harvard University
Differentiable Economics
David C. Parkes, Harvard University
Joint work with G. Brero, D. Chakrabarti, P. Dütting, A. Eden, Z. Feng, M. Gerstgrasser, N. Golowich, S. Li, V. Li, S. Kominers, J. Ma, H. Narasimhan, S. S. Ravindranath, D. Rheingans-Yoo, and R. Trivedi
July 13, 2021
Inverse Problems
- Revenue-optimal auction design
- Matching market design
- Optimal taxation policy design
- Contract design
- ...
Typical Recipe
Solve via the revelation principle (for valuation profile v, density function f, expected utility U_i):

  max_{g,t}  ∫_V  Σ_i t_i(v) f(v) dv
  s.t.  g feasible
        U_i(g, t, v_i) ≥ 0                                   (IR)
        U_i(g, t, v_i) ≥ U_i(g, t, v̂_i),  ∀i, ∀v_i, ∀v̂_i     (IC)

Allocation g : V → [0, 1]^n, payment t : V → R^n
An Optimal 1-Bidder, 2-Item Auction
Two additive items, values i.i.d. U(0, 1). Manelli-Vincent (2006).
[Figure: optimal allocation regions over the unit value square (value of item 1 on the x-axis, value of item 2 on the y-axis): regions (0, 0), (1, 0), (0, 1), and (1, 1), with boundaries at the values 2/3 and (2 − √2)/3.]
A Role for Computation
- Automated mechanism design (CS02)
- Standard approaches:
  - Dominant-strategy incentive compatible, but via LPs that fail to scale or handle continuous types
  - Bayesian incentive compatible (HL10, CDW12, AFH+12, ...)
This Work: Use Neural Networks
Model the rules of an economic system as a flexible, differentiable representation: differentiable economics.
Talk Outline: Three Vignettes
- Revenue-optimal auction design
- Two-sided matching market design
- Indirect mechanism design (sequential price mechanisms)
Part I: Revenue-Optimal Auction Design
"Optimal Auctions through Deep Learning," Dütting, Feng, Narasimhan, Parkes, Ravindranath, Proc. ICML'19
- A seller with m distinct indivisible items
- A set of n (≥ 1) additive buyers, with valuations v_i = (v_i1, ..., v_im) and v_i ∼ F_i
- Design an auction (g^w, t^w) that maximizes expected revenue subject to dominant-strategy IC
- Parameters w, allocation rule g^w, payment rule t^w
Architecture 1: RochetNet (Single Bidder)
- Learn a menu set {h_j}. Utility for choice j: h_j(b) = α_j · b − β_j
- Parameters for the jth choice define a randomized allocation α_j ∈ [0, 1]^m and a payment β_j ∈ R
- Train to maximize expected revenue
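The menu interpretation of RochetNet can be sketched in a few lines: the buyer faces a menu of (randomized allocation, payment) options plus a free "buy nothing" option, and a truthful buyer picks the utility-maximizing entry. The menu below (two item prices and a bundle price) is an illustrative example, not learned parameters:

```python
import numpy as np

def rochetnet_utility(b, alphas, betas):
    """Utility of each menu entry for bid vector b: u_j = alpha_j . b - beta_j.

    alphas: (J, m) randomized allocations in [0, 1]^m; betas: (J,) payments.
    A zero option is appended so participation is always individually rational.
    """
    utils = alphas @ b - betas                 # (J,)
    return np.concatenate([utils, [0.0]])      # last entry: buy nothing, pay nothing

def rochetnet_choice(b, alphas, betas):
    """A truthful buyer picks the utility-maximizing menu entry."""
    return int(np.argmax(rochetnet_utility(b, alphas, betas)))

# Illustrative menu: item 1 alone, item 2 alone, or the bundle.
alphas = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
betas = np.array([0.66, 0.66, 0.90])
```

During training, the hard argmax is replaced by a softmax over menu utilities so that expected revenue is differentiable in (α, β); at test time the argmax choice is used.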
Manelli-Vincent Setting
- Item 1 value v1 ∼ U(0, 1)
- Item 2 value v2 ∼ U(0, 1)
[Figure: the optimal allocation regions over the unit value square (boundaries at 2/3 and (2 − √2)/3), alongside heatmaps of the learned probability of allocating item 1 and item 2 as a function of (v1, v2).]
Architecture 2: RegretNet (Multi-Bidder)
m items, n additive bidders; bid b_ij of agent i for item j.
- Allocation g^w : R^{nm} → Δ_1 × · · · × Δ_m (a distribution over bidders for each item)
- Payment t^w : R^{nm} → R^n_{≥0}, where α_i is the fraction of value charged to agent i
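A minimal PyTorch-style sketch of these two heads; the layer sizes, tanh activations, and the extra "unallocated" slot per item are illustrative choices, not the paper's exact architecture. Each item's allocation is a softmax over bidders, and each payment is a sigmoid fraction α_i of bidder i's reported value for its allocation, which guarantees individual rationality for truthful bids:

```python
import torch
import torch.nn as nn

class RegretNetSketch(nn.Module):
    """Sketch of a RegretNet-style network for n additive bidders, m items."""
    def __init__(self, n, m, hidden=100):
        super().__init__()
        self.n, self.m = n, m
        self.body = nn.Sequential(
            nn.Linear(n * m, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh())
        self.alloc_head = nn.Linear(hidden, (n + 1) * m)  # +1 slot: item unallocated
        self.pay_head = nn.Linear(hidden, n)

    def forward(self, bids):                               # bids: (batch, n, m)
        h = self.body(bids.flatten(1))
        logits = self.alloc_head(h).view(-1, self.n + 1, self.m)
        alloc = torch.softmax(logits, dim=1)[:, :self.n, :]  # drop the dummy slot
        frac = torch.sigmoid(self.pay_head(h))               # alpha_i in (0, 1)
        value = (alloc * bids).sum(-1)    # reported value of received allocation
        pay = frac * value                # payment never exceeds reported value
        return alloc, pay
```

By construction each item is allocated with total probability at most one, and t_i^w(b) ≤ Σ_j g_ij^w(b) b_ij.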
RegretNet Training Problem (1 of 2)

  min_w  −E_v[ Σ_{i=1}^n t_i^w(v) ]
  s.t.   E_v[ regret_i^w(v) ] = 0, for all bidders i

regret_i^w(v) is the maximum utility gain to bidder i from a misreport at profile v. No training labels!
Solve via the (augmented) Lagrangian method:

  min_w  −E_v[ Σ_{i=1}^n t_i^w(v) ] + Σ_{i=1}^n λ_i · E_v[ regret_i^w(v) ] + · · ·
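A sample estimate of this objective is straightforward once per-sample payments and regret estimates are in hand. The quadratic penalty weight rho and the dual-ascent schedule below are illustrative; the names are not the paper's:

```python
import numpy as np

def lagrangian_loss(payments, regrets, lam, rho=1.0):
    """Minibatch estimate of the augmented Lagrangian.

    payments: (batch, n) sampled payments t_i^w(v)
    regrets:  (batch, n) sampled regret_i^w(v) estimates
    lam:      (n,) Lagrange multipliers

    Minimized over the network weights w; between epochs, lam is typically
    increased by rho * (mean regret), a dual-ascent step.
    """
    revenue = payments.sum(axis=1).mean()
    mean_rgt = regrets.mean(axis=0)                       # (n,)
    return -revenue + (lam * mean_rgt).sum() + 0.5 * rho * (mean_rgt ** 2).sum()
```

At zero regret the loss reduces to negative expected revenue, so the multipliers only bind when the learned rule is exploitable.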
RegretNet Training Problem (2 of 2)
Adopt stochastic gradient descent to solve the (augmented) Lagrangian optimization problem:

  min_w  −E_v[ Σ_{i=1}^n t_i^w(v) ] + Σ_{i=1}^n λ_i · E_v[ regret_i^w(v) ] + · · ·

Key challenge: taking the derivative through the inner maximization

  regret_i^w(v) = max_{v'_i} [ u_i^w(v'_i, v_{−i}) − u_i^w(v_i, v_{−i}) ]

Idea: fix a 'defeating misreport' for each (i, v), found via gradient ascent on the input.
Gradient Ascent to Find Defeating Misreports
[Figure: green dot = true valuation; red dots = misreports found by gradient ascent.]
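The misreport search is plain gradient ascent on the network's input, holding the weights fixed; the step size, iteration count, and clamping to the valuation domain below are illustrative choices:

```python
import torch

def find_misreport(utility_fn, v_true, steps=50, lr=0.1):
    """Ascend utility_fn over bidder i's report, holding others' reports fixed.

    utility_fn: maps a candidate misreport tensor to bidder i's (scalar)
    utility, with the true valuation baked in for valuing the allocation.
    """
    v_mis = v_true.clone().requires_grad_(True)
    for _ in range(steps):
        u = utility_fn(v_mis)
        (grad,) = torch.autograd.grad(u, v_mis)
        with torch.no_grad():
            v_mis += lr * grad            # ascend bidder i's utility
            v_mis.clamp_(0.0, 1.0)        # stay inside the valuation domain
    return v_mis.detach()
```

In training, the misreport found for each (i, v) is then treated as fixed when differentiating the regret term with respect to the network weights w.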
Manelli-Vincent Setting
- Item 1 value v1 ∼ U(0, 1)
- Item 2 value v2 ∼ U(0, 1)
[Figure: the optimal allocation regions over the unit value square, with boundaries at 2/3 and (2 − √2)/3.]
RegretNet: 2 Bidders, 2 Items, Discrete Values
Extends Yao (2017). For each bidder:
- Item 1: v_{i,1} ∼ unif{0.5, 1, 1.5}
- Item 2: v_{i,2} ∼ unif{0.5, 1, 1.5}
RegretNet: {3, 5} Bidders × 10 Items
- Additive U(0, 1) values
- Item-wise Myerson is optimal in the limit of large n (Palfrey '83)
- Use a larger 5 × 100 neural network
Theory
- Generalization bounds for expected revenue and regret
- Discover provably optimal designs via RochetNet
- Support conjectures: optimality for m ≥ 7 of the "straight-jacket auction" for additive U(0, 1) items (GK'18):

  Items   SJA (rev)   RochetNet (rev)
      2    0.549187          0.549175
      6    1.943239          1.943216
      7    2.318032          2.318032
      9    3.086125          3.086125
     10    3.477781          3.477722
Talk Outline: Three Vignettes
- Revenue-optimal auction design
- Two-sided matching market design
- Indirect mechanism design (sequential price mechanisms)
Part II: Two-Sided Matching Markets
Workers W, firms F, strict preferences. For example:

  w1: f2 ≻ f1 ≻ f3      f1: w1 ≻ w3 ≻ w2
  w2: f1 ≻ f3 ≻ f2      f2: w3 ≻ w1 ≻ w2
  w3: f1 ≻ f2 ≻ f3      f3: w1 ≻ w3 ≻ w2

Matching (a) is unstable; e.g., (w1, f2) is a blocking pair. Matching (b) is stable.
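As a concrete check, worker-proposing deferred acceptance (Gale-Shapley) applied to this example's preference lists (as read here from the slide) finds a stable matching:

```python
def deferred_acceptance(w_prefs, f_prefs):
    """Worker-proposing deferred acceptance (Gale-Shapley).

    Each free worker proposes down its list; each firm tentatively holds
    its most-preferred proposer, releasing any worker it abandons.
    """
    rank = {f: {w: i for i, w in enumerate(ws)} for f, ws in f_prefs.items()}
    nxt = {w: 0 for w in w_prefs}   # index of the next firm each worker proposes to
    held = {}                        # firm -> worker it currently holds
    free = list(w_prefs)
    while free:
        w = free.pop()
        f = w_prefs[w][nxt[w]]
        nxt[w] += 1
        cur = held.get(f)
        if cur is None:
            held[f] = w
        elif rank[f][w] < rank[f][cur]:   # firm prefers the new proposer
            held[f] = w
            free.append(cur)
        else:
            free.append(w)
    return {w: f for f, w in held.items()}

# Preference lists as read from the example above
w_prefs = {"w1": ["f2", "f1", "f3"], "w2": ["f1", "f3", "f2"], "w3": ["f1", "f2", "f3"]}
f_prefs = {"f1": ["w1", "w3", "w2"], "f2": ["w3", "w1", "w2"], "f3": ["w1", "w3", "w2"]}
```

The resulting matching has no blocking pair, illustrating stability; by Roth's theorem (next slide's theme), no such stable mechanism can also be strategy-proof for both sides.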
Trading Off Stability and Strategy-Proofness
- S. S. Ravindranath, Z. Feng, S. Li, J. Ma, S. Kominers, and D. C. Parkes, arXiv 2021
- Impossibility theorem (Roth '82): there is no mechanism that is both stable and strategy-proof
- But little is understood about how to make trade-offs; we have only a point solution (deferred acceptance)
The Matching Network
[Figure: a 4 × 256 feedforward network plus additional layers. Two score heads ŝ and ŝ' are normalized column-wise and row-wise respectively (with slack entries ⊥ for remaining unmatched); the output marginal match probabilities are r_ij = min{ŝ_ij, ŝ'_ij}.]
Cardinal inputs; e.g., p_{w·} = (1/3, 2/3, −1/3) for f2 ≻_w f1 ≻_w ⊥ ≻_w f3. The output is the marginal probabilities for pairwise matches (0-1 decomposition via Birkhoff-von Neumann).
The Training Problem
Loss function: min_w λ · stv(g^w) + (1 − λ) · rgt(g^w), for λ ∈ [0, 1]
- stv(g^w) quantifies the violation of a generalization of stability that is suitable for randomized matchings
- rgt(g^w) quantifies the expected regret, similarly to RegretNet (can extend to ordinal SP)
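The scalarized objective itself is a simple convex combination of the two minibatch estimates; the estimators for stv and rgt are the paper's, and the names below are illustrative:

```python
import numpy as np

def matching_loss(stv_batch, rgt_batch, lam):
    """Scalarize stability violation vs. expected regret, lam in [0, 1].

    stv_batch, rgt_batch: per-sample estimates of the stability violation
    and the regret of the learned rule g^w on a minibatch of preference
    profiles. Sweeping lam from 0 to 1 traces out a design frontier
    between strategy-proofness and stability.
    """
    return lam * np.mean(stv_batch) + (1.0 - lam) * np.mean(rgt_batch)
```

Each λ yields one trained mechanism; plotting its (stability violation, SP violation) pair for a grid of λ values produces the frontier on the next slide.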
The Design Frontier
Loss function: min_w λ · stv(g^w) + (1 − λ) · rgt(g^w)
Correlated preferences; 4 × 4 market. New targets for theory!
[Figure: SP violation vs. stability violation as λ varies, spanning from deferred acceptance (DA: zero stability violation, positive SP violation) to random serial dictatorship (RSD: zero SP violation, large stability violation).]
Talk Outline: Three Vignettes
- Revenue-optimal auction design
- Two-sided matching market design
- Indirect mechanism design (sequential price mechanisms)
Part III: Indirect Mechanism Design
- Many inverse problems are indirect
- Economic rules induce a game that is played by participants
- Cf. work on learning optimal taxation policies (Zheng et al., arXiv 2020)
Joint work with G. Brero, D. Chakrabarti, A. Eden, M. Gerstgrasser, V. Li, and D. Rheingans-Yoo (AAAI'21, ICML'21 workshop, arXiv)
Illustration: Sequential Price Mechanisms
[Animated figure: the mechanism visits agents one at a time, posting take-it-or-leave-it prices (e.g., $5 and $2, then $4 and $1.50, then $1); each visited agent decides whether to buy at its posted price.]
Introducing Message Passing
[Figure: agents can send messages to the mechanism before prices are posted.]
New Challenge: Agent Behavior
- Two-level learning: agents learn to play the game; the mechanism learns how to design the rules given this
- Formulate the AMD problem as a multi-follower Stackelberg game:
  - The mechanism is a policy network from observations (messages, purchases) to actions (next agent, prices); the strategy of the leader = the design of the mechanism
  - Agents are followers and learn to play the induced game
- In the special case of no communication, agents have a dominant-strategy equilibrium in SPMs
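Viewed from the designer's side, a no-communication sequential price mechanism is an episodic environment: a policy picks which agent to visit next and at what price, and the visited agent buys iff its value clears the price (the dominant strategy noted above). The function and policy names here are illustrative, not the paper's API:

```python
def run_spm(values, supply, policy):
    """Run one episode of a sequential price mechanism (no messaging).

    values: dict agent -> value for one unit
    supply: number of identical units available
    policy: maps (frozenset of remaining agents, units left) to (agent, price)
    Accepting iff value >= price is a dominant strategy for each visited agent.
    """
    remaining, revenue, alloc = set(values), 0.0, {}
    while remaining and supply > 0:
        agent, price = policy(frozenset(remaining), supply)
        remaining.discard(agent)
        if values[agent] >= price:    # truthful take-it-or-leave-it response
            alloc[agent] = 1
            revenue += price
            supply -= 1
    return alloc, revenue
```

The designer's learning problem is then to train the policy (e.g., with policy gradients) against episode rewards such as revenue, welfare, or a fairness objective.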
Max-Min Fairness (No Communication)
Setting: 9 agents, 2 different kinds of items (5 units each). Train via the PPO actor-critic, policy-gradient algorithm (S+17).
Two-Stage Learning
- Adopt multiplicative weights (MW) for agents to learn a Bayesian coarse-correlated equilibrium
- Formulate a Stackelberg MDP: a single-player POMDP with two phases:
  - A learning phase, with Markovian learning dynamics (MW) representing the adaptation of agents
  - A best-response phase, with the reward to the leader based on the learned agent strategies
Also with R. Trivedi
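The follower dynamic in the learning phase can be sketched with standard multiplicative weights over each agent's finite strategy set; the learning rate is an illustrative choice:

```python
import math

def mw_update(weights, payoffs, eta=0.1):
    """One multiplicative-weights step over an agent's strategy set.

    weights: current probability of playing each strategy
    payoffs: realized payoff of each strategy this round
    Each weight is scaled by exp(eta * payoff) and renormalized, so mass
    concentrates on better-performing strategies over time.
    """
    new = [w * math.exp(eta * p) for w, p in zip(weights, payoffs)]
    total = sum(new)
    return [w / total for w in new]
```

In the Stackelberg MDP, iterating updates like this one defines the Markovian learning phase, and the leader's reward is evaluated in the best-response phase against the strategies these dynamics converge to.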
Illustrative Results
- 2 agents, 1 item. Agent 1's value is {0, 1} w.p. {1/2, 1/2}; agent 2's value is 1/2 w.p. 1 − ε.
- The optimal SPM visits agent 1 and then agent 2 (price zero), and is inefficient when v1 = 1, v2 = 1/2.
Discussion
- Direct mechanisms:
  - Scaling up; e.g., one network for multiple market sizes (RJBW'21), combinatorial problems
  - Leveraging characterization results such as monotonicity
  - Robustness, for example certificates of low regret (CCGD'20)
- Indirect mechanisms:
  - Scaling up the Stackelberg MDP
  - Leveraging gradient-dynamics convergence for potential games (BFHKS'21, MRS'20)
  - Derivatives through equilibrium (WXPRT'21)
  - Centralized-learning, decentralized-execution inspired approaches (FAFW'16)
Conclusion
- End-to-end learning for economic design with in-expectation objectives
- Flexible: seems interesting for a myriad of design problems
- Differentiable economics

Thank You!
Funding support: DARPA cooperative agreement HR00111920029; AWS gift.

References
- P. Dütting, Z. Feng, H. Narasimhan, D. C. Parkes, S. S. Ravindranath: "Optimal Auctions through Deep Learning," ICML'19 (first version 2017); longer version arXiv 2020
- N. Golowich, H. Narasimhan, D. C. Parkes: "Deep Learning for Multi-Facility Location Mechanism Design," IJCAI'18
- Z. Feng, H. Narasimhan, D. C. Parkes: "Deep Learning for Revenue-Optimal Auctions with Budgets," AAMAS'18
- G. Brero, A. Eden, M. Gerstgrasser, D. C. Parkes, D. Rheingans-Yoo: "Reinforcement Learning of Simple Indirect Mechanisms," AAAI'21
- S. Zheng, A. Trott, S. Srinivasa, N. Naik, M. Gruesbeck, D. C. Parkes, R. Socher: "The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies," arXiv 2020
- Z. Feng, D. C. Parkes, S. S. Ravindranath: "Machine Learning for Matching Markets," Online and Matching-Based Market Design, Echenique et al. (Eds.), 2021
- G. Brero, D. Chakrabarti, A. Eden, M. Gerstgrasser, V. Li, D. C. Parkes: "Learning Stackelberg Equilibria in Sequential Price Mechanisms," ICML 2021 Workshop on Reinforcement Learning Theory
Additional References (1 of 2)
- J. Rahme, S. Jelassi, J. Bruna, M. Weinberg: "A Permutation-Equivariant Neural Network Architecture for Auction Design," AAAI'21
- M. J. Curry, P. Chiang, T. Goldstein, J. Dickerson: "Certifying Strategyproof Auction Networks," NeurIPS'20
- S. Zheng, A. Trott, S. Srinivasa, N. Naik, M. Gruesbeck, D. C. Parkes, R. Socher: "The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies," arXiv 2004.13332, 2020
- M. Bichler, M. Fichtl, S. Heidekrüger, N. Kohring, P. Sutterer: "Learning Equilibria in Symmetric Auction Games using Artificial Neural Networks," Nature Machine Intelligence (forthcoming)
- E. Mazumdar, L. J. Ratliff, S. S. Sastry: "On Gradient-Based Learning in Continuous Games," SIAM Journal on Mathematics of Data Science, 2020
- K. Wang, L. Xu, A. Perrault, M. K. Reiter, M. Tambe: "Coordinating Followers to Reach Better Equilibria: End-to-End Gradient Descent for Stackelberg Games," arXiv 2021
- J. Foerster, I. Assael, N. de Freitas, S. Whiteson: "Learning to Communicate with Deep Multi-Agent Reinforcement Learning," NeurIPS'16
- J. Hartline, V. Syrgkanis, E. Tardos: "No-Regret Learning in Bayesian Games," arXiv 2015
- J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov: "Proximal Policy Optimization Algorithms," arXiv 2017
- J. D. Hartline, B. Lucier: "Bayesian Algorithmic Mechanism Design," STOC'10
- Y. Cai, C. Daskalakis, S. M. Weinberg: "Optimal Multi-Dimensional Mechanism Design: Reducing Revenue to Welfare Maximization," FOCS'12
- S. Alaei, H. Fu, N. Haghpanah, J. D. Hartline, A. Malekian: "Bayesian Optimal Auctions via Multi- to Single-Agent Reduction," EC'12
- V. Conitzer, T. Sandholm: "Complexity of Mechanism Design," UAI'02
Additional References (2 of 2)
- A. Manelli, D. Vincent: "Bundling as an Optimal Selling Mechanism for a Multiple-Good Monopolist," Journal of Economic Theory, 2006
- A. C.-C. Yao: "Dominant-Strategy versus Bayesian Multi-Item Auctions: Maximum Revenue Determination and Comparison," EC'17
- T. Palfrey: "Bundling Decisions by a Multiproduct Monopolist with Incomplete Information," Econometrica, 1983
- Y. Giannakopoulos, E. Koutsoupias: "Duality and Optimality of Auctions for Uniform Distributions," SIAM Journal on Computing, 2018
- A. E. Roth: "The Economics of Matching: Stability and Incentives," Mathematics of Operations Research, 1982
- P. Dütting, Z. Feng, H. Narasimhan, D. C. Parkes, S. S. Ravindranath: "Optimal Auctions through Deep Learning," arXiv 1706.03459, 2020