Differentiable Economics - David C. Parkes, Harvard University
Differentiable Economics
David C. Parkes, Harvard University
Joint work with G. Brero, D. Chakrabarti, P. Dütting, A. Eden, Z. Feng, M. Gerstgrasser, N. Golowich, S. Li, V. Li, S. Kominers, J. Ma, H. Narasimhan, S. S. Ravindranath, D. Rheingans-Yoo, and R. Trivedi
July 13, 2021
Inverse Problems
- Revenue-optimal auction design
- Matching market design
- Optimal taxation policy design
- Contract design
- ...
Typical Recipe
Solve via the revelation principle (for valuation profile v, density function f, expected utility U_i):

  max_{g,t}  ∫_V  Σ_i t_i(v) f(v) dv
  s.t.  g feasible
        U_i(g, t, v_i) ≥ 0                                   (IR)
        U_i(g, t, v_i) ≥ U_i(g, t, v̂_i),  ∀i, ∀v_i, ∀v̂_i     (IC)

Allocation g : V → [0, 1]^n, payment t : V → R^n
An Optimal 1-Bidder, 2-Item Auction
Two additive items, values i.i.d. U(0, 1). Manelli-Vincent (2006).
[Figure: optimal allocation regions over the unit value square (value of item 1 on the x-axis, value of item 2 on the y-axis): regions (0, 0), (1, 0), (0, 1), and (1, 1), with boundaries at the values 2/3 and (2 − √2)/3.]
A Role for Computation
- Automated mechanism design (CS02)
- Standard approaches:
  - Dominant-strategy incentive compatible, but via LPs that fail to scale or handle continuous types
  - Bayesian incentive compatible (HL10, CDW12, AFH+12, ...)
This Work: Use Neural Networks
Model the rules of an economic system as a flexible, differentiable representation: differentiable economics.
Talk Outline: Three Vignettes
- Revenue-optimal auction design
- Two-sided matching market design
- Indirect mechanism design (sequential price mechanisms)
Part I: Revenue-Optimal Auction Design
"Optimal Auctions through Deep Learning," Dütting, Feng, Narasimhan, Parkes, Ravindranath, Proc. ICML'19
- A seller with m distinct indivisible items
- A set of n (≥ 1) additive buyers, with valuations v_i = (v_i1, ..., v_im) and v_i ∼ F_i
- Design an auction (g^w, t^w) that maximizes expected revenue subject to dominant-strategy IC
- Parameters w, allocation rule g^w, payment rule t^w
Architecture 1: RochetNet (Single Bidder)
- Learn a menu set {h_j}. Utility for choice j: h_j(b) = α_j · b − β_j
- Parameters for the jth choice define a randomized allocation α_j ∈ [0, 1]^m and a payment β_j ∈ R
- Train to maximize expected revenue
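The menu interpretation of RochetNet can be sketched in a few lines: the buyer faces a menu of (randomized allocation, payment) options plus a free "buy nothing" option, and a truthful buyer picks the utility-maximizing entry. The menu below (two item prices and a bundle price) is an illustrative example, not learned parameters:

```python
import numpy as np

def rochetnet_utility(b, alphas, betas):
    """Utility of each menu entry for bid vector b: u_j = alpha_j . b - beta_j.

    alphas: (J, m) randomized allocations in [0, 1]^m; betas: (J,) payments.
    A zero option is appended so participation is always individually rational.
    """
    utils = alphas @ b - betas                 # (J,)
    return np.concatenate([utils, [0.0]])      # last entry: buy nothing, pay nothing

def rochetnet_choice(b, alphas, betas):
    """A truthful buyer picks the utility-maximizing menu entry."""
    return int(np.argmax(rochetnet_utility(b, alphas, betas)))

# Illustrative menu: item 1 alone, item 2 alone, or the bundle.
alphas = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
betas = np.array([0.66, 0.66, 0.90])
```

During training, the hard argmax is replaced by a softmax over menu utilities so that expected revenue is differentiable in (α, β); at test time the argmax choice is used.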
Manelli-Vincent Setting
- Item 1 value v1 ∼ U(0, 1)
- Item 2 value v2 ∼ U(0, 1)
[Figure: the optimal allocation regions over the unit value square (boundaries at 2/3 and (2 − √2)/3), alongside heatmaps of the learned probability of allocating item 1 and item 2 as a function of (v1, v2).]
Architecture 2: RegretNet (Multi-Bidder)
m items, n additive bidders; bid b_ij of agent i for item j.
- Allocation g^w : R^{nm} → Δ_1 × · · · × Δ_m (a distribution over bidders for each item)
- Payment t^w : R^{nm} → R^n_{≥0}, where α_i is the fraction of value charged to agent i
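A minimal PyTorch-style sketch of these two heads; the layer sizes, tanh activations, and the extra "unallocated" slot per item are illustrative choices, not the paper's exact architecture. Each item's allocation is a softmax over bidders, and each payment is a sigmoid fraction α_i of bidder i's reported value for its allocation, which guarantees individual rationality for truthful bids:

```python
import torch
import torch.nn as nn

class RegretNetSketch(nn.Module):
    """Sketch of a RegretNet-style network for n additive bidders, m items."""
    def __init__(self, n, m, hidden=100):
        super().__init__()
        self.n, self.m = n, m
        self.body = nn.Sequential(
            nn.Linear(n * m, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh())
        self.alloc_head = nn.Linear(hidden, (n + 1) * m)  # +1 slot: item unallocated
        self.pay_head = nn.Linear(hidden, n)

    def forward(self, bids):                               # bids: (batch, n, m)
        h = self.body(bids.flatten(1))
        logits = self.alloc_head(h).view(-1, self.n + 1, self.m)
        alloc = torch.softmax(logits, dim=1)[:, :self.n, :]  # drop the dummy slot
        frac = torch.sigmoid(self.pay_head(h))               # alpha_i in (0, 1)
        value = (alloc * bids).sum(-1)    # reported value of received allocation
        pay = frac * value                # payment never exceeds reported value
        return alloc, pay
```

By construction each item is allocated with total probability at most one, and t_i^w(b) ≤ Σ_j g_ij^w(b) b_ij.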
RegretNet Training Problem (1 of 2)

  min_w  −E_v[ Σ_{i=1}^n t_i^w(v) ]
  s.t.   E_v[ regret_i^w(v) ] = 0, for all bidders i

regret_i^w(v) is the maximum utility gain to bidder i from a misreport at profile v. No training labels!
Solve via the (augmented) Lagrangian method:

  min_w  −E_v[ Σ_{i=1}^n t_i^w(v) ] + Σ_{i=1}^n λ_i · E_v[ regret_i^w(v) ] + · · ·
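A sample estimate of this objective is straightforward once per-sample payments and regret estimates are in hand. The quadratic penalty weight rho and the dual-ascent schedule below are illustrative; the names are not the paper's:

```python
import numpy as np

def lagrangian_loss(payments, regrets, lam, rho=1.0):
    """Minibatch estimate of the augmented Lagrangian.

    payments: (batch, n) sampled payments t_i^w(v)
    regrets:  (batch, n) sampled regret_i^w(v) estimates
    lam:      (n,) Lagrange multipliers

    Minimized over the network weights w; between epochs, lam is typically
    increased by rho * (mean regret), a dual-ascent step.
    """
    revenue = payments.sum(axis=1).mean()
    mean_rgt = regrets.mean(axis=0)                       # (n,)
    return -revenue + (lam * mean_rgt).sum() + 0.5 * rho * (mean_rgt ** 2).sum()
```

At zero regret the loss reduces to negative expected revenue, so the multipliers only bind when the learned rule is exploitable.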
RegretNet Training Problem (2 of 2)
Adopt stochastic gradient descent to solve the (augmented) Lagrangian optimization problem:

  min_w  −E_v[ Σ_{i=1}^n t_i^w(v) ] + Σ_{i=1}^n λ_i · E_v[ regret_i^w(v) ] + · · ·

Key challenge: taking the derivative through the inner maximization

  regret_i^w(v) = max_{v'_i} [ u_i^w(v'_i, v_{−i}) − u_i^w(v_i, v_{−i}) ]

Idea: fix a 'defeating misreport' for each (i, v), found via gradient ascent on the input.
Gradient Ascent to Find Defeating Misreports
[Figure: green dot = true valuation; red dots = misreports found by gradient ascent.]
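The misreport search is plain gradient ascent on the network's input, holding the weights fixed; the step size, iteration count, and clamping to the valuation domain below are illustrative choices:

```python
import torch

def find_misreport(utility_fn, v_true, steps=50, lr=0.1):
    """Ascend utility_fn over bidder i's report, holding others' reports fixed.

    utility_fn: maps a candidate misreport tensor to bidder i's (scalar)
    utility, with the true valuation baked in for valuing the allocation.
    """
    v_mis = v_true.clone().requires_grad_(True)
    for _ in range(steps):
        u = utility_fn(v_mis)
        (grad,) = torch.autograd.grad(u, v_mis)
        with torch.no_grad():
            v_mis += lr * grad            # ascend bidder i's utility
            v_mis.clamp_(0.0, 1.0)        # stay inside the valuation domain
    return v_mis.detach()
```

In training, the misreport found for each (i, v) is then treated as fixed when differentiating the regret term with respect to the network weights w.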
Manelli-Vincent Setting
- Item 1 value v1 ∼ U(0, 1)
- Item 2 value v2 ∼ U(0, 1)
[Figure: the optimal allocation regions over the unit value square, with boundaries at 2/3 and (2 − √2)/3.]
RegretNet: 2 Bidders, 2 Items, Discrete Values
Extends Yao (2017). For each bidder:
- Item 1: v_{i,1} ∼ unif{0.5, 1, 1.5}
- Item 2: v_{i,2} ∼ unif{0.5, 1, 1.5}
RegretNet: {3, 5} Bidders × 10 Items
- Additive U(0, 1) values
- Item-wise Myerson is optimal in the limit of large n (Palfrey '83)
- Use a larger 5 × 100 neural network
Theory
- Generalization bounds for expected revenue and regret
- Discover provably optimal designs via RochetNet
- Support conjectures: optimality for m ≥ 7 of the "straight-jacket auction" for additive U(0, 1) items (GK'18):

  Items   SJA (rev)   RochetNet (rev)
      2    0.549187          0.549175
      6    1.943239          1.943216
      7    2.318032          2.318032
      9    3.086125          3.086125
     10    3.477781          3.477722
Talk Outline: Three Vignettes
- Revenue-optimal auction design
- Two-sided matching market design
- Indirect mechanism design (sequential price mechanisms)
Part II: Two-Sided Matching Markets
Workers W, firms F, strict preferences. For example:

  w1: f2 ≻ f1 ≻ f3      f1: w1 ≻ w3 ≻ w2
  w2: f1 ≻ f3 ≻ f2      f2: w3 ≻ w1 ≻ w2
  w3: f1 ≻ f2 ≻ f3      f3: w1 ≻ w3 ≻ w2

Matching (a) is unstable; e.g., (w1, f2) is a blocking pair. Matching (b) is stable.
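As a concrete check, worker-proposing deferred acceptance (Gale-Shapley) applied to this example's preference lists (as read here from the slide) finds a stable matching:

```python
def deferred_acceptance(w_prefs, f_prefs):
    """Worker-proposing deferred acceptance (Gale-Shapley).

    Each free worker proposes down its list; each firm tentatively holds
    its most-preferred proposer, releasing any worker it abandons.
    """
    rank = {f: {w: i for i, w in enumerate(ws)} for f, ws in f_prefs.items()}
    nxt = {w: 0 for w in w_prefs}   # index of the next firm each worker proposes to
    held = {}                        # firm -> worker it currently holds
    free = list(w_prefs)
    while free:
        w = free.pop()
        f = w_prefs[w][nxt[w]]
        nxt[w] += 1
        cur = held.get(f)
        if cur is None:
            held[f] = w
        elif rank[f][w] < rank[f][cur]:   # firm prefers the new proposer
            held[f] = w
            free.append(cur)
        else:
            free.append(w)
    return {w: f for f, w in held.items()}

# Preference lists as read from the example above
w_prefs = {"w1": ["f2", "f1", "f3"], "w2": ["f1", "f3", "f2"], "w3": ["f1", "f2", "f3"]}
f_prefs = {"f1": ["w1", "w3", "w2"], "f2": ["w3", "w1", "w2"], "f3": ["w1", "w3", "w2"]}
```

The resulting matching has no blocking pair, illustrating stability; by Roth's theorem (next slide's theme), no such stable mechanism can also be strategy-proof for both sides.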
Trading Off Stability and Strategy-Proofness
- S. S. Ravindranath, Z. Feng, S. Li, J. Ma, S. Kominers, and D. C. Parkes, arXiv 2021
- Impossibility theorem (Roth '82): there is no mechanism that is both stable and strategy-proof
- But little is understood about how to make trade-offs; we have only a point solution (deferred acceptance)
The Matching Network
[Figure: a 4 × 256 feedforward network plus additional layers. Two score heads ŝ and ŝ' are normalized column-wise and row-wise respectively (with slack entries ⊥ for remaining unmatched); the output marginal match probabilities are r_ij = min{ŝ_ij, ŝ'_ij}.]
Cardinal inputs; e.g., p_{w·} = (1/3, 2/3, −1/3) for f2 ≻_w f1 ≻_w ⊥ ≻_w f3. The output is the marginal probabilities for pairwise matches (0-1 decomposition via Birkhoff-von Neumann).
The Training Problem
Loss function: min_w λ · stv(g^w) + (1 − λ) · rgt(g^w), for λ ∈ [0, 1]
- stv(g^w) quantifies the violation of a generalization of stability that is suitable for randomized matchings
- rgt(g^w) quantifies the expected regret, similarly to RegretNet (can extend to ordinal SP)
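The scalarized objective itself is a simple convex combination of the two minibatch estimates; the estimators for stv and rgt are the paper's, and the names below are illustrative:

```python
import numpy as np

def matching_loss(stv_batch, rgt_batch, lam):
    """Scalarize stability violation vs. expected regret, lam in [0, 1].

    stv_batch, rgt_batch: per-sample estimates of the stability violation
    and the regret of the learned rule g^w on a minibatch of preference
    profiles. Sweeping lam from 0 to 1 traces out a design frontier
    between strategy-proofness and stability.
    """
    return lam * np.mean(stv_batch) + (1.0 - lam) * np.mean(rgt_batch)
```

Each λ yields one trained mechanism; plotting its (stability violation, SP violation) pair for a grid of λ values produces the frontier on the next slide.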
The Design Frontier
Loss function: min_w λ · stv(g^w) + (1 − λ) · rgt(g^w)
Correlated preferences; 4 × 4 market. New targets for theory!
[Figure: SP violation vs. stability violation as λ varies, spanning from deferred acceptance (DA: zero stability violation, positive SP violation) to random serial dictatorship (RSD: zero SP violation, large stability violation).]
Talk Outline: Three Vignettes
- Revenue-optimal auction design
- Two-sided matching market design
- Indirect mechanism design (sequential price mechanisms)
Part III: Indirect Mechanism Design
- Many inverse problems are indirect
- Economic rules induce a game that is played by participants
- Cf. work on learning optimal taxation policies (Zheng et al., arXiv 2020)
Joint work with G. Brero, D. Chakrabarti, A. Eden, M. Gerstgrasser, V. Li, and D. Rheingans-Yoo (AAAI'21, ICML'21 workshop, arXiv)
Illustration: Sequential Price Mechanisms
[Animated figure: the mechanism visits agents one at a time, posting take-it-or-leave-it prices (e.g., $5 and $2, then $4 and $1.50, then $1); each visited agent decides whether to buy at its posted price.]
Introducing Message Passing
[Figure: agents can send messages to the mechanism before prices are posted.]
New Challenge: Agent Behavior
- Two-level learning: agents learn to play the game; the mechanism learns how to design the rules given this
- Formulate the AMD problem as a multi-follower Stackelberg game:
  - The mechanism is a policy network from observations (messages, purchases) to actions (next agent, prices); the strategy of the leader = the design of the mechanism
  - Agents are followers and learn to play the induced game
- In the special case of no communication, agents have a dominant-strategy equilibrium in SPMs
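Viewed from the designer's side, a no-communication sequential price mechanism is an episodic environment: a policy picks which agent to visit next and at what price, and the visited agent buys iff its value clears the price (the dominant strategy noted above). The function and policy names here are illustrative, not the paper's API:

```python
def run_spm(values, supply, policy):
    """Run one episode of a sequential price mechanism (no messaging).

    values: dict agent -> value for one unit
    supply: number of identical units available
    policy: maps (frozenset of remaining agents, units left) to (agent, price)
    Accepting iff value >= price is a dominant strategy for each visited agent.
    """
    remaining, revenue, alloc = set(values), 0.0, {}
    while remaining and supply > 0:
        agent, price = policy(frozenset(remaining), supply)
        remaining.discard(agent)
        if values[agent] >= price:    # truthful take-it-or-leave-it response
            alloc[agent] = 1
            revenue += price
            supply -= 1
    return alloc, revenue
```

The designer's learning problem is then to train the policy (e.g., with policy gradients) against episode rewards such as revenue, welfare, or a fairness objective.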
Max-Min Fairness (No Communication)
Setting: 9 agents, 2 different kinds of items (5 units each). Train via the PPO actor-critic, policy-gradient algorithm (S+17).
Two-Stage Learning
- Adopt multiplicative weights (MW) for agents to learn a Bayesian coarse-correlated equilibrium
- Formulate a Stackelberg MDP: a single-player POMDP with two phases:
  - A learning phase, with Markovian learning dynamics (MW) representing the adaptation of agents
  - A best-response phase, with the reward to the leader based on the learned agent strategies
Also with R. Trivedi
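The follower dynamic in the learning phase can be sketched with standard multiplicative weights over each agent's finite strategy set; the learning rate is an illustrative choice:

```python
import math

def mw_update(weights, payoffs, eta=0.1):
    """One multiplicative-weights step over an agent's strategy set.

    weights: current probability of playing each strategy
    payoffs: realized payoff of each strategy this round
    Each weight is scaled by exp(eta * payoff) and renormalized, so mass
    concentrates on better-performing strategies over time.
    """
    new = [w * math.exp(eta * p) for w, p in zip(weights, payoffs)]
    total = sum(new)
    return [w / total for w in new]
```

In the Stackelberg MDP, iterating updates like this one defines the Markovian learning phase, and the leader's reward is evaluated in the best-response phase against the strategies these dynamics converge to.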
Illustrative Results
- 2 agents, 1 item. Agent 1's value is {0, 1} w.p. {1/2, 1/2}; agent 2's value is 1/2 w.p. 1 − ε.
- The optimal SPM visits agent 1 and then agent 2 (price zero), and is inefficient when v1 = 1, v2 = 1/2.
Discussion
- Direct mechanisms:
  - Scaling up; e.g., one network for multiple market sizes (RJBW'21), combinatorial problems
  - Leveraging characterization results such as monotonicity
  - Robustness, for example certificates of low regret (CCGD'20)
- Indirect mechanisms:
  - Scaling up the Stackelberg MDP
  - Leveraging gradient-dynamics convergence for potential games (BFHKS'21, MRS'20)
  - Derivatives through equilibrium (WXPRT'21)
  - Centralized-learning, decentralized-execution inspired approaches (FAFW'16)
Conclusion
- End-to-end learning for economic design with in-expectation objectives
- Flexible: seems interesting for a myriad of design problems
- Differentiable economics

Thank You!
Funding support: DARPA cooperative agreement HR00111920029; AWS gift.

References
- P. Dütting, Z. Feng, H. Narasimhan, D. C. Parkes, S. S. Ravindranath: "Optimal Auctions through Deep Learning," ICML'19 (first version 2017); longer version arXiv 2020
- N. Golowich, H. Narasimhan, D. C. Parkes: "Deep Learning for Multi-Facility Location Mechanism Design," IJCAI'18
- Z. Feng, H. Narasimhan, D. C. Parkes: "Deep Learning for Revenue-Optimal Auctions with Budgets," AAMAS'18
- G. Brero, A. Eden, M. Gerstgrasser, D. C. Parkes, D. Rheingans-Yoo: "Reinforcement Learning of Simple Indirect Mechanisms," AAAI'21
- S. Zheng, A. Trott, S. Srinivasa, N. Naik, M. Gruesbeck, D. C. Parkes, R. Socher: "The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies," arXiv 2020
- Z. Feng, D. C. Parkes, S. S. Ravindranath: "Machine Learning for Matching Markets," Online and Matching-Based Market Design, Echenique et al. (Eds.), 2021
- G. Brero, D. Chakrabarti, A. Eden, M. Gerstgrasser, V. Li, D. C. Parkes: "Learning Stackelberg Equilibria in Sequential Price Mechanisms," ICML 2021 Workshop on Reinforcement Learning Theory
Additional References (1 of 2)
- J. Rahme, S. Jelassi, J. Bruna, M. Weinberg: "A Permutation-Equivariant Neural Network Architecture for Auction Design," AAAI'21
- M. J. Curry, P. Chiang, T. Goldstein, J. Dickerson: "Certifying Strategyproof Auction Networks," NeurIPS'20
- S. Zheng, A. Trott, S. Srinivasa, N. Naik, M. Gruesbeck, D. C. Parkes, R. Socher: "The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies," arXiv 2004.13332, 2020
- M. Bichler, M. Fichtl, S. Heidekrüger, N. Kohring, P. Sutterer: "Learning Equilibria in Symmetric Auction Games using Artificial Neural Networks," Nature Machine Intelligence (forthcoming)
- E. Mazumdar, L. J. Ratliff, S. S. Sastry: "On Gradient-Based Learning in Continuous Games," SIAM Journal on Mathematics of Data Science, 2020
- K. Wang, L. Xu, A. Perrault, M. K. Reiter, M. Tambe: "Coordinating Followers to Reach Better Equilibria: End-to-End Gradient Descent for Stackelberg Games," arXiv 2021
- J. Foerster, I. Assael, N. de Freitas, S. Whiteson: "Learning to Communicate with Deep Multi-Agent Reinforcement Learning," NeurIPS'16
- J. Hartline, V. Syrgkanis, E. Tardos: "No-Regret Learning in Bayesian Games," arXiv 2015
- J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov: "Proximal Policy Optimization Algorithms," arXiv 2017
- J. D. Hartline, B. Lucier: "Bayesian Algorithmic Mechanism Design," STOC'10
- Y. Cai, C. Daskalakis, S. M. Weinberg: "Optimal Multi-Dimensional Mechanism Design: Reducing Revenue to Welfare Maximization," FOCS'12
- S. Alaei, H. Fu, N. Haghpanah, J. D. Hartline, A. Malekian: "Bayesian Optimal Auctions via Multi- to Single-Agent Reduction," EC'12
- V. Conitzer, T. Sandholm: "Complexity of Mechanism Design," UAI'02
Additional References (2 of 2)
- A. Manelli, D. Vincent: "Bundling as an Optimal Selling Mechanism for a Multiple-Good Monopolist," Journal of Economic Theory, 2006
- A. C.-C. Yao: "Dominant-Strategy versus Bayesian Multi-Item Auctions: Maximum Revenue Determination and Comparison," EC'17
- T. Palfrey: "Bundling Decisions by a Multiproduct Monopolist with Incomplete Information," Econometrica, 1983
- Y. Giannakopoulos, E. Koutsoupias: "Duality and Optimality of Auctions for Uniform Distributions," SIAM Journal on Computing, 2018
- A. E. Roth: "The Economics of Matching: Stability and Incentives," Mathematics of Operations Research, 1982
- P. Dütting, Z. Feng, H. Narasimhan, D. C. Parkes, S. S. Ravindranath: "Optimal Auctions through Deep Learning," arXiv 1706.03459, 2020