Beyond Counter-examples and the choice prediction competition

                      Ido Erev -- Technion
         Eyal Ert and Alvin E. Roth -- Harvard University

                Ernan Haruvy -- University of Texas
Stefan Herzog, Robin Hau, and Ralph Hertwig -- University of Basel
Terrence Stewart -- University of Waterloo, Robert West -- Carleton
  University, and Christian Lebiere -- Carnegie Mellon University
Mainstream experimental and behavioral economic research tends to focus
on counter-examples to rational decision theory.

The most influential papers present elegant counter-examples (e.g., the
Allais paradox, the framing effect, duration neglect) and simple models that
capture them.

This focus was very effective in establishing the importance of the behavioral
approach, but it has several limitations that impair the derivation of
quantitative predictions.
Counter-examples

The Allais/ certainty effect (from K&T,
1979, following Allais, 1953)
Problem 1:
S: 3000 with certainty
R: 4000 with p =0.80; 0 otherwise

Problem 2:
S: 3000 with p =0.25; 0 otherwise
R: 4000 with p =0.20; 0 otherwise
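Why this is a counter-example: Problem 2 multiplies both winning probabilities of Problem 1 by 0.25. With u(0) = 0, the comparison 0.25·u(3000) vs 0.20·u(4000) is just 0.25 times the comparison u(3000) vs 0.80·u(4000), so expected utility theory (EUT) must give the same answer in both problems. The certainty-effect pattern (S in Problem 1, R in Problem 2) is therefore inconsistent with any utility function. The sketch below checks this for a few utility functions chosen purely for illustration (they are assumptions, not estimates from the source).

```python
import math

# Illustrative utility functions, all with u(0) = 0 (assumptions for this sketch).
UTILITIES = {
    "linear": lambda x: x,
    "sqrt": math.sqrt,
    "log1p": math.log1p,
    "power .88": lambda x: x ** 0.88,
}

def expected_utility(outcome, p, u):
    """EU of 'outcome with probability p; 0 otherwise'."""
    return p * u(outcome) + (1 - p) * u(0)

def eut_choice(safe, risky, u):
    """'S' or 'R' for two (outcome, probability) prospects under EUT."""
    return "S" if expected_utility(*safe, u) > expected_utility(*risky, u) else "R"

problem1 = ((3000, 1.00), (4000, 0.80))
problem2 = ((3000, 0.25), (4000, 0.20))

for name, u in UTILITIES.items():
    print(f"{name:>9}: Problem 1 -> {eut_choice(*problem1, u)}, "
          f"Problem 2 -> {eut_choice(*problem2, u)}")
```

Whatever the curvature, each utility function gives the same answer in both problems.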

Buying insurance and lotteries
(from K&T, 1979)
Problem 3:
S: 5 with certainty
R: 5000 with p =1/1000; 0 otherwise

Problem 4:
S: -5 with certainty
R: -5000 with p =1/1000; 0 otherwise
Oversensitivity to rare events
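In Problems 3 and 4 the two options have identical expected values (1/1000 × 5000 = 5), so buying the lottery in Problem 3 and the insurance in Problem 4 cannot be explained by expected value. A model that overweights the rare outcome does predict both. A minimal sketch, assuming the Tversky and Kahneman (1992) weighting function and the γ = .61 (gains) and δ = .69 (losses) values listed for the CPT baseline in the Description table:

```python
from fractions import Fraction

def expected_value(outcome, p):
    """EV of 'outcome with probability p; 0 otherwise'."""
    return outcome * p

def tk_weight(p, gamma):
    """Tversky & Kahneman (1992) probability weighting function."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

p_rare = Fraction(1, 1000)
# Same EVs as the sure options (5 and -5)
print(expected_value(5000, p_rare), expected_value(-5000, p_rare))

# Decision weight of the rare outcome for gains (gamma) and losses (delta)
w_gain = tk_weight(float(p_rare), 0.61)
w_loss = tk_weight(float(p_rare), 0.69)
print(f"w+(.001) = {w_gain:.4f}, w-(.001) = {w_loss:.4f}")
```

Both decision weights are several times larger than the objective probability of .001, which is what "oversensitivity to rare events" means in this model.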
Counter-examples, and counter to counter-examples

The Allais/ certainty effect (from K&T,   The reversed Allais/ certainty effect (from Barron
1979, following Allais, 1953)             & Erev, 2003)
Problem 1:                                Problem 1r:
S: 3000 with certainty                    S: 3 with certainty
R: 4000 with p =0.80; 0 otherwise         R: 4 with p =0.80; 0 otherwise

Problem 2:                                Problem 2r:
S: 3000 with p =0.25; 0 otherwise         S: 3 with p =0.25; 0 otherwise
R: 4000 with p =0.20; 0 otherwise         R: 4 with p =0.20; 0 otherwise

Buying insurance and lotteries            “It won’t happen to me”
(from K&T, 1979)                          (from Barron & Erev, 2003)
Problem 3:                                Problem 3r:
S: 5 with certainty                       S: 3 with certainty
R: 5000 with p =1/1000; 0 otherwise       R: 32 with p =1/10; 0 otherwise

Problem 4:                                Problem 4r:
S: -5 with certainty                      S: -3 with certainty
R: -5000 with p =1/1000; 0 otherwise      R: -32 with p =1/10; 0 otherwise
Oversensitivity to rare events            Under-sensitivity to rare events
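A simple mechanism behind the reversed column (see Barron & Erev, 2003; Hertwig et al., 2004) is reliance on small samples of experience: a rare outcome is often absent from a small sample. With p = 1/10, as in Problems 3r and 4r, the probability that k draws contain no rare outcome is 0.9^k. The sketch computes this and checks it by simulation; the sample sizes k are illustrative assumptions.

```python
import random

def p_rare_unseen(p_rare, k):
    """Probability that k independent draws contain no rare outcome: (1 - p)^k."""
    return (1 - p_rare) ** k

def simulate_unseen(p_rare, k, n_sims=50_000, seed=1):
    """Monte Carlo check of p_rare_unseen."""
    rng = random.Random(seed)
    unseen = sum(
        all(rng.random() >= p_rare for _ in range(k))  # no draw hit the rare outcome
        for _ in range(n_sims)
    )
    return unseen / n_sims

for k in (1, 5, 10, 20):
    print(k, round(p_rare_unseen(0.1, k), 3), round(simulate_unseen(0.1, k), 3))
```

With k = 5, the gain of 32 in Problem 3r is missing from about 59% of samples, so a decision maker who compares sample means would often prefer the sure 3; for the same reason the loss of 32 in Problem 4r is often unseen, which favors R.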
More counter-examples.                    More counter to counter examples.

2. Oversensitivity to losses              2r. Under-sensitivity to losses
The status quo effect                     The winner’s curse
Underinvestment in stocks                 Overinvestment in individual stocks
Loss Aversion:
“Imagine that you have the opportunity to play a gamble that offers a 50% chance of
winning $2000 and a 50% chance of losing $500. Would you play the gamble?”
45-55% reject the offer.

No Loss Aversion (Ert & Erev, 2008):
Please choose:
a. 0 for sure
b. $2000 with p of 0.5; -$500 with p of 0.5
78% select the gamble.
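A quick calculation shows why rejecting this gamble is read as loss aversion. Its expected value is 0.5 × 2000 - 0.5 × 500 = 750. Assuming linear values and a single loss-aversion coefficient λ, the gamble is rejected only when 0.5 × 2000 < λ × 0.5 × 500, that is, when λ > 4. The sketch also evaluates the gamble under CPT, assuming the Tversky and Kahneman (1992) functional forms and the parameters listed for the CPT baseline in the Description table (α = β = .88, λ = 2.25, γ = .61, δ = .69).

```python
def tk_weight(p, gamma):
    """Tversky & Kahneman (1992) probability weighting function."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

def cpt_mixed(gain, p_gain, loss, alpha=0.88, beta=0.88, lam=2.25, gamma=0.61, delta=0.69):
    """CPT value of a mixed gamble: gain with probability p_gain, otherwise loss (< 0)."""
    v_gain = tk_weight(p_gain, gamma) * gain ** alpha
    v_loss = -lam * tk_weight(1 - p_gain, delta) * (-loss) ** beta
    return v_gain + v_loss

def critical_lambda_linear(gain, p_gain, loss):
    """Loss weight above which a linear, unweighted decision maker rejects the gamble."""
    return (p_gain * gain) / ((1 - p_gain) * -loss)

print(0.5 * 2000 + 0.5 * -500)                    # expected value
print(critical_lambda_linear(2000, 0.5, -500))    # break-even loss weight
print(round(cpt_mixed(2000, 0.5, -500), 1))       # CPT value with the listed parameters
```

With these parameters the CPT value is positive, so the gamble would be accepted; rejecting it requires a larger effective loss weight.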

3. Oversensitivity to others              3r. Under-sensitivity to others
Cooperation in the prisoner’s dilemma     The mythical fixed-pie syndrome
Rejections in the ultimatum game
Another limitation involves the possibility that the clearest counter-examples
may not reflect the most important behavioral regularities. It is
possible that the most important behavioral regularities emerge in situations
in which rational economic theory cannot be decisively violated.

For example, the Allais paradox suggests overweighting of extreme rare events (more
precisely, it can be captured with a model that assumes such overweighting). This
observation violates EUT, but it is possible that the tendency to behave as if “it won’t
happen to me” is more important.
A third limitation involves the fact that the understanding of deviations from
rational choice is not always sufficient to derive quantitative predictions of behavior.

For example, rational decision theory does not provide clear predictions of
decisions when the decision makers can rely on personal experience.
Almost any decision can be rational depending on the decision maker’s beliefs.

The current research tries to address these limitations by extending the study of
the Allais paradox and similar problems along two dimensions: the parameters that
determine the incentive structure, and the source of the available information
(description or experience).
We believe that previous attempts to advance beyond counter-examples have been
slowed by a problematic incentive structure: The evaluation of quantitative
predictions tends to be more expensive and less interesting than the study of
counter-examples.

This problem is addressed here by the organization of three open choice prediction
competitions that can reduce the cost and increase the likelihood of interesting
outcomes.

Erev, Ert & Roth (EER) ran the necessary boring studies, and challenged other
researchers to predict the results.

All three competitions focus on binary choices of the type

Safe: M with certainty
Risk: H with probability Ph; L otherwise (with probability 1-Ph)
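This problem format maps onto a small data structure. A minimal sketch: the class `Problem` and its methods are hypothetical names, `draw_risky` assumes independent draws (one card from the risky deck in the experience conditions), and the rare-event test uses the definition given for the estimation set below (Ph < .1 or Ph > .9).

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Problem:
    """Safe: M with certainty. Risk: H with probability Ph; L otherwise."""
    M: float
    H: float
    Ph: float
    L: float

    def ev_risky(self) -> float:
        """Expected value of the risky prospect."""
        return self.Ph * self.H + (1 - self.Ph) * self.L

    def draw_risky(self, rng: random.Random) -> float:
        """One independent draw from the risky prospect."""
        return self.H if rng.random() < self.Ph else self.L

    def has_rare_event(self) -> bool:
        """Rare-event definition used for the estimation set."""
        return self.Ph < 0.1 or self.Ph > 0.9

# One of the example problems reported in the results
problem = Problem(M=2.7, H=3.3, Ph=0.91, L=-3.5)
rng = random.Random(0)
print(round(problem.ev_risky(), 3), problem.has_rare_event())
print([problem.draw_risky(rng) for _ in range(10)])
```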
Condition Description: This experiment includes several games. In each game
you will be asked to select one of two alternatives.
           At the end of the experiment one of the games will be randomly drawn (all the
games are equally likely to be drawn), and the alternative selected in this game will be
realized. Your payoff for the experiment will be the outcome (in Sheqels) of this game.
Condition Experience-Sampling: This experiment includes several games. Each
game includes two stages: The sampling stage and the choice stage.
            At the choice stage (the second stage) you will be asked to select once between two
virtual decks of cards (two buttons). Your choice will lead to a random draw of one card from this deck,
and the number written on the card will be the "game's outcome."
            During the sampling stage (the first stage) you will be able to sample the two
decks. When you feel that you have sampled enough press the "choice stage" key to move to the
choice stage.
            At the end of the experiment one of the games will be randomly drawn (all the games
are equally likely to be drawn). Your payoff for the experiment will be the outcome (in Sheqels) of this
game.
Condition Experience-Repeated: This experiment includes several
games. Each game includes several trials. You will receive a message before the
beginning of each game.
            In each trial you will be asked to select one of two buttons. Each press will
result in a payoff that will be presented on the selected button.
            At the end of the experiment one of the trials will be randomly drawn (all the
trials are equally likely to be drawn). Your payoff for the experiment will be the outcome
(in Sheqels) of this trial.
Two studies: Estimation and Competition

The estimation study was run in March 2008, using 60 randomly selected problems.
Payoffs ranged from -30 to +30, and rare events (Ph < .1 or Ph > .9) appeared in
about 2/3 of the problems.

Each problem was played by 20 subjects.

The subjects received 30 sheqels plus the outcome of one
(randomly selected) trial.

EER posted the results in April 2008, together with baseline models (popular models
and the best model that we could find), and challenged other researchers to submit
models to predict the results of a second study (the competition study).

To participate, researchers had to implement their model in a computer program
that reads the input (the parameters of each problem: M, H, Ph and L) and
outputs the predicted proportion of risky choices (R-rate).
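The required interface can be sketched as follows. This is a minimal sketch under stated assumptions: the input format (one whitespace-separated line `M H Ph L` per problem) is assumed rather than taken from the competition rules, and `predict_r_rate` is a hypothetical placeholder (a logistic function of EV(R) - M with sensitivity `k`), not any of the submitted or baseline models.

```python
import math

def predict_r_rate(M, H, Ph, L, k=1.0):
    """Placeholder model (hypothetical): logistic in EV(R) - M with sensitivity k."""
    ev_r = Ph * H + (1 - Ph) * L
    return 1.0 / (1.0 + math.exp(-k * (ev_r - M)))

def run(lines):
    """Read one problem per line ('M H Ph L') and return the predicted R-rates."""
    predictions = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        M, H, Ph, L = map(float, line.split())
        predictions.append(predict_r_rate(M, H, Ph, L))
    return predictions

# Two example problems, written as 'M H Ph L'
sample_input = """2.7 3.3 0.91 -3.5
-4.6 2 0.1 -5.7"""
for r_rate in run(sample_input.splitlines()):
    print(f"{r_rate:.4f}")
```

In a real submission, `run(sys.stdin)` would read the problems from standard input, and the placeholder would be replaced by the model being entered.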
The ranking criterion was the Mean Squared Deviation (MSD) between the
observed and the predicted R-rate in the Competition set.
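For concreteness, the ranking criterion is a plain mean of squared differences across problems. In the usage line, the observed rates are the Description R-rates of the two example problems reported in the results, and the predictions are invented for illustration.

```python
def msd(observed, predicted):
    """Mean squared deviation between observed and predicted R-rates."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

print(msd([0.25, 0.65], [0.35, 0.60]))  # invented predictions
```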

To clarify the interpretation we focus on the ENO (Equivalent Number of
Observations) order-maintaining transformation of the MSD scores (see Erev et
al., 2007).
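Several ENO scores in the tables below (for example, 80.99 for the best Description baseline and 25.92 for the E-sampling winner) are reproduced by ENO = S2 / (MSD - S2/n), where S2 is the value in each table header and n = 20 is the number of subjects per problem. This form is reconstructed from the reported numbers, so treat it as an assumption and see Erev et al. (2007) for the exact definition. Roughly, ENO is the number of observations whose mean would predict as accurately as the model.

```python
def eno(msd, s2, n):
    """Equivalent Number of Observations, assumed form s2 / (msd - s2 / n).

    s2: value reported in the table header; n: subjects per problem.
    """
    excess = msd - s2 / n
    if excess <= 0:
        return float("inf")  # MSD is no larger than the sampling error of an n-subject mean
    return s2 / excess

print(round(eno(0.0102, 0.1636, 20), 2))  # best Description baseline -> 80.99
print(round(eno(0.0187, 0.2111, 20), 2))  # E-sampling winner -> 25.92
```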
Results:
The raw data highlight high correspondence (correlation above 0.8) between the
two experience conditions, and a negative correlation between these conditions and
the description condition. For example:

Risky                             | Safe | R-rate: Description | Sampling | Repeated
3.3 with p = 0.91; -3.5 otherwise | 2.7  | 25%                 | 65%      | 60%
2 with p = 0.1; -5.7 otherwise    | -4.6 | 65%                 | 20%      | 11%

Analysis of this difference reveals that it is driven by the effect of rare events
(see Barron & Erev, 2003; Hertwig et al., 2004).
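The expected values of the two example problems make the rare-event reading concrete. In the first, EV(R) = 0.91 × 3.3 - 0.09 × 3.5 = 2.688, almost equal to the safe 2.7, and the rare outcome is the loss of 3.5. In the second, EV(R) = 0.1 × 2 - 0.9 × 5.7 = -4.93, below the safe -4.6, and the rare outcome is the gain of 2. Overweighting the rare outcome (description) pushes choices away from R in the first problem and toward R in the second; underweighting it (experience) does the opposite, matching the direction of the R-rates. The sketch uses a rarity cutoff of "probability at most 0.1", slightly looser than the estimation set's Ph < .1, so that the p = 0.1 problem counts.

```python
def ev_risky(H, Ph, L):
    """Expected value of 'H with probability Ph; L otherwise'."""
    return Ph * H + (1 - Ph) * L

def rare_outcome(H, Ph, L, cutoff=0.1):
    """(outcome, probability) of the outcome with probability <= cutoff, else None."""
    if Ph <= cutoff:
        return H, Ph
    if 1 - Ph <= cutoff:
        return L, round(1 - Ph, 10)
    return None

# (M, H, Ph, L) for the two example problems
for M, H, Ph, L in [(2.7, 3.3, 0.91, -3.5), (-4.6, 2.0, 0.1, -5.7)]:
    print(f"safe {M}: EV(R) = {ev_risky(H, Ph, L):.3f}, rare outcome = {rare_outcome(H, Ph, L)}")
```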
The submissions

23 models were submitted.
8 to the description condition,
7 to the E-sampling condition
and 8 to the E-repeated condition.

The submitted models spanned a wide range of methods, including logistic
regression, ACT-R, neural networks, and basic mathematical models.
Description

Fit = fitness scores on the estimation set (S2 = .1860); Pred = prediction scores on the competition set (S2 = .1636).

Title | Team and idea | Parameters | Fit Pagree | Fit Corr | Fit MSD | Pred Pagree | Pred Corr | Pred MSD | ENO
Interesting baseline | CPT | α=β=.88, λ=2.25, γ=.61, δ=.69 | 91% | 0.84 | 0.099 | 93% | 0.87 | 0.0787 | 2.32
Interesting baseline | Priority | s=0.1 | 91% | 0.76 | 0.1158 | 81% | 0.65 | 0.1437 | 1.21
Best baseline | SCPT with normalization | α=.89, β=.98, λ=1.5, μ=2.15, γ=δ=.7 | 89% | 0.92 | 0.0116 | 95% | 0.95 | 0.0102 | 80.99
Winner | Haruvy: logistic regression | β0=1, β1=.01, β2=.07, β3=.41, γ1=1.42, γ2=.32, γ3=-.621 | 88% | 0.92 | 0.0099 | 90% | 0.94 | 0.0126 | 56.36
The logistic choice model

Motivation: PT was proposed to capture counter-examples, whereas regression
models are commonly used in applications.

The tendency to prefer the risky prospect:

T(R) = β0 + β1*H + β2*L + β3*M + γ1*Ph + γ2*EV(R) + γ3*(Dummy1)

The values H, L, M and Ph are the parameters of the choice problem as defined
above. EV(R) is the expected payoff of the risky prospect, and Dummy1 is a
dummy variable that assumes the value 1 if the risky choice has higher expected
value than the safe choice and 0 otherwise.

A logistic choice rule:

P(R) = 1 / (1 + e^(−T(R)))
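The two equations translate directly into code. A minimal sketch: the coefficient names follow the slide, `Coefs`, `tendency` and `p_risky` are hypothetical names, and raw payoffs are used without any rescaling (an assumption). The usage line uses all-zero coefficients rather than the winner's values, because the extracted parameter cell in the Description table is garbled.

```python
import math
from typing import NamedTuple

class Coefs(NamedTuple):
    b0: float
    b1: float
    b2: float
    b3: float
    g1: float
    g2: float
    g3: float

def tendency(c: Coefs, M: float, H: float, Ph: float, L: float) -> float:
    """T(R) = b0 + b1*H + b2*L + b3*M + g1*Ph + g2*EV(R) + g3*Dummy1."""
    ev_r = Ph * H + (1 - Ph) * L
    dummy1 = 1.0 if ev_r > M else 0.0  # 1 if the risky option has the higher EV
    return (c.b0 + c.b1 * H + c.b2 * L + c.b3 * M
            + c.g1 * Ph + c.g2 * ev_r + c.g3 * dummy1)

def p_risky(c: Coefs, M: float, H: float, Ph: float, L: float) -> float:
    """Logistic choice rule: P(R) = 1 / (1 + exp(-T(R)))."""
    return 1.0 / (1.0 + math.exp(-tendency(c, M, H, Ph, L)))

zero = Coefs(0, 0, 0, 0, 0, 0, 0)
print(p_risky(zero, M=2.7, H=3.3, Ph=0.91, L=-3.5))  # 0.5 for every problem
```

Plugging in fitted coefficients turns the same two functions into a prediction program of the kind required by the competition.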
E-Sampling

Fit = fitness scores on the estimation set (S2 = .2023); Pred = prediction scores on the competition set (S2 = .2111).

Title | Team and idea | Parameters | Fit Pagree | Fit Corr | Fit MSD | Pred Pagree | Pred Corr | Pred MSD | ENO
Best baseline | Primed sampler with variability | k=9 | 95% | 0.88 | 0.017 | 82% | 0.8 | 0.0244 | 15.23
Winner | Herzog, Hau, Hertwig: ENSEMBLE linear combination | α=1.19, β=1.35, γ=1.42, δ=1.54, λ=1.19, μ=.41 | 95% | 0.92 | 0.0099 | 83% | 0.8 | 0.0187 | 25.92
Runner up | Ann, Picard: sample by CPT and aspiration levels | x=2.07, y=1.31, z=0.71, v=7.53, r=12.64, m=0.02 | 92% | 0.9 | 0.0115 | 82% | 0.82 | 0.0203 | 21.66
The Ensemble model

Motivation:
- People use different decision rules.
- Aggregating predictions reduces error: the “wisdom of (model) crowds.”

Aggregate 4 models:

1) 2 versions of the primed sampler (small samples are considered, and the option
   with the higher sample mean is selected)
2) CPT (with a “reversed” weighting function).
3) Priority Rule (compare L’s, then Ph’s, then H’s, with a cutoff of 0.1 in each
   comparison as the stopping rule).
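The aggregation idea can be sketched as a weighted average of component predictions. This is a simplified illustration, not the winning team's code: the primed sampler here assumes a fixed sample size k, independent draws from the risky deck, and ties split 50/50 (since S is certain, its sample mean is always M); only the two primed-sampler components are implemented, and the equal weights are an assumption because the slides do not give the weights of the linear combination. CPT and priority-rule predictions would be appended to the list in the same way.

```python
from math import comb

def primed_sampler(M, H, Ph, L, k):
    """P(choose R) if the agent draws k outcomes from R and picks the higher mean."""
    p = 0.0
    for j in range(k + 1):  # j = number of H outcomes in the sample
        prob = comb(k, j) * Ph ** j * (1 - Ph) ** (k - j)
        sample_mean = (j * H + (k - j) * L) / k
        if sample_mean > M:
            p += prob
        elif sample_mean == M:
            p += 0.5 * prob  # tie: choose at random
    return p

def ensemble(predictions, weights=None):
    """Linear combination of component predictions (equal weights by default)."""
    if weights is None:
        weights = [1.0 / len(predictions)] * len(predictions)
    return sum(w * p for w, p in zip(weights, predictions))

problem = (3, 32, 0.1, 0)  # Problem 3r as (M, H, Ph, L)
components = [primed_sampler(*problem, k=5), primed_sampler(*problem, k=9)]
print([round(c, 3) for c in components], round(ensemble(components), 3))
```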
E-Repeated

Fit = fitness scores on the estimation set (S2 = .0875); Pred = prediction scores on the competition set (S2 = .0928).

Title | Team and idea | Parameters | Fit Pagree | Fit Corr | Fit MSD | Pred Pagree | Pred Corr | Pred MSD | ENO
Interesting baseline | Normative reinforcement learning | w=.15, λ=1.1 | 76% | .83 | 0.0092 | 84% | .84 | 0.0087 | 22.89
Interesting baseline | Basic reinforcement learning | w=.15, λ=1 | 56% | .67 | 0.0224 | 66% | .51 | 0.0263 | 4.28
Best baseline | Explorative sampler with recency | β=.10, ε=.12, k=8 | 82% | 0.88 | 0.0075 | 86% | 0.89 | 0.0066 | 47.22
Winner | Stewart, West, & Lebiere: ACT-R with sequential dependencies and blending memory | s=.35, τ=-1.6 | 77% | 0.88 | 0.0094 | 87% | 0.89 | 0.0075 | 32.50
ACT-R

Motivation:
- “Atomic Components of Thought”: e.g., declarative/procedural memory.
- ACT-R was useful in other learning tasks (skill acquisition, causal and
category learning, and others).

Declarative Memory with sequential dependencies.
Experience is coded as a chunk that includes the context, the choice, and the obtained
outcome. Context = the two previous consecutive choices. Recall is based on the
activation level and a cutoff parameter.
The activation level of experience i:

A_i = ln( Σ_k t_k^(−d) ) + ε(s)     (1)

where t_k is the amount of time since the kth appearance of this item, d is the decay rate,
and ε(s) is a random value chosen from a logistic distribution with variance π²s²/3.
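A minimal sketch of the activation and retrieval step, assuming the standard ACT-R base-level form A_i = ln(Σ_k t_k^(−d)) + ε(s), the ACT-R default decay d = 0.5 (the slide does not give d), and the winner's listed s = .35 and cutoff τ = -1.6. Logistic noise with scale s has variance π²s²/3, as in the definition above. The sequential-dependency context and the blending step of the winning model are not shown; the function names and example times are hypothetical.

```python
import math
import random

def logistic_noise(s, rng):
    """Sample from a logistic distribution with mean 0 and scale s (variance pi^2 s^2 / 3)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return s * math.log(u / (1.0 - u))

def activation(times_since, d=0.5, s=0.35, rng=None):
    """A_i = ln(sum_k t_k^-d) + eps(s); noise is omitted when rng is None."""
    base = math.log(sum(t ** -d for t in times_since))
    noise = logistic_noise(s, rng) if rng is not None else 0.0
    return base + noise

def retrieved(times_since, tau=-1.6, **kw):
    """An experience can be recalled only if its activation exceeds the cutoff tau."""
    return activation(times_since, **kw) > tau

rng = random.Random(0)
recent = [2.0, 10.0]  # hypothetical: seen 2 and 10 time units ago
old = [50.0]          # hypothetical: seen once, 50 time units ago
print(round(activation(recent), 3), round(activation(old), 3))
print(retrieved(recent, rng=rng), retrieved(old, rng=rng))
```

Without noise, the recently and repeatedly seen experience is above the cutoff and the old one is below it; the noise term lets either outcome occasionally flip.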
General observations:

1) Models were very successful in predicting data (high ENO).
2) Small variants of existing models can make them much better.
3) It seems that description-based and experience-based decisions are
     qualitatively different:
a. It is hard to capture them with the same model.
b. The models relied on different assumptions:
      - Description-based decisions involve weighting the described information.
      - Experience-based decisions are captured by reliance on small
          samples in short-term memory.
“Asking the Right Question:” Simple vs.
Probabilistic Polls in Predicting Others’ Behavior

Alternative ways of predicting data rely on the “wisdom of crowds” for
making predictions (e.g., various polls, information markets).

1) How much accuracy can we get with people’s intuition?

2) What type of polls should we apply?
Predicting Others’ Behavior: Study 1

 “This experiment includes several games that were played originally by 20 participants
 (referred to as the “former participants”) in May 2008.”

 Step 1:
 In the current experiment you will be presented with the games that were played by the
 former participants. For each game you will be asked to select the alternative that you
 think most former participants preferred.

 Step 2:
 Right afterwards you’ll be asked to estimate the proportion of the 20 former participants
 that chose alternative R in that game. Your estimate should be between 0 and 1. For
 example, if you think that 10 of the 20 former participants selected R -- choose "0.5."
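Each respondent thus gives two kinds of prediction per game: a choice (Step 1) and a probability estimate (Step 2). A minimal sketch of turning each into a predicted R-rate, assuming (the slides do not say) that the choice poll is aggregated as the share of respondents who pick R and the probabilistic poll as the mean estimate. The responses are invented.

```python
from statistics import mean

def choice_poll_prediction(choices):
    """Share of respondents who pick R in Step 1."""
    return sum(1 for c in choices if c == "R") / len(choices)

def probabilistic_poll_prediction(estimates):
    """Mean of the Step 2 estimates of the proportion choosing R (each in [0, 1])."""
    if any(not 0.0 <= e <= 1.0 for e in estimates):
        raise ValueError("estimates must lie between 0 and 1")
    return mean(estimates)

choices = ["R", "R", "S", "R", "S"]     # invented Step 1 answers
estimates = [0.7, 0.6, 0.45, 0.8, 0.5]  # invented Step 2 answers
print(choice_poll_prediction(choices), probabilistic_poll_prediction(estimates))
```

Either aggregate can then be scored against the observed R-rates with the same MSD and ENO criteria used for the models.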
Predicting Others’ Behavior: Results

Prediction scores on the competition set (Description condition, S2 = .1636):

Title | Team and idea | Corr | MSD | ENO
Interesting baseline | CPT | 0.87 | 0.0787 | 2.32
Interesting baseline | Priority | 0.65 | 0.1437 | 1.21
Best baseline | SCPT with normalization | 0.95 | 0.0102 | 80.99
Winner | Haruvy: logistic regression | 0.94 | 0.0126 | 56.36
Study 1 | Choice of new students | .94 | 0.0131 | 32.61
Study 1 | Intuition (probabilistic poll) | .86 | 0.1149 | 1.88
Does it make sense?
Theoretically, probabilistic polls should give us more information, as each individual
tells us not only her choice but also her confidence in that choice.

So why were choices so much better than probability estimates at
predicting behavioral data?

 1.   People have different biases when making predictions (e.g., projection bias,
      overconfidence). If these are systematic, it might be better to look at their
      behavior instead.
 2.   It might be much harder to think reasonably about probabilities.

 These hypotheses can be easily distinguished. Let’s run
 simple polls instead of letting them choose and see what
 happens…
Predicting Others’ Behavior: Study 2

 Similar to Study 1, except that this time subjects were told:

 “This experiment includes several games that were played originally by 20 participants
 (referred to as the “former participants”) in May 2008.
 In each game, each participant had to choose one of two alternatives (called S and R).
 At the end of the original experiment one game was randomly selected and played, and the
 participant’s payoff was realized according to their choice in the selected game.

 In the current experiment you will be presented with the games that were played by the
 former participants. For each game you will be asked to select the alternative that you
 think most former participants preferred…”

 The second stage (estimating the proportion who chose R) was identical to Study 1.
Predicting Others’ Behavior: Results

Prediction scores on the competition set (S2 = .1636)

Study | Method | Corr | MSD | ENO
Study 2 | Simple poll | .94 | 0.0125 | 37.69
Study 2 | Probabilistic poll | .875 | 0.0598 | 3.16
Study 1 | Choice | .94 | 0.0131 | 32.61
Study 1 | Probabilistic poll | .86 | 0.1149 | 1.88

   - Choice and simple polls look the same.
   - Probabilistic polls are consistently behind in terms of predictions.
Explaining the gap between probability and simple polls:

[Figure: "Results" plot of the series "Simple" and "Proportion" against "Predictions"; both axes run from 0 to 1.]
Predicting Others’ Behavior: Discussion

 Counter-intuitively, it seems that simple polls are better than probabilistic polls, at least
 in the current settings.

 But this is reasonable in retrospect: it seems much harder to think of a probability than to
 think about whether the event will occur (not just for psychologists).

 The noisier responses with probability estimates can produce a regression-to-the-mean
 effect. This is easily captured by an error response model:
 estimate = belief + error (see Erev et al., 1994).
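A small simulation shows how estimate = belief + error can pull reported probabilities toward the middle of the scale when responses must stay between 0 and 1. The Gaussian error, its standard deviation, and the clipping to [0, 1] are assumptions for this sketch, not details taken from Erev et al. (1994).

```python
import random

def mean_estimate(belief, error_sd, n=20_000, seed=0):
    """Average of estimate = clip(belief + error) over n simulated respondents."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        estimate = belief + rng.gauss(0.0, error_sd)
        total += min(1.0, max(0.0, estimate))  # responses must lie in [0, 1]
    return total / n

for belief in (0.05, 0.5, 0.95):
    print(belief, round(mean_estimate(belief, 0.0), 3), round(mean_estimate(belief, 0.2), 3))
```

With no error the mean estimate equals the belief; with error, extreme beliefs are reported as less extreme on average, and the effect grows with the noise.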

 But what exactly are the “current settings”?

 -Situations where people can imagine themselves making decisions?
 -Situations where people predict behavior or also events (e.g., will it rain tomorrow?)
 -Binary/multiple choice?
 -Others?
Summary

Predicting data is important both for theory and for applied research.

Theory:
Quantitative models can be wrong but seem more useful than “not even wrong”
assumptions, as at least they clarify the boundaries of assumed behavioral regularities.

The usefulness of models can be clearer if we move beyond capturing cute
counter-examples.

Application:
Three main methods of predicting data: models (our intuition and/or data), polls, and
information markets (crowd intuition).

Clarifying the relation between the different methods can be important to derive more
useful predictions.