Beyond Counter-examples and the choice prediction competition
Ido Erev -- Technion
Eyal Ert and Alvin E. Roth -- Harvard University
Ernan Haruvy -- University of Texas
Stefan Herzog, Robin Hau, and Ralph Hertwig -- University of Basel
Terrence Stewart -- University of Waterloo
Robert West -- Carleton University
Christian Lebiere -- Carnegie Mellon University
Mainstream experimental and behavioral economic research tends to focus on counter-examples to rational decision theory. The most influential papers present elegant counter-examples (e.g., the Allais paradox, the framing effect, duration neglect) and simple models that capture them. This focus was very effective in establishing the importance of the behavioral approach, but it has several limitations that impair the derivation of quantitative predictions.
Counter-examples

The Allais/certainty effect (from K&T, 1979, following Allais, 1953)
Problem 1:
  S: 3000 with certainty
  R: 4000 with p = 0.80; 0 otherwise
Problem 2:
  S: 3000 with p = 0.25; 0 otherwise
  R: 4000 with p = 0.20; 0 otherwise

Buying insurance and lotteries (from K&T, 1979)
Problem 3:
  S: 5 with certainty
  R: 5000 with p = 1/1000; 0 otherwise
Problem 4:
  S: -5 with certainty
  R: -5000 with p = 1/1000; 0 otherwise

Oversensitivity to rare events
Counter-examples, and counter to counter-examples

The Allais/certainty effect (from K&T, 1979, following Allais, 1953)
Problem 1:
  S: 3000 with certainty
  R: 4000 with p = 0.80; 0 otherwise
Problem 2:
  S: 3000 with p = 0.25; 0 otherwise
  R: 4000 with p = 0.20; 0 otherwise

The reversed Allais/certainty effect (from Barron & Erev, 2003)
Problem 1r:
  S: 3 with certainty
  R: 4 with p = 0.80; 0 otherwise
Problem 2r:
  S: 3 with p = 0.25; 0 otherwise
  R: 4 with p = 0.20; 0 otherwise

Buying insurance and lotteries (from K&T, 1979)
Problem 3:
  S: 5 with certainty
  R: 5000 with p = 1/1000; 0 otherwise
Problem 4:
  S: -5 with certainty
  R: -5000 with p = 1/1000; 0 otherwise
Oversensitivity to rare events

"It won't happen to me" (from Barron & Erev, 2003)
Problem 3r:
  S: 3 with certainty
  R: 32 with p = 1/10; 0 otherwise
Problem 4r:
  S: -3 with certainty
  R: -32 with p = 1/10; 0 otherwise
Under-sensitivity to rare events
More counter-examples. More counter to counter-examples.

2. Oversensitivity to losses
  - The status quo effect
  - Underinvestment in stocks
  - Loss aversion: "Imagine that you have the opportunity to play a gamble that offers a 50% chance of winning $2000 and a 50% chance of losing $500. Would you play the gamble?" 45-55% reject the offer.

2r. Under-sensitivity to losses
  - The winner's curse
  - Overinvestment in individual stocks
  - No loss aversion (Ert & Erev, 2008): "Please choose: a. 0 for sure; b. $2000 with p of 0.5, -$500 with p of 0.5." 78% select the gamble.

3. Oversensitivity to others
  - Cooperation in the prisoner's dilemma
  - Rejections in the ultimatum game

3r. Under-sensitivity to others
  - The mythical fixed pie syndrome
Another limitation involves the possibility that the clearest counter-examples may not reflect the most important behavioral regularities. It is possible that the most important behavioral regularities emerge in situations in which rational economic theory cannot be decisively violated. For example, the Allais paradox suggests (can be captured with a model that assumes) overweighting of extreme rare events. This observation violates EUT, but it is possible that the tendency to behave as if "it won't happen to me" is more important.
A third limitation involves the fact that the understanding of deviations from rational choice is not always sufficient to derive quantitative predictions of behavior. For example, rational decision theory does not provide clear predictions of decisions when the decision makers can rely on personal experience. Almost any decision can be rational depending on the decision maker’s beliefs. The current research tries to address these limitations by extending the study of the Allais paradox and similar problems along two dimensions: the parameters that determine the incentive structure, and the source of the available information (description or experience).
We believe that previous attempts to advance beyond counter-examples have been slowed by a problematic incentive structure: The evaluation of quantitative predictions tends to be more expensive and less interesting than the study of counter-examples. This problem is addressed here by the organization of three open choice prediction competitions that can reduce the cost and increase the likelihood of interesting outcomes. Erev, Ert & Roth (EER) ran the necessary boring studies, and challenged other researchers to predict the results.

All three competitions focus on binary choices of the type:
  Safe: M with certainty
  Risk: H with probability Ph; L otherwise (with probability 1-Ph)
Condition Description: This experiment includes several games. In each game you will be asked to select one of two alternatives. At the end of the experiment one of the games will be randomly drawn (all the games are equally likely to be drawn), and the alternative selected in this game will be realized. Your payoff for the experiment will be the outcome (in Sheqels) of this game.
Condition Experience-Sampling: This experiment includes several games. Each game includes two stages: The sampling stage and the choice stage. At the choice stage (the second stage) you will be asked to select once between two virtual decks of cards (two buttons). Your choice will lead to a random draw of one card from this deck, and the number written on the card will be the "game's outcome." During the sampling stage (the first stage) you will be able to sample the two decks. When you feel that you have sampled enough, press the "choice stage" key to move to the choice stage. At the end of the experiment one of the games will be randomly drawn (all the games are equally likely to be drawn). Your payoff for the experiment will be the outcome (in Sheqels) of this game.
Condition Experience-Repeated: This experiment includes several games. Each game includes several trials. You will receive a message before the beginning of each game. In each trial you will be asked to select one of two buttons. Each press will result in a payoff that will be presented on the selected button. At the end of the experiment one of the trials will be randomly drawn (all the trials are equally likely to be drawn). Your payoff for the experiment will be the outcome (in Sheqels) of this trial.
Two studies: Estimation and Competition

The estimation study was run in March 2008. It included 60 randomly selected problems, with payoffs between -30 and +30 and rare events (Ph < .1 or Ph > .9) in about 2/3 of the problems. Each problem was played by 20 subjects. The subjects received 30 sheqels plus the outcome of one (randomly selected) trial.

EER posted the results in April 2008 with baseline models (popular models and the best model that we could find) and challenged other researchers to submit models to predict the results of a second study (the competition study).

To participate, researchers had to implement their model in a computer program that reads the input (the parameters of each of the problems: M, H, Ph and L) and provides the predicted Proportion of Risky Choices (R-rate) as an output.
The ranking criterion was the Mean Squared Deviation (MSD) between the observed and the predicted R-rate in the Competition set. To clarify the interpretation we focus on the ENO (Equivalent Number of Observations) order-maintaining transformation of the MSD scores (see Erev et al., 2007).
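A minimal sketch of the scoring step, in Python. The `msd` function follows the definition above directly. The `eno` function is an assumption: the form S2 / (MSD - S2/n), with n = 20 subjects per problem, is not quoted from Erev et al. (2007); it is simply the form that reproduces most of the ENO values reported in the results tables of this presentation. The function and argument names are hypothetical.

```python
def msd(observed, predicted):
    """Mean squared deviation between observed and predicted R-rates."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def eno(msd_score, s2, n):
    """Assumed ENO form: S2 / (MSD - S2 / n).

    s2 is the variance reported with each results table (e.g. S2 = .1636)
    and n is the number of subjects per problem (20 in these studies).
    Both the formula and the parameter names are assumptions of this sketch.
    """
    denom = msd_score - s2 / n
    if denom <= 0:
        # The model is at least as accurate as the sampling error allows.
        return float("inf")
    return s2 / denom


if __name__ == "__main__":
    # Best Description baseline on the competition set: MSD = 0.0102, S2 = .1636
    print(round(eno(0.0102, 0.1636, 20), 2))
```

Under this assumed form, a lower MSD always gives a higher ENO, which is what "order-maintaining" requires.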
Results: The raw data show high correspondence (correlation above 0.8) between the two experience conditions, and a negative correlation between these conditions and the description condition.

R-rate by condition:
Risky | Safe | Description | Sampling | Repeated
3.3 with p = 0.91; -3.5 otherwise | 2.7 | 25% | 65% | 60%
2 with p = 0.1; -5.7 otherwise | -4.6 | 65% | 20% | 11%

Analysis of this difference reveals that it is driven by the effect of rare events (see Barron & Erev, 2003; Hertwig et al., 2004).
The submissions

23 models were submitted: 8 to the description condition, 7 to the E-sampling condition, and 8 to the E-repeated condition. The submitted models covered a large span of methods, including logistic regression, ACT-R, neural networks, and basic mathematical models.
Description

Fitness scores are based on the estimation set (S2 = .1860). Prediction scores are based on the competition set (S2 = .1636).

Title | Team and idea | Parameters | Fitness Pagree | Fitness Corr | Fitness MSD | Prediction Pagree | Prediction Corr | Prediction MSD | ENO
Interesting baselines | CPT | α=β=.88, λ=2.25, γ=.61, δ=.69 | 91% | 0.84 | 0.099 | 93% | 0.87 | 0.0787 | 2.32
Interesting baselines | Priority | s=0.1 | 91% | 0.76 | 0.1158 | 81% | 0.65 | 0.1437 | 1.21
Best baseline | SCPT with normalization | α=.89, β=.98, λ=1.5, μ=2.15, γ=δ=.7 | 89% | 0.92 | 0.0116 | 95% | 0.95 | 0.0102 | 80.99
Winner | Haruvy: logistic regression | β0=1, β1=.01, β2=.07, β3=.41, γ1=1.42, γ2=.32, γ3=-.621 | 88% | 0.92 | 0.0099 | 90% | 0.94 | 0.0126 | 56.36
The logistic choice model

Motivation: PT was suggested to capture counter-examples, whereas regression models are commonly used in applications.

The tendency to prefer the risky prospect:
T(R) = β0 + β1*H + β2*L + β3*M + γ1*Ph + γ2*EV(R) + γ3*(Dummy1)

The values H, L, M and Ph are the parameters of the choice problem as defined above. EV(R) is the expected payoff of the risky prospect, and Dummy1 is a dummy variable that assumes the value 1 if the risky choice has a higher expected value than the safe choice and 0 otherwise.

A logistic choice rule:
P(R) = \frac{1}{1 + e^{-T(R)}}
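The two formulas of the logistic choice model can be sketched in a few lines of Python. This is an illustration rather than the submitted program: the function names are hypothetical, and the coefficients in the usage example are placeholders chosen for readability, not the fitted values.

```python
import math


def risky_tendency(H, L, M, Ph, coef):
    """T(R) for a problem (M, H, Ph, L).

    coef = (b0, b1, b2, b3, g1, g2, g3), in the order of the formula above.
    """
    b0, b1, b2, b3, g1, g2, g3 = coef
    ev_risky = Ph * H + (1.0 - Ph) * L        # EV(R)
    dummy1 = 1.0 if ev_risky > M else 0.0     # risky EV exceeds the safe payoff
    return b0 + b1 * H + b2 * L + b3 * M + g1 * Ph + g2 * ev_risky + g3 * dummy1


def predicted_r_rate(H, L, M, Ph, coef):
    """Logistic choice rule: P(R) = 1 / (1 + exp(-T(R)))."""
    t = risky_tendency(H, L, M, Ph, coef)
    # Numerically stable form of the same function for very negative t.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


if __name__ == "__main__":
    placeholder_coef = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0)  # illustrative only
    # Safe: 3 with certainty; Risk: 4 with Ph = 0.8, 0 otherwise
    print(predicted_r_rate(H=4, L=0, M=3, Ph=0.8, coef=placeholder_coef))
```

A function of this shape matches the competition's input/output requirement: it reads M, H, Ph and L and returns a predicted R-rate.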
E-Sampling

Fitness scores are based on the estimation set (S2 = .2023). Prediction scores are based on the competition set (S2 = .2111).

Title | Team and idea | Parameters | Fitness Pagree | Fitness Corr | Fitness MSD | Prediction Pagree | Prediction Corr | Prediction MSD | ENO
Best baseline | Primed sampler with variability | k=9 | 95% | 0.88 | 0.017 | 82% | 0.8 | 0.0244 | 15.23
Winner | Herzog, Hau, Hertwig: ENSEMBLE Linear Combination | α=1.19, β=1.35, γ=1.42, δ=1.54, λ=1.19, μ=.41 | 95% | 0.92 | 0.0099 | 83% | 0.8 | 0.0187 | 25.92
Runner up | Ann, Picard: Sample by CPT and aspiration levels | x=2.07, y=1.31, z=0.71, v=7.53, r=12.64, m=0.02 | 92% | 0.9 | 0.0115 | 82% | 0.82 | 0.0203 | 21.66
The Ensemble model

Motivation:
- People use different decision rules.
- Reduction of error by aggregation of predictions: "wisdom of (models) crowd".

Aggregate 4 models:
1) 2 versions of the primed sampler (small samples are considered; the higher mean is selected).
2) CPT (with a "reversed" weighting function).
3) Priority Rule (compare L's, then Ph's, then H's, with a cutoff of 0.1 in each comparison as the stopping rule).
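To make the aggregation idea concrete, here is a rough Python sketch of one component and of the combination step. It is not the submitted ENSEMBLE model. The sampler below only implements the one-line description above (draw a small sample of k outcomes, pick the option with the higher mean); the random tie-breaking, the simulation-based R-rate, the equal default weights, and all function names are assumptions of this sketch.

```python
import random


def primed_sampler_r_rate(M, H, Ph, L, k, n_sim=10000, seed=0):
    """Share of simulated agents who pick Risk after k draws from it.

    The safe option always pays M, so its sample mean is M.
    """
    rng = random.Random(seed)
    risky_chosen = 0
    for _ in range(n_sim):
        draws = [H if rng.random() < Ph else L for _ in range(k)]
        sample_mean = sum(draws) / k
        if sample_mean > M:
            risky_chosen += 1
        elif sample_mean == M and rng.random() < 0.5:
            risky_chosen += 1  # ties broken at random (assumption)
    return risky_chosen / n_sim


def ensemble_r_rate(predictions, weights=None):
    """Weighted average of the component models' predicted R-rates."""
    if not predictions:
        raise ValueError("need at least one prediction")
    if weights is None:
        weights = [1.0] * len(predictions)  # equal weights (assumption)
    if len(weights) != len(predictions):
        raise ValueError("one weight per prediction")
    return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)


if __name__ == "__main__":
    # Problem 3r: Safe 3 with certainty; Risk 32 with p = 1/10, 0 otherwise
    small = primed_sampler_r_rate(M=3, H=32, Ph=0.1, L=0, k=5)
    large = primed_sampler_r_rate(M=3, H=32, Ph=0.1, L=0, k=9)
    print(small, large, ensemble_r_rate([small, large]))
```

With k = 5 in Problem 3r, the risky option is chosen only when the rare 32 shows up at least once in the sample, which happens with probability 1 - 0.9^5, about 0.41. That is how reliance on small samples produces under-sensitivity to rare events.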
E-Repeated

Fitness scores are based on the estimation set (S2 = .0875). Prediction scores are based on the competition set (S2 = .0928).

Title | Team and idea | Parameters | Fitness Pagree | Fitness Corr | Fitness MSD | Prediction Pagree | Prediction Corr | Prediction MSD | ENO
Interesting baselines | Normative Reinforcement Learning | w=.15, λ=1.1 | 76% | .83 | 0.0092 | 84% | .84 | 0.0087 | 22.89
Interesting baselines | Basic Reinforcement Learning | w=.15, λ=1, β=.10 | 56% | .67 | 0.0224 | 66% | .51 | 0.0263 | 4.28
Best baseline | Explorative sampler with recency | ε=.12, k=8 | 82% | 0.88 | 0.0075 | 86% | 0.89 | 0.0066 | 47.22
Winner | Stewart, West, & Lebiere: ACT-R with sequential dependencies and blending memory | s=.35, τ=-1.6 | 77% | 0.88 | 0.0094 | 87% | 0.89 | 0.0075 | 32.50
ACT-R

Motivation:
- "Atomic Components of Thought": e.g., declarative/procedural memory.
- ACT-R was useful in other learning tasks (skill acquisition, causal and category learning, and others).

Declarative memory with sequential dependencies. Each experience is coded as a chunk that includes the context, the choice, and the obtained outcome. Context = the two previous consecutive choices. Recall is based on the activation level and a cutoff parameter.

The activation level of experience i:
A_i = \ln\left(\sum_{k} t_k^{-d}\right) + \varepsilon(s)    (1)
where t_k is the amount of time since the kth appearance of this item, d is the decay rate, and \varepsilon(s) is a random value chosen from a logistic distribution with variance \pi^2 s^2 / 3.
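A small Python sketch of the activation computation, assuming the standard ACT-R base-level form A_i = ln(sum over k of t_k^(-d)) + eps(s). The function name, the argument names, and the decay value in the usage example are hypothetical. The noise is drawn by inverting the logistic CDF, which gives a logistic distribution with scale s and therefore variance pi^2 * s^2 / 3.

```python
import math
import random


def activation(times_since_use, d, s, rng=None):
    """Activation of one chunk.

    times_since_use: t_k, time since each past appearance (all > 0)
    d: decay rate
    s: noise scale; s = 0 returns the deterministic part only
    """
    if not times_since_use or any(t <= 0 for t in times_since_use):
        raise ValueError("need at least one strictly positive time")
    base = math.log(sum(t ** (-d) for t in times_since_use))
    if s == 0:
        return base
    rng = rng or random.Random()
    u = rng.random()
    while u == 0.0:          # random() can return 0.0; avoid log(0)
        u = rng.random()
    noise = s * math.log(u / (1.0 - u))   # logistic noise with scale s
    return base + noise


if __name__ == "__main__":
    rng = random.Random(0)
    # A chunk seen 1, 3 and 10 time steps ago; d = 0.5 is illustrative,
    # s = .35 is the value listed for the winning model.
    print(activation([1.0, 3.0, 10.0], d=0.5, s=0.35, rng=rng))
```

Recent and frequent experiences get higher activation, so they are more likely to clear the recall cutoff.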
General observations:
1) Models were very successful in predicting the data (high ENO).
2) Small variants of existing models can make them much better.
3) It seems that description-based and experience-based decisions are qualitatively different:
   a. It is hard to capture them under the same model.
   b. The models relied on different assumptions.
      - Description-based decisions involve weighting the described information.
      - Experience-based decisions are captured by reliance on small samples in short-term memory.
"Asking the Right Question": Simple vs. Probabilistic Polls in Predicting Others' Behavior

Alternative ways of predicting data rely on the "wisdom of crowds" (e.g., various polls, information markets).
1) How much accuracy can we get from people's intuition?
2) What type of polls should we apply?
Predicting Others Behavior: Study1 “This experiment includes several games that were played originally by 20 participants (referred to as the “former participants”) on May 2008.” Step 1: In the current experiment you will be presented with the games that were played by the former participants. For each game you will be asked to select the alternative that you think most former participants preferred. Step 2: Right afterwards you’ll be asked to estimate the proportion of the 20 former participants that chose alternative R in that game. Your estimate should be between 0 and 1. For example, if you think that 10 of the 20 former participants selected R -- choose "0.5."
Predicting Others Behavior: Results

Fitness scores are based on the estimation set (S2 = .1860). Prediction scores are based on the competition set (S2 = .1636).

Title | Team and idea | Parameters | Fitness Pagree | Fitness Corr | Fitness MSD | Prediction Pagree | Prediction Corr | Prediction MSD | ENO
Interesting baselines | CPT | α=β=.88, λ=2.25, γ=.61, δ=.69 | 91% | 0.84 | 0.099 | 93% | 0.87 | 0.0787 | 2.32
Interesting baselines | Priority | s=0.1 | 91% | 0.76 | 0.1158 | 81% | 0.65 | 0.1437 | 1.21
Best baseline | SCPT with normalization | α=.89, β=.98, λ=1.5, μ=2.15, γ=δ=.7 | 89% | 0.92 | 0.0116 | 95% | 0.95 | 0.0102 | 80.99
Winner | Haruvy: logistic regression | β0=1, β1=.01, β2=.07, β3=.41, γ1=1.42, γ2=.32, γ3=-.621 | 88% | 0.92 | 0.0099 | 90% | 0.94 | 0.0126 | 56.36
Choice of new students | | | | | | | .94 | 0.0131 | 32.61
Intuition (Probabilistic Poll) | | | | | | | .86 | 0.1149 | 1.88
Does it make sense?

Theoretically, probability polls should give us more information, as each individual tells us not only her choice but also her confidence in that choice. So why were choices so much better than probability estimates in predicting behavioral data?
1. People have different biases when making predictions (e.g., projection bias, overconfidence). If these are systematic, it might be better to look at their behavior instead.
2. It might be much harder to think reasonably about probabilities.

These hypotheses can be easily differentiated. Let's run simple polls instead of letting them choose and see what happens…
Predicting Others Behavior: Study2 Similar to Study 1 only this time subjects were told: “This experiment includes several games that were played originally by 20 participants (referred to as the “former participants”) on May 2008. In each game, each participant had to choose one of two alternatives (called S and R). At the end of the original experiment one game was randomly selected played, and the participant’s payoff was realized according to their choice in the selected game. In the current experiment you will be presented with the games that were played by the former participants. For each game you will be asked to select the alternative that you think most former participants preferred…” The second stage (estimating the proportion who chose R) was identical to study1.
Predicting Others Behavior: Results

Prediction scores are based on the competition set (S2 = .1636).

Study | Poll type | Corr | MSD | ENO
Study 2 | Simple Poll | .94 | 0.0125 | 37.69
Study 2 | Probabilistic Poll | .875 | 0.0598 | 3.16
Study 1 | Choice | .94 | 0.0131 | 32.61
Study 1 | Probabilistic Poll | .86 | 0.1149 | 1.88

- Choice and simple polls look the same.
- Probabilistic polls are consistently behind in terms of predictions.
Explaining the gap between probability and simple polls: Results

[Figure: "Simple Proportion" (vertical axis, 0 to 1) plotted against "Predictions" (horizontal axis, 0 to 1).]
Predicting Others Behavior: Discussion

Counter-intuitively, it seems that simple polls are better than probabilistic polls, at least in the current settings. But this is reasonable in retrospect: it seems much harder to think of a probability than to think about whether the event will occur (not just for psychologists). The noisier responses with probability estimates can facilitate a regression-to-the-mean effect. This is easily captured by an error response model: estimate = belief + error (see Erev et al., 1994).

But what exactly are the "current settings"?
- Situations where people can imagine themselves making decisions?
- Situations where people predict behavior, or also events (e.g., will it rain tomorrow)?
- Binary/multiple choice?
- Others?
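The regression-to-the-mean argument can be illustrated with a short Python simulation of the error response model. This is a sketch under stated assumptions, not the analysis of Erev et al. (1994): the error is taken to be Gaussian, and reported estimates are clipped to the allowed range [0, 1]. Both choices, and the function name, are assumptions made here for illustration.

```python
import random


def mean_reported_estimate(belief, noise_sd, n=20000, seed=0):
    """Average reported estimate when estimate = belief + error.

    Each response is clipped to [0, 1], the range subjects were
    allowed to report (clipping is an assumption of this sketch).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        raw = belief + rng.gauss(0.0, noise_sd)
        total += min(1.0, max(0.0, raw))
    return total / n


if __name__ == "__main__":
    for belief in (0.1, 0.5, 0.9):
        print(belief, mean_reported_estimate(belief, noise_sd=0.3))
```

With noisy responses, average estimates for extreme beliefs move toward the middle of the scale, while estimates near 0.5 stay roughly unchanged. A simple poll only asks which side of 0.5 the belief falls on, so it is less exposed to this distortion.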
Summary

Predicting data is important both for theory and for applied research.

Theory: Quantitative models can be wrong but seem more useful than "not even wrong" assumptions, as at least they clarify the boundaries of assumed behavioral regularities. The usefulness of models can be clearer if we move beyond capturing cute counter-examples.

Application: There are three main methods of predicting data: models (our intuition and/or data), polls, and information markets (crowd intuition). Clarifying the relation between the different methods can be important for deriving more useful predictions.