Comparing the Impact of Star Rookies Carmelo Anthony and Lebron James: An Example on Simulating Team Performances in the NBA League

                                                  Salwa Ammar
                                                  Ronald Wright
                                         Department of Business Administration
                                                  Le Moyne College
                                                 Syracuse, New York
                                               Ammar@lemoyne.edu
                                               Wright@lemoyne.edu

Abstract
This paper describes a simulation exercise designed for introductory quantitative methods classes at both the
MBA and undergraduate levels. The exercise tracks the performances of teams in the National Basketball
Association (NBA) during the 2004 season. It is designed as a spreadsheet model and is developed in stages
throughout the academic semester. The example is a significant illustration of the use of sports as a vehicle for
teaching OR topics, specifically simulation. It also incorporates many spreadsheet modeling skills such as the
use of Excel functions and Data Tables. The model provided a good mix of the ingredients for an effective
simulation example, including a variety of answers, challenges and surprises.

Editor's note: This is a pdf copy of an html document which resides at http://ite.pubs.informs.org/Vo5No1/AmmarWright/

INFORMS Transactions on Education 5:1(67-74)    © INFORMS ISSN: 1532-0545

1. Introduction

For years, sports have been used in teaching Statistics (Lock, 1997 and Nettleton, 1998). More recently,
sports have been used as a motivating context for introducing simulation (Ammar and Wright, 2001). The
probabilistic nature of the competitive outcomes, the desire to predict these outcomes, and the variety
among the relationships between the outcomes provide a rich array of applications for simulation models.
Excel spreadsheets give instructors the ability to move away from 'toy' examples and introduce, in a
manageable format, real and meaningful illustrations (Evans, 2000). The most effective sports-related
examples capitalize on current events of general interest that stimulate curiosity and inquisitiveness
beyond those who may be considered diehard fans.

This paper describes an exercise that predicts the impact of two rookies in the National Basketball
Association (NBA). One of the rookies, LeBron James, was the number one draft pick in the NBA for 2003 and
generated much media attention. The second rookie, Carmelo Anthony, led his college team at Syracuse
University to its first national championship in basketball. The championship was by far the most
important sporting result for the central New York region in recent years. Thus following the immediate
progress of this local 'hero' guaranteed us broad student interest. Student interest and curiosity were
essential in sustaining their enthusiasm as we explored various modeling tools and concepts for the
simulation model. Although specific to these rookies and their draft teams, the exercise can easily be
generalized and updated to predict performances of any teams in the NBA.

In this paper we demonstrate the details of the simulation model using Carmelo Anthony's team, the
Denver Nuggets. In the previous year and prior to his draft the Nuggets won only 17 out of the season's 82
games. By the all-star game of 2004 and with the help
of Anthony, the Nuggets had already won 31 games and seemed to be well on their way to the playoffs. The
example described in this paper joins the league at the point of the all-star game (after 55 games) and
includes a Monte Carlo simulation for the performance of the Nuggets in the remaining games of the season.
In determining the playoff chances of the Nuggets, the simulation includes the performances of the team's
nearest competitors. Similar assessments are also performed for the Cleveland Cavaliers, the team that
drafted LeBron James.

The exercise is designed to help achieve several objectives. To run a successful simulation the first step
tends to focus on defining the relevant probabilities. The example introduces a method for estimating
probabilities that is intuitively acceptable and follows the rules of probability. In this paper the
process of estimating the probabilities is simple (by design) in order to avoid the need for extensive
discussion and coverage of advanced topics in probability theory. Another objective of this example is to
introduce students to the simulation capabilities of Excel, including the use of data tables to replicate
observations. Also, this example introduces the concept of simulating events that are related. For example,
simulating the outcome of the Seattle at Denver game determines the outcome of the Denver home game as well
as the outcome of the Seattle away game. Another important objective of the example is to demonstrate for
students how simulation can be used to provide answers to questions beyond the specific simulated events.
Simulating the outcomes of games for the season allows us to explore the chances of the team returning for
the playoffs. The final part of this paper focuses on the process of assessing and updating the estimated
probabilities as the season progresses and games are won and lost.

This example is used in our management science class to introduce concepts and basic skills in spreadsheet
simulation. This is a core junior level class required of students majoring in business and accounting.
Students entering this class are expected to have completed the introductory statistics requirement. The
simulation is done entirely in Excel. Large numbers of trials are executed using data tables and other
useful functions in Excel. The use of add-ins such as Crystal Ball and @Risk is introduced in the higher
(or senior) level simulation class and is not needed for this particular exercise.

2. Denver Simulation Model

The model involves an attempt to simulate the remaining 27 games for the Denver Nuggets. Once the outcome
of these games is simulated the next step is to assess the Nuggets' chances of making the playoffs. This
process is designed in several stages. The first stage is that of estimating the winning probabilities
(percentage) by team in the league. It is important to recognize that these probabilities vary for each
team depending on whether the game is played at home or away. The second stage is to simulate the number of
Denver wins for the remaining scheduled season based on the estimated probabilities. Finally, in assessing
Denver's chances of reaching the playoffs, the model examines Denver's nearest (slightly above or slightly
below) competitors by simulating each of their performances and comparing them with Denver's performance.

2.1. Estimating Winning Probabilities

The probabilities could be estimated by using information on teams' performances in the previously played
games. In the first 55 games of the season, Denver's winning percent was .582. (Note: percent is typically
referred to on sports pages as a number between 0 and 1. For convenience we maintain this convention for
the data in this paper.) We could use .582 as the probability of winning each game. However, in the NBA
there is a significant difference between winning percents for home games and for games on the road. Table
1 shows Denver's home and road winning percentages as well as the average of the entire league. It also
shows the same percentages for two select teams (for purposes of illustration).

Table 1: Select Winning Percentages

Since Denver has won 72% of its home games we could begin by assigning a probability of .72 to Denver
winning a future home game. However, the quality of the opponent would raise or lower that probability
for a particular game. For example, it might be reasonable to assign a probability of .72 to Denver beating
New York at home since New York's road winning percentage matches that of the league. That is, New York is
an average road team. If the probability that Denver beats New York at home is .72, then the probability
that New York wins that game would be 1 − .72 or .28. This is lower than New York's road average,
indicating that Denver is a better than average home team. In fact the extent to which New York's
probability of winning is reduced (.38 to .28) of course matches the extent to which Denver is a better
than average home team (.72 − .62). A possible generalization follows.

Consider two teams playing a particular game: team A, the home team, and team B, the road team.

Let Hper = proportion Team A wins when playing at home.

Let Rper = proportion Team B wins when playing on the road.

Let LHper = average proportion all league teams win when playing at home.

Let LRper = average proportion all league teams win when playing on the road.

We can then estimate the probability that the road team wins (team B over Team A) as:

                      Rper − (Hper − LHper).     (1)

In our example, the estimated probability that New York beats Denver on the road is:

                      .38 − (.72 − .62) = .28.

Similarly we can estimate the probability that the home team wins as:

                      Hper − (Rper − LRper).     (2)

In our example, the estimated probability that Denver beats New York at home is:

                      .72 − (.38 − .38) = .72.

Also, if Denver is playing Indiana at home the probability of Denver winning is:

                      .72 − (.67 − .38) = .43,

a lower value since Indiana is a better than average road team.

It is important to check that the estimates for probabilities of complementary events add to one. If we add
the probability that the road team wins (as calculated in (1)) to the probability that the home team wins
(as calculated in (2)) we get:

                      Rper − (Hper − LHper) + Hper − (Rper − LRper) = LHper + LRper = 1.

This relationship suggests that equations (1) and (2) provide possible estimates for winning probabilities.
These estimates take into account the relative position of any particular team in the league.

2.2. Modeling the Number of Denver Wins

Figure 1 shows Denver's simulation sheet(1). It includes Denver's schedule and the appropriate home or road
records of each opponent. Average league results are also included. Denver's schedule and the league
standings at the time of this example were downloaded from a popular sports site. Denver's win percentages
and the league win percentages were included. Using equations (1) and (2) (and IF statements for home or
road), the probability of Denver winning each game is calculated.

(1)   http://ite.pubs.informs.org/Vol5No1/AmmarWright/denver.xls

Figure 1: Denver Simulation

The outcome for each game is simulated by using RAND, the Excel random number function for a uniform
distribution between 0 and 1. If the random number is less than the probability of Denver winning that game
a "win" is recorded. Otherwise "lose" is entered in the simulation column. (Results in highlighted
cells are explained in the following section.) A COUNTIF statement is used to count the number of simulated
wins. The total number of wins is the simulated number of wins plus the actual wins at the time of the
example.

Each time a recalculation is done, new random numbers are generated and new simulation results are
recorded. The results of multiple simulations can be recorded using Excel's Data Table (Evans and Olson,
2002). For a 1000 run simulation we found the average number of wins to be 46 with a minimum of 39 and a
maximum of 55. Table 2 shows the number of times a range of wins occurred in the 1000 runs.

Table 2: Range of Simulated Total Season Wins

At this point we have some sense of how many games the Denver Nuggets might win for the season. In more
than 90 percent of the runs Denver wins at least 43 games. Is this enough to make the playoffs? In the
previous year it took 44 wins to make the playoffs in the Western Conference while 38 would have been
enough in the East. To better assess Denver's chances for the playoffs we may need to know how the number
of Denver wins compares with that of the competing teams.

2.3. Modeling the Top Performer from Four Teams

At the time of this exercise, Denver was in eighth place in the West, the final playoff spot. Teams in
spots 9, 10, and 11 (Seattle, Utah and Portland) could be regarded as threats to Denver's position. Our
next step was to run similar simulations for these three teams and compare the number of wins (Simulation
File(2); note that when running simulations only one spreadsheet should be open at a time). Running a
simulation for a new team merely requires entering a new schedule (a lookup function will check records and
recalculate probabilities). However, we need to take into account instances in which these teams play one
another. For example, Denver plays Utah at Utah. Although the probabilities that Denver wins and that Utah
wins add to one, we don't want to use two different random numbers to simulate the outcome of a single
game. Here we choose to randomly generate outcomes for home games and use the results to determine those of
the road games. For example, for the Denver at Utah game, the outcome on the Denver sheet is linked to the
outcome on the Utah sheet. If a "lose" shows up for Utah, a "win" will be entered for Denver and vice
versa. (See the lower highlighted cell in Figure 1.) The spreadsheet(3) contains simulation sheets for each
of the four teams. Each sheet duplicates the basic structure of the Denver sheet described above.

When the season for all four teams is simulated, an IF statement is used to record a "yes" if Denver's win
total equals the maximum of the four win totals and a "no" otherwise (ignoring ties). As shown in Figure 2,
a Data Table is used to perform 1000 replications and a COUNTIF statement is used to count the percent of
"yes" results.

Figure 2: First of Four

In one simulation of 1000 runs we observed that Denver's wins exceeded the number of wins of the other
three teams 98% of the time. Our confidence that Denver will make the playoffs has increased.

(2)   http://ite.pubs.informs.org/Vol5No1/AmmarWright/fourteams.xls
(3)   http://ite.pubs.informs.org/Vol5No1/AmmarWright/fourteams.xls
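
For readers who want to check the spreadsheet logic outside Excel, the estimates of equations (1) and (2) and the linked multi-team simulation of Section 2.3 can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the authors' workbook: the function names are ours, and apart from the league averages (.62 and .38), Denver's .72 home mark, Denver's 31 wins, and the New York and Indiana road marks quoted in the text, every number in the demonstration is a made-up placeholder.

```python
import random

# League-average proportions quoted in the text (LHper and LRper).
LEAGUE_HOME = 0.62
LEAGUE_ROAD = 0.38


def road_win_prob(rper, hper, league_home=LEAGUE_HOME):
    """Equation (1): estimated probability that the road team wins."""
    return rper - (hper - league_home)


def home_win_prob(hper, rper, league_road=LEAGUE_ROAD):
    """Equation (2): estimated probability that the home team wins."""
    return hper - (rper - league_road)


def simulate_season(records, current_wins, schedule, rng):
    """Play each remaining game once and return total wins per tracked team.

    records maps a team to (home proportion, road proportion).
    schedule is a list of (home team, road team) pairs. A single random
    draw settles each game, so a game between two tracked teams yields
    one winner and one loser, as with the linked spreadsheet cells.
    """
    wins = dict(current_wins)
    for home, road in schedule:
        p_home = home_win_prob(records[home][0], records[road][1])
        winner = home if rng.random() < p_home else road
        if winner in wins:  # opponents outside the comparison are not tallied
            wins[winner] += 1
    return wins


def share_on_top(team, records, current_wins, schedule, runs=1000, seed=0):
    """Fraction of runs in which `team` has the highest win total (ties count)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        wins = simulate_season(records, current_wins, schedule, rng)
        if wins[team] == max(wins.values()):
            hits += 1
    return hits / runs


if __name__ == "__main__":
    print(round(home_win_prob(0.72, 0.38), 2))  # Denver over New York at home
    print(round(home_win_prob(0.72, 0.67), 2))  # Denver over Indiana at home
    # Hypothetical records, standings and schedule, for illustration only.
    records = {"Denver": (0.72, 0.45), "Utah": (0.70, 0.30), "Seattle": (0.55, 0.40)}
    current_wins = {"Denver": 31, "Utah": 30, "Seattle": 30}
    schedule = [("Utah", "Denver"), ("Denver", "Seattle"), ("Seattle", "Utah")]
    print(share_on_top("Denver", records, current_wins, schedule))
```

Equations (1) and (2) are used exactly as written, so an extreme matchup can produce an estimate below 0 or above 1; the sketch does not clamp such values, and a random draw in [0, 1) then simply makes that outcome certain.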

2.4. Modeling the Top Three Performers of Seven Teams

Can we be even more confident? Currently Denver is actually tied with Memphis and Houston for the 6th,
7th, and 8th places. Hence there is a chance that one of these teams might falter, improving Denver's odds.
Our third simulation includes the 6th through 12th place teams and attempts to determine whether Denver
would place in the top three of the last seven teams(4). We are assuming the current top five teams will be
in the playoffs and only three spots are yet to be determined.

We can count the wins for each of the seven teams as we did before in the four-team simulation. In
seventeams.xls(5) the sheets for the individual teams (other than Denver) are hidden to simplify the
reader's interaction with the spreadsheet. The sheets are not protected and can be easily unhidden to show
a structure identical to those in the four teams' sheets. Once the seven team performance is simulated the
RANK function is used to determine the rank of Denver within these seven teams. A Data Table is then used
to simulate 1000 replications with the rank of Denver as the table output. Figure 3 includes these seven
team calculations. Also, Table 3 shows the summary of the simulated results. Since the top three teams will
qualify for the playoffs (along with the assumed first five) Denver misses the playoffs only 1% of the time
(actually 9 out of 1000). By all three models Denver's and Carmelo Anthony's chances of making the playoffs
seem very high.

Figure 3: Top Three of Seven

Table 3: Denver's Ranks as a Percent of 1000 Runs

(4)   http://ite.pubs.informs.org/Vol5No1/AmmarWright/seventeams.xls
(5)   http://ite.pubs.informs.org/Vol5No1/AmmarWright/seventeams.xls

3. The Other Guy

Syracuse and Denver fans must admit that there is a second super rookie this year by the name of LeBron
James. What are his chances of leading the Cleveland Cavaliers to the playoffs? At the time of the exercise
the Cavaliers were in 11th place in the East but talking confidently of rising to the 8th and final spot.
The 8th through 11th spots were held by Boston, Miami, Philadelphia and Cleveland, in that order. A new
spreadsheet was created for these four teams and a thousand endings to the season were simulated. It was
assumed that only the top team would make the playoffs from these four teams (the seven higher teams are
substantially above Boston, the current 8th and last qualifier). Table 4 contains the percent of trials
(out of 1000) in which each of the four teams gained that last spot.

Table 4: Percent of Trials each Team Made the Playoffs

As modeled, Cleveland actually has very little chance of making the playoffs. Also the current 8th place
team,
Boston, does not have the best shot. Miami is the favorite to gain the eighth position. Some of the students
observe that this is consistent with Boston's poor play after all their recent trades and the resignation
of their coach. Of course, our model knows nothing of that. What it does know, as summarized in Table 5, is
that Miami has a better home record than the other teams and plays more home games down the stretch than
the other three teams.

Table 5: Home Game Advantages

The students' observations, however, allow for an appropriate discussion about what our models do and do
not take into account. If all teams play the remaining games at the level they played the first part of the
season our results are likely to be a very fair representation. However, this analysis was performed just
before the deadline for teams to make trades. As the students point out, the Boston team playing after the
all-star break is a very different one from the one that played the first half of the season. This is also
the case for other teams including Cleveland. Our model ignores all this and assumes the teams will play in
a manner consistent with the way they played the first fifty-some games. Obviously if the probabilities are
based on historical data and the future is sharply different from the past, then the results are less
reliable. We could attempt to make the results better by reducing the probabilities that Boston will win
games (at least based on their last 10 games) and we could increase the probabilities that Cleveland will
win based on the improved team and perhaps the maturing of LeBron James. Unfortunately these changed
probabilities might represent biased preferences.

What we can do is ascertain how much better Cleveland will have to play to have a reasonable chance at the
playoffs. To do this we increase the probability of winning both home and road games until Cleveland makes
the playoffs more than half the time. These incremental results are shown in Table 6. As the table shows,
Cleveland's performance has to improve to well above the league average in order to have a reasonable
chance. Of course a 4% chance is still a chance, so no fan should give up yet. And that too is a lesson
from simulations. A lot of things can happen.

Table 6: Cleveland's Chances

4. Comparison of Model Prediction with Actual Outcomes

One of the many values of simulating sporting events is that we can compare our model predictions to actual
outcomes within a relatively brief period of time. In our case Denver began losing some key games
(including a game lost due to acknowledged referee error). Students started wondering about the validity of
our model. This created the opportunity to discuss the nature of probability and the extent to which a
simulation gives a range of possible outcomes, any of which could occur (and others as well). Our
simulation did include events in which Denver did not make it. We began periodically entering the actual
results to date, recalculating probabilities, and re-simulating the rest of the season. The first
recalculation dropped the percent of times Denver qualified to around 70%. As Denver started losing more
games than predicted (on average), the recalculated, lower probabilities of winning produced lower
likelihoods of making the playoffs. Figure 4 contains a chart showing those changing odds over the last
part of the season.

Figure 4: Predicted probabilities that Denver makes the playoffs as the season progresses.
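
The enter-results-and-re-simulate cycle described in Section 4 can be sketched roughly as follows. This Python sketch rests on simplifying assumptions that are ours, not the authors': qualification is approximated by reaching a fixed win cutoff (the paper instead ranks Denver against its competitors), and the per-game probabilities for the remaining games are left unchanged after new results are entered (the paper also recalculates them). The function names, the per-game probabilities and the 43-win cutoff are hypothetical; only the 31 wins to date and the 27 remaining games come from the text.

```python
import random


def playoff_chance(current_wins, game_probs, wins_needed, runs=1000, seed=0):
    """Share of simulated finishes in which total wins reach wins_needed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # One uniform draw per remaining game, as with Excel's RAND.
        simulated_wins = sum(rng.random() < p for p in game_probs)
        if current_wins + simulated_wins >= wins_needed:
            hits += 1
    return hits / runs


def resimulate(current_wins, game_probs, actual_results, wins_needed, **kwargs):
    """Enter actual results for the games played so far, then re-run the rest.

    actual_results holds True (win) or False (loss) for the first
    len(actual_results) games in game_probs.
    """
    wins_to_date = current_wins + sum(actual_results)
    remaining = game_probs[len(actual_results):]
    return playoff_chance(wins_to_date, remaining, wins_needed, **kwargs)


if __name__ == "__main__":
    # Placeholder probabilities for the 27 remaining games (assumed values).
    probs = [0.72, 0.45] * 13 + [0.72]
    print(playoff_chance(31, probs, 43))
    # The same season after three losses and one win in the next four games.
    print(resimulate(31, probs, [False, False, True, False], 43))
```

The same `playoff_chance` helper could be called with uniformly raised probabilities to mimic the Cleveland exercise of Section 3, in which the winning probabilities are increased until the team qualifies more than half the time.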
Eventually the regular season ended. To our relief               Nonetheless, the anecdotal evidence points to a very
Denver did make the playoffs, Cleveland did not, and             useful approach in introducing simulation and its
Miami did, all as predicted. As we were prepared to              various components. Students were very enthusiastic
argue (if Denver didn't make it), no single outcome              about the model and its results. The basketball example
says a great deal about the validity of a simulation             coupled with the local interest in the lead player (An-
model. In this exercise however we were actually                 thony) contributed greatly to students' continued in-
simulating the outcome of over 150 games (in the                 terest and desire to explore the model further. Beyond
Western Conference alone) and then keeping track of              the basic simulation model we were able to maintain
the playoff outcome. One way to evaluate the model               a flexible agenda for the exercise. The four teams and
is to compare the number of games won by each team               seven teams' analyses were a direct result of further
with the expected number of wins based on the esti-              probing initiated by the students. Also all subsequent
mated probabilities. Furthermore, rather than compar-            analyses including tracking and updating Denver's
ing point estimates we can look at a distribution of             chances, developing confidence intervals for the sim-
predicted number of wins. This distribution can be               ulation results, and evaluating the Cleveland team,
estimated by using the average probability of each               were instigated by students' inquiries. We were able
team winning its remaining games as the probability              on several occasions to demonstrate the limitations of
for a binomial random variable with the number of                the model as well as our ability to interpret the results.
games as the number of trials. Figure 5 contains a 95%           Finally, where students were inclined to explain the
confidence interval for the predicted number of wins             outcome using factors not included in the model we
for each team along with the actual number of wins               were able to clearly point that out.
(marker). In every case the actual number of wins falls
within the estimated distribution. It is true that Den-          Overall, this exercise allows the instructor to reinforce
ver's number of wins was below the expected value                important aspects of simulation modeling and model-
and Portland's and Utah's wins were above the expect-            ing in general. Specifically, issues related to probabilis-
ed average. However, more often than not, results are            tic modeling, validity of simulation models, and what-
going to be either above or below the average. In real-          if analyses can all be addressed with this example.
ity Portland and Utah did compete with Denver to the
very end of the season for that final spot.                      In most simulation models the assumed probability
                                                                 distributions are estimated based on historical data.
                                                                 Whether the future is in fact well represented by this
                                                                 historical data is always a concern. This exercise allows
                                                                 students to fully understand this in a familiar context.
                                                                 The exercise also provides an opportunity to evaluate
                                                                 the quality of these estimated probabilities after a sig-
                                                                 nificant amount of actual data becomes available. The
                                                                 model can be used to reinforce the fact that the proba-
                                                                 bilities used in defining simulation models can have
                                                                 considerable impact on the results and validity of the
                                                                 model. As the probabilities of winning changed over
                                                                 the season, the likelihood of Denver making the play-
                                                                 offs changed considerably.
 Figure 5: Actual # of wins versus a predicted 95% confi-
                      dence interval.                            There is also room for meaningful student discussion
                                                                 about the extent to which these models fail to describe
5. Conclusions                                                   the real world exactly but still give useful information.
                                                                 For example, we could try to improve the calculation
This exercise has proved to be a very useful experience          of probabilities to include the impact of a team's
in the classroom. It is important to note that it was            schedule in determining their record for the first part
designed and used only in one semester (Spring of                of the season. Did some teams play a weaker early
2004). The effectiveness or the impact on student                schedule? If we attempted to use the existing data to
learning has not been measured in any formal way.                ascertain this would we be using smaller and smaller

samples to determine our probabilities and would we
face diminishing returns for our efforts? Would it make
any difference if we included all 29 teams in our
model?

What-if analysis is an important part of any modeling
exercise. We were able to demonstrate the usefulness
of varying the probabilities used in the model to see
if any reasonable variation in the probabilities really
gave Cleveland a good chance of making the playoffs.
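
A what-if analysis of this kind can also be sketched outside the spreadsheet.
The Python fragment below is only an illustration and is not the Excel model
used in class. It treats a team's remaining wins as a binomial random
variable, as in the confidence intervals of Figure 5, and recomputes the 95%
interval and the chance of reaching a win target as the assumed win
probability is varied. The number of remaining games, the wins needed, the
range of probabilities, and all function names are hypothetical choices made
for the example.

```python
from math import comb


def binomial_pmf(n, p):
    """Return [P(X = k) for k = 0..n] for X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def central_interval(n, p, level=0.95):
    """Bounds (low, high) that leave at most (1 - level) / 2 in each tail."""
    pmf = binomial_pmf(n, p)
    tail = (1 - level) / 2
    low, cum = 0, 0.0
    for k in range(n + 1):            # walk up from 0 wins
        cum += pmf[k]
        if cum > tail:
            low = k
            break
    high, cum = n, 0.0
    for k in range(n, -1, -1):        # walk down from n wins
        cum += pmf[k]
        if cum > tail:
            high = k
            break
    return low, high


def prob_at_least(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binomial_pmf(n, p)[max(k, 0):])


def what_if(games_remaining, wins_needed, probabilities):
    """Recompute the interval and the chance of enough wins for each p."""
    rows = []
    for p in probabilities:
        low, high = central_interval(games_remaining, p)
        chance = prob_at_least(games_remaining, p, wins_needed)
        rows.append((p, low, high, chance))
    return rows


if __name__ == "__main__":
    # Hypothetical scenario: 30 games left, 20 more wins needed.
    for p, low, high, chance in what_if(30, 20, [0.40, 0.45, 0.50, 0.55, 0.60]):
        print(f"p={p:.2f}  95% interval=[{low}, {high}]  P(enough wins)={chance:.3f}")
```

Scanning a range of probabilities in this way plays the same role as a
one-variable Data Table in Excel: if no plausible probability produces a
reasonable chance of reaching the target, the conclusion does not hinge on the
exact estimate.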

With this exercise students can see the value of Monte-Carlo
simulation in a way that is fully transparent
and in a context they understand. At the same time,
and just as important, they get to experience using
data tables and other useful Excel functions. Some
students will get excited about the basketball results,
some about the power of simulations, and some about
what they can do with Excel. Hopefully we have im-
proved the chances that some students will get excited
about something.

References
Ammar, A. and Wright, R. (2001), "What Chance Does the USA Have of Going to
     the World Cup?: An Example of Spreadsheet Monte-Carlo Simulation using
     Visual Basic," Proceedings of Decision Sciences Institute National
     Meeting.
Evans, J. (2000), "Spreadsheets as a Tool for Teaching Simulations," INFORMS
     Transactions on Education,
     http://ite.pubs.informs.org/Vol1No1/Evans/index.php
Evans, J. and Olson, D. (2002), Introduction to Simulation and Risk Analysis,
     2nd Edition, Prentice Hall, New Jersey.
Lock, R. (1997), "NFL Scores and Point Spreads," Journal of Statistics
     Education, Vol. 5.
Nettleton, D. (1998), "Investigating Home Court Advantage," Journal of
     Statistics Education, Vol. 6.

INFORMS Transactions on Education 5:1(67-74)                74                             © INFORMS ISSN: 1532-0545