Comparing the Impact of Star Rookies Carmelo Anthony and Lebron James: An Example on Simulating Team Performances in the NBA League
Salwa Ammar
Ronald Wright
Department of Business Administration
Le Moyne College
Syracuse, New York
Ammar@lemoyne.edu
Wright@lemoyne.edu

INFORMS Transactions on Education 5:1 (67-74), © INFORMS, ISSN: 1532-0545

Abstract

This paper describes a simulation exercise designed for introductory quantitative methods classes at both the MBA and undergraduate levels. The exercise tracks the performances of teams in the National Basketball Association (NBA) during the season of 2004. It is designed as a spreadsheet model and is developed in stages throughout the academic semester. The example is a significant illustration of the use of sports as a vehicle for teaching OR topics, specifically simulation. It also incorporates many spreadsheet modeling skills such as the use of Excel functions and Data Tables. The model provided a good mix of the ingredients for an effective simulation example, including a variety of answers, challenges and surprises.

Editor's note: This is a pdf copy of an html document which resides at http://ite.pubs.informs.org/Vo5No1/AmmarWright/

1. Introduction

For years, sports have been used in teaching Statistics (Lock, 1997 and Nettleton, 1998). More recently, sports have been used as a motivating context for introducing simulation (Ammar and Wright, 2001). The probabilistic nature of the competitive outcomes, the desire to predict these outcomes, and the variety among the relationships between the outcomes provide a rich array of applications for simulation models. Excel spreadsheets give instructors the ability to move away from 'toy' examples and introduce, in a manageable format, real and meaningful illustrations (Evans, 2000). The most effective sports-related examples capitalize on current events of general interest that stimulate curiosity and inquisitiveness beyond those who may be considered diehard fans.

This paper describes an exercise that predicts the impact of two rookies in the National Basketball Association (NBA). One of the rookies, LeBron James, was the number one draft pick in the NBA for 2003 and generated much media attention. The second rookie, Carmelo Anthony, led his college team, Syracuse University, to its first national championship in basketball. The championship was by far the most important sporting result for the central New York region in recent years. Thus following the immediate progress of this local 'hero' guaranteed us broad student interest. Student interest and curiosity were essential in sustaining their enthusiasm as we explored various modeling tools and concepts for the simulation model. Although specific to these rookies and their draft teams, the exercise can easily be generalized and updated to predict the performance of any team in the NBA.

In this paper we demonstrate the details of the simulation model using Carmelo Anthony's team, the Denver Nuggets. In the previous year and prior to his draft the Nuggets won only 17 of the season's 82 games. By the All-Star game of 2004 and with the help
of Anthony, the Nuggets had already won 31 games and seemed to be well on their way to the playoffs. The example described in this paper joins the league at the point of the All-Star game (after 55 games) and includes a Monte Carlo simulation for the performance of the Nuggets in the remaining games of the season. In determining the playoff chances of the Nuggets, the simulation includes the performances of the team's nearest competitors. Similar assessments are also performed for the Cleveland Cavaliers, the team that drafted LeBron James.

The exercise is designed to help achieve several objectives. To run a successful simulation the first step tends to focus on defining the relevant probabilities. The example introduces a method for estimating probabilities that is intuitively acceptable and follows the rules of probability. In this paper the process of estimating the probabilities is simple (by design) in order to avoid the need for extensive discussion and coverage of advanced topics in probability theory. Another objective of this example is to introduce students to the simulation capabilities of Excel, including the use of data tables to replicate observations. Also, this example introduces the concept of simulating events that are related. For example, simulating the outcome of the Seattle at Denver game determines the outcome of the Denver home game as well as the outcome of the Seattle away game. Another important objective of the example is to demonstrate for students how simulation can be used to provide answers to questions beyond the specific simulated events. Simulating the outcomes of games for the season allows us to explore the chances of the team returning for the playoffs. The final part of this paper focuses on the process of assessing and updating the estimated probabilities as the season progresses and games are won and lost.

This example is used in our management science class to introduce concepts and basic skills in spreadsheet simulation. This is a core junior-level class required of students majoring in business and accounting. Students entering this class are expected to have completed the introductory statistics requirement. The simulation is done entirely in Excel. Large numbers of trials are executed using data tables and other useful functions in Excel. The use of add-ins such as Crystal Ball and @Risk is introduced in the higher (senior) level simulation class and is not needed for this particular exercise.

2. Denver Simulation Model

The model involves an attempt to simulate the remaining 27 games for the Denver Nuggets. Once the outcome of these games is simulated the next step is to assess the Nuggets' chances of making the playoffs. This process is designed in several stages. The first stage is that of estimating the winning probabilities (percentages) by team in the league. It is important to recognize that these probabilities vary for each team depending on whether the game is played at home or away. The second stage is to simulate the number of Denver wins for the remaining scheduled season based on the estimated probabilities. Finally, in assessing Denver's chances of reaching the playoffs, the model examines Denver's nearest competitors (those slightly above or slightly below) by simulating each of their performances and comparing them with Denver's performance.

2.1. Estimating Winning Probabilities

The probabilities could be estimated by using information on teams' performances in the previously played games. In the first 55 games of the season, Denver's winning percent was .582. (Note: on sports pages, percent is typically reported as a number between 0 and 1. For convenience we maintain this convention for the data in this paper.) We could use .582 as the probability of winning each game. However, in the NBA there is a significant difference between winning percents for home games and for games on the road. Table 1 shows Denver's home and road winning percentages as well as the average of the entire league. It also shows the same percentages for two select teams (for purposes of illustration).

Table 1: Select Winning Percentages

Since Denver has won 72% of its home games we could begin by assigning a probability of .72 to Denver winning a future home game. However, the quality of the opponent would raise or lower that probability
for a particular game. For example, it might be reasonable to assign a probability of .72 to Denver beating New York at home since New York's road winning percentage matches that of the league. That is, New York is an average road team. If the probability that Denver beats New York at home is .72, then the probability that New York wins that game would be 1 - .72 or .28. This is lower than New York's road average, indicating that Denver is a better than average home team. In fact the extent to which New York's probability of winning is reduced (.38 to .28) matches the extent to which Denver is a better than average home team (.72 - .62). A possible generalization follows.

Consider two teams playing a particular game: team A, the home team, and team B, the road team.

Let Hper = proportion of games Team A wins when playing at home.
Let Rper = proportion of games Team B wins when playing on the road.
Let LHper = average proportion of games all league teams win when playing at home.
Let LRper = average proportion of games all league teams win when playing on the road.

We can then estimate the probability that the road team wins (Team B over Team A) as:

Rper - (Hper - LHper).    (1)

In our example, the estimated probability that New York beats Denver on the road is:

.38 - (.72 - .62) = .28.

Similarly we can estimate the probability that the home team wins as:

Hper - (Rper - LRper).    (2)

In our example, the estimated probability that Denver beats New York at home is:

.72 - (.38 - .38) = .72.

Also, if Denver is playing Indiana at home the probability of Denver winning is:

.72 - (.67 - .38) = .43,

a lower value since Indiana is a better than average road team.

It is important to check that the estimates for probabilities of complementary events add to one. If we add the probability that the road team wins (as calculated in (1)) to the probability that the home team wins (as calculated in (2)) we get:

Rper - (Hper - LHper) + Hper - (Rper - LRper) = LHper + LRper = 1.

The last equality holds because every game has exactly one home winner or one road winner, so the league-wide home and road winning proportions sum to one. This relationship suggests that equations (1) and (2) provide possible estimates for winning probabilities. These estimates take into account the relative position of any particular team in the league.

2.2. Modeling the Number of Denver Wins

Figure 1 shows Denver's simulation sheet(1). It includes Denver's schedule and the appropriate home or road records of each opponent. Average league results are also included. Denver's schedule and the league standings at the time of this example were downloaded from a popular sports site. Denver's win percentages and the league win percentages were included. Using equations (1) and (2) (and IF statements for home or road), the probability of Denver winning each game is calculated.

(1) http://ite.pubs.informs.org/Vol5No1/AmmarWright/denver.xls

Figure 1: Denver Simulation

The outcome for each game is simulated by using RAND, the Excel random number function for a uniform distribution between 0 and 1. If the random number is less than the probability of Denver winning that game a "win" is recorded. Otherwise "lose" is entered in the simulation column. (Results in highlighted
cells are explained in the following section.) A COUNTIF statement is used to count the number of simulated wins. The total number of wins is the simulated number of wins plus the actual wins at the time of the example.

Each time a recalculation is done, new random numbers are generated and new simulation results recorded. The results of multiple simulations can be recorded using Excel's Data Table (Evans and Olson, 2002). For a 1000-run simulation we found the average number of wins to be 46 with a minimum of 39 and a maximum of 55. Table 2 shows the number of times a range of wins occurred in the 1000 runs.

Table 2: Range of Simulated Total Season Wins

At this point we have some sense of how many games the Denver Nuggets might win for the season. In more than 90 percent of the runs Denver wins at least 43 games. Is this enough to make the playoffs? In the previous year it took 44 wins to make the playoffs in the Western Conference while 38 would have been enough in the East. To better assess Denver's chances for the playoffs we need to know how the number of Denver wins compares with that of the competing teams.

2.3. Modeling the Top Performer from Four Teams

At the time of this exercise, Denver was in eighth place in the West, the final playoff spot. Teams in spots 9, 10, and 11 (Seattle, Utah and Portland) could be regarded as threats to Denver's position. Our next step was to run similar simulations for these three teams and compare the number of wins (see the simulation file(2); note that when running simulations only one spreadsheet should be open at a time). Running a simulation for a new team merely requires entering a new schedule (a lookup function will check records and recalculate probabilities). However, we need to take into account instances in which these teams play one another. For example, Denver plays Utah at Utah. Although the probability that Denver wins and the probability that Utah wins add to one, we don't want to use two different random numbers to simulate the outcome of a single game. Here we choose to randomly generate outcomes for home games and use the results to determine those of the road games. For example, for the Denver at Utah game, the outcome on the Denver sheet is linked to the outcome on the Utah sheet. If a "lose" shows up for Utah, a "win" will be entered for Denver and vice versa. (See the lower highlighted cell in Figure 1.)

(2) http://ite.pubs.informs.org/Vol5No1/AmmarWright/fourteams.xls

The spreadsheet(3) contains simulation sheets for each of the four teams. Each sheet duplicates the basic structure of the Denver sheet described above. When the season for all four teams is simulated, an IF statement is used to record a "yes" if Denver's win total equals the maximum of the four win totals and a "no" otherwise (ignoring ties). As shown in Figure 2, a Data Table is used to perform 1000 replications and a COUNTIF statement is used to count the percent of "yes" results.

(3) http://ite.pubs.informs.org/Vol5No1/AmmarWright/fourteams.xls

Figure 2: First of Four

In one simulation of 1000 runs we observed that Denver's win total was the highest of the four teams 98% of the time. Our confidence that Denver will make the playoffs has increased.
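For readers who want to see the logic outside Excel, the following is a minimal Python sketch of equations (1) and (2), the one-draw-per-game rule, and the "first of four" test. It is an illustration under stated assumptions, not the authors' spreadsheet: the function names, the data layout, the schedule, and every number other than the league averages (.62 and .38), Denver's home record (.72), Denver's 31 wins, and the New York and Indiana road records (.38 and .67) are hypothetical placeholders.

```python
import random

LEAGUE_HOME = 0.62  # LHper, league-wide home winning percentage (from the text)
LEAGUE_ROAD = 0.38  # LRper, league-wide road winning percentage (from the text)


def road_win_prob(hper, rper, lhper=LEAGUE_HOME):
    """Equation (1): estimated probability that the road team wins."""
    return rper - (hper - lhper)


def home_win_prob(hper, rper, lrper=LEAGUE_ROAD):
    """Equation (2): estimated probability that the home team wins."""
    return hper - (rper - lrper)


def simulate_wins(records, current_wins, schedule, rng):
    """Simulate every remaining game once and return final win totals.

    records:      team -> (home_pct, road_pct)
    current_wins: tracked team -> wins so far
    schedule:     list of (home_team, road_team) pairs
    """
    wins = dict(current_wins)
    for home, road in schedule:
        p_home = home_win_prob(records[home][0], records[road][1])
        # One random draw per game, so a home win is always a road loss.
        winner = home if rng.random() < p_home else road
        if winner in wins:  # only tracked teams accumulate wins
            wins[winner] += 1
    return wins


def share_first(records, current_wins, schedule, team, runs=1000, seed=0):
    """Fraction of runs in which `team` has the top win total (ties count)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        wins = simulate_wins(records, current_wins, schedule, rng)
        if wins[team] == max(wins.values()):
            hits += 1
    return hits / runs


# Hypothetical data: Utah's record and win total, Denver's road record,
# the New York and Indiana home records, and the schedule are made up.
records = {
    "Denver": (0.72, 0.45),
    "Utah": (0.70, 0.35),
    "New York": (0.55, 0.38),
    "Indiana": (0.75, 0.67),
}
current_wins = {"Denver": 31, "Utah": 28}
schedule = [("Denver", "New York"), ("Denver", "Indiana"), ("Utah", "Denver")]

print(share_first(records, current_wins, schedule, "Denver"))
```

The sketch does not clamp the estimates to the interval from 0 to 1; for extreme records equations (1) and (2) can fall outside that range, in which case the comparison with the random draw simply makes the outcome certain.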
2.4. Modeling the Top Three Performers of Seven Teams

Can we be even more confident? Currently Denver is actually tied with Memphis and Houston for the 6th, 7th, and 8th places. Hence there is a chance that one of these teams might falter, improving Denver's odds. Our third simulation includes the 6th through 12th place teams and attempts to determine whether Denver would place in the top three of the last seven teams(4). We are assuming the current top five teams will be in the playoffs and only three spots are yet to be determined. We can count the wins for each of the seven teams as we did before in the four-team simulation. In seventeams.xls(5) the sheets for the individual teams (other than Denver) are hidden to simplify the reader's interaction with the spreadsheet. The sheets are not protected and can easily be unhidden to show a structure identical to that of the four-team sheets. Once the seven-team performance is simulated the RANK function is used to determine the rank of Denver within these seven teams. A Data Table is then used to simulate 1000 replications with the rank of Denver as the table output. Figure 3 includes these seven-team calculations.

(4) http://ite.pubs.informs.org/Vol5No1/AmmarWright/seventeams.xls
(5) http://ite.pubs.informs.org/Vol5No1/AmmarWright/seventeams.xls

Figure 3: Top Three of Seven

Also, Table 3 shows the summary of the simulated results. Since the top three teams will qualify for the playoffs (along with the assumed first five), Denver misses the playoffs only 1% of the time (actually 9 out of 1000). By all three models Denver's and Carmelo Anthony's chances of making the playoffs seem very high.

Table 3: Denver's Ranks as a Percent of 1000 Runs

3. The Other Guy

Syracuse and Denver fans must admit that there is a second super rookie this year by the name of LeBron James. What are his chances of leading the Cleveland Cavaliers to the playoffs? At the time of the exercise the Cavaliers were in 11th place in the East but talking confidently of rising to the 8th and final spot. The 8th through 11th spots were held by Boston, Miami, Philadelphia and Cleveland, in that order. A new spreadsheet was created for these four teams and a thousand endings to the season were simulated. It was assumed that only the top team of these four would make the playoffs (the seven higher teams are substantially above Boston, the current 8th and last qualifier). Table 4 contains the percent of trials (out of 1000) in which each of the four teams gained that last spot.

Table 4: Percent of Trials each Team Made the Playoffs

As modeled, Cleveland actually has very little chance of making the playoffs. Also the current 8th place team,
Boston, does not have the best shot. Miami is the favorite to gain the eighth position. Some of the students observe that this is consistent with Boston's poor play after all their recent trades and the resignation of their coach. Of course, our model knows nothing of that. What it does know, as summarized in Table 5, is that Miami has a better home record than the other teams and plays more home games down the stretch than the other three teams.

Table 5: Home Game Advantages

The students' observations, however, allow for an appropriate discussion about what our models do and do not take into account. If all teams play the remaining games at the level they played the first part of the season, our results are likely to be a very fair representation. However, this analysis was performed just before the deadline for teams to make trades. As the students point out, the Boston team playing after the All-Star break is a very different one from the one that played the first half of the season. This is also the case for other teams including Cleveland. Our model ignores all this and assumes the teams will play in a manner consistent with the way they played the first fifty-some games. Obviously, if the probabilities are based on historical data and the future is sharply different from the past, then the results are less reliable. We could attempt to make the results better by reducing the probabilities that Boston will win games (at least based on their last 10 games) and we could increase the probabilities that Cleveland will win based on the improved team and perhaps the maturing of LeBron James. Unfortunately these changed probabilities might represent biased preferences.

What we can do is ascertain how much better Cleveland will have to play to have a reasonable chance at the playoffs. To do this we increase the probability of winning both home and road games until Cleveland makes the playoffs more than half the time. These incremental results are shown in Table 6. As the table shows, Cleveland's performance has to improve to well above the league average in order to have a reasonable chance. Of course a 4% chance is still a chance, so no fan should give up yet. And that too is a lesson from simulations. A lot of things can happen.

Table 6: Cleveland's Chances

4. Comparison of Model Prediction with Actual Outcomes

One of the many values of simulating sporting events is that we can compare our model predictions to actual outcomes within a relatively brief period of time. In our case Denver began losing some key games (including a game lost due to acknowledged referee error). Students started wondering about the validity of our model. This created the opportunity to discuss the nature of probability and the extent to which a simulation gives a range of possible outcomes, any of which could occur (and others as well). Our simulation did include events in which Denver did not make it. We began periodically entering the actual results to date, recalculating probabilities, and re-simulating the rest of the season. The first recalculation dropped the percent of times Denver qualified to around 70%. As Denver started losing more games than predicted (on average), the recalculated, lower probabilities of winning produced lower likelihoods of making the playoffs. Figure 4 contains a chart showing those changing odds over the last part of the season.

Figure 4: Predicted probabilities that Denver makes the playoffs as the season progresses.
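The what-if step described in Section 3 (raising Cleveland's home and road win probabilities until the team qualifies more than half the time) can be sketched in Python as follows. This is a simplified illustration, not the authors' spreadsheet: each remaining game is drawn independently against a fixed per-team probability rather than opponent by opponent through equations (1) and (2), games between the four teams are not linked, and all team records, game counts, function names, and the step size are hypothetical.

```python
import random


def simulate_total(wins, n_home, n_road, p_home, p_road, rng):
    """Current wins plus simulated wins in the remaining home and road games."""
    total = wins
    total += sum(rng.random() < p_home for _ in range(n_home))
    total += sum(rng.random() < p_road for _ in range(n_road))
    return total


def playoff_share(teams, target, boost, runs=1000, seed=0):
    """Fraction of runs in which `target` has the top win total (ties count).

    teams: name -> (wins, home_games_left, road_games_left, p_home, p_road)
    boost: amount added to the target's home and road probabilities
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        totals = {}
        for name, (wins, n_home, n_road, p_home, p_road) in teams.items():
            b = boost if name == target else 0.0
            totals[name] = simulate_total(
                wins, n_home, n_road,
                min(1.0, p_home + b), min(1.0, p_road + b),  # cap at 1
                rng,
            )
        if totals[target] == max(totals.values()):
            hits += 1
    return hits / runs


def smallest_boost(teams, target, step=0.05, threshold=0.5, max_boost=1.0):
    """Raise the boost in fixed steps until the target's share exceeds threshold."""
    boost = 0.0
    while boost <= max_boost:
        share = playoff_share(teams, target, boost)
        if share > threshold:
            return boost, share
        boost = round(boost + step, 10)
    return None


# Hypothetical records for the four Eastern teams named in the text.
teams = {
    "Boston": (25, 13, 14, 0.55, 0.35),
    "Miami": (24, 16, 11, 0.65, 0.30),
    "Philadelphia": (23, 14, 13, 0.55, 0.30),
    "Cleveland": (21, 13, 14, 0.55, 0.25),
}

print(smallest_boost(teams, "Cleveland"))
```

Adding the same increment to both the home and road probabilities mirrors the text's description; a sweep that varies them separately would be an equally reasonable design.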
Eventually the regular season ended. To our relief Denver did make the playoffs, Cleveland did not, and Miami did, all as predicted. As we were prepared to argue (if Denver didn't make it), no single outcome says a great deal about the validity of a simulation model. In this exercise, however, we were actually simulating the outcome of over 150 games (in the Western Conference alone) and then keeping track of the playoff outcome. One way to evaluate the model is to compare the number of games won by each team with the expected number of wins based on the estimated probabilities. Furthermore, rather than comparing point estimates we can look at a distribution of the predicted number of wins. This distribution can be estimated by using the average probability of each team winning its remaining games as the probability for a binomial random variable with the number of games as the number of trials. Figure 5 contains a 95% confidence interval for the predicted number of wins for each team along with the actual number of wins (marker). In every case the actual number of wins falls within the estimated interval. It is true that Denver's number of wins was below the expected value and Portland's and Utah's wins were above the expected value. However, more often than not, results are going to be either above or below the average. In reality Portland and Utah did compete with Denver to the very end of the season for that final spot.

Figure 5: Actual # of wins versus a predicted 95% confidence interval.

5. Conclusions

This exercise has proved to be a very useful experience in the classroom. It is important to note that it was designed and used in only one semester (Spring of 2004). The effectiveness or the impact on student learning has not been measured in any formal way. Nonetheless, the anecdotal evidence points to a very useful approach to introducing simulation and its various components. Students were very enthusiastic about the model and its results. The basketball example, coupled with the local interest in the lead player (Anthony), contributed greatly to students' continued interest and desire to explore the model further. Beyond the basic simulation model we were able to maintain a flexible agenda for the exercise. The four-team and seven-team analyses were a direct result of further probing initiated by the students. Also, all subsequent analyses, including tracking and updating Denver's chances, developing confidence intervals for the simulation results, and evaluating the Cleveland team, were instigated by students' inquiries. We were able on several occasions to demonstrate the limitations of the model as well as our ability to interpret the results. Finally, where students were inclined to explain the outcome using factors not included in the model, we were able to point that out clearly.

Overall, this exercise allows the instructor to reinforce important aspects of simulation modeling and modeling in general. Specifically, issues related to probabilistic modeling, validity of simulation models, and what-if analyses can all be addressed with this example.

In most simulation models the assumed probability distributions are estimated based on historical data. Whether the future is in fact well represented by this historical data is always a concern. This exercise allows students to fully understand this in a familiar context. The exercise also provides an opportunity to evaluate the quality of these estimated probabilities after a significant amount of actual data becomes available. The model can be used to reinforce the fact that the probabilities used in defining simulation models can have considerable impact on the results and validity of the model. As the probabilities of winning changed over the season, the likelihood of Denver making the playoffs changed considerably.

There is also room for meaningful student discussion about the extent to which these models fail to describe the real world exactly but still give useful information. For example, we could try to improve the calculation of probabilities to include the impact of a team's schedule in determining their record for the first part of the season. Did some teams play a weaker early schedule? If we attempted to use the existing data to ascertain this, would we be using smaller and smaller
samples to determine our probabilities and would we face diminishing returns for our efforts? Would it make any difference if we included all 29 teams in our model?

What-if analysis is an important part of any modeling exercise. We were able to demonstrate the usefulness of varying the probabilities used in the model to see if any reasonable variation in the probabilities really gave Cleveland a good chance of making the playoffs.

With this exercise students can see the value of Monte Carlo simulation in a way that is fully transparent and in a context they understand. At the same time, and just as important, they get to experience using data tables and other useful Excel functions. Some students will get excited about the basketball results, some about the power of simulations, and some about what they can do with Excel. Hopefully we have improved the chances that some students will get excited about something.

References

Ammar, A. and Wright, R. (2001), "What Chance Does the USA Have of Going to the World Cup?: An Example of Spreadsheet Monte Carlo Simulation using Visual Basic," Proceedings of Decision Sciences Institute National Meeting.

Evans, J. (2000), "Spreadsheets as a Tool for Teaching Simulations," INFORMS Transactions on Education, http://ite.pubs.informs.org/Vol1No1/Evans/index.php

Evans, J. and Olson, D. (2002), Introduction to Simulation and Risk Analysis, 2nd Edition, Prentice Hall, New Jersey.

Lock, R. (1997), "NFL Scores and Point Spreads," Journal of Statistics Education, Vol. 5.

Nettleton, D. (1998), "Investigating Home Court Advantage," Journal of Statistics Education, Vol. 6.