RANDOMNESS OF EUROMILLIONS DRAWS
Centre for the Study of Gambling
University of Salford
Greater Manchester, M5 4WT, United Kingdom

A Report for the National Lottery Commission
January 2010
Randomness of EuroMillions Draws: Executive Summary

The National Lottery Commission invited the Centre for the Study of Gambling (University of Salford) to establish whether there are any elements of non-randomness within EuroMillions. Concentrating on the first four years of EuroMillions draws, from 13th February 2004 to 8th February 2008 inclusive, specific objectives were to test whether:

(a) there is equality of frequency for each EuroMillions number drawn;
(b) each EuroMillions draw is independent of preceding draws;
(c) there is any bias for or against tickets sold in particular countries;
(d) the frequency of top tier winners is what would be expected from random draws.

Players in any particular draw select five main numbers from 1 to 50 and two “Lucky Star” numbers from 1 to 9. They may win various prizes by matching their chosen numbers with selected combinations drawn. 50% of ticket sales are allocated to the Common Prize Fund, which is distributed among different prize tiers according to pre-specified percentages. There are no fixed prize tiers.

Our investigations began with the results of an exploratory data analysis. The findings here were that there appear to be no obvious discrepancies and that all entries in the database appear to be correct.

Perhaps the most important hypothesis tests are whether the frequencies of the five main numbers drawn agree with the null hypothesis that the five main balls are drawn at random and whether the frequencies of the Lucky Star numbers drawn agree with the null hypothesis that the Lucky Star balls are drawn at random. Neither of our modified chi-square goodness-of-fit tests was significant at the 5% level, providing evidence in support of random EuroMillions drawings.

We next conducted a set of tests for the sequential independence of EuroMillions draws. Based on the gaps between successive selections of the specific main numbers 1 to 50, we employed chi-square goodness-of-fit tests to compare observed and expected frequencies of possible gap sizes. Using the Bonferroni method for multiple comparisons to adjust the critical p-value and allow for multiplicity, the overall test was not significant at the 5% level and we conclude that successive EuroMillions main number draws appear to be independent of one another. We performed a similar analysis for the nine Lucky Star numbers and this overall test was not significant at the 5% level, so we conclude that successive EuroMillions Lucky Star number draws appear to be independent of one another.

As a further check for the possibility of serial dependence, we performed runs tests for low valued balls and neither of the runs tests for main numbers and Lucky Stars was significant. This again provides evidence that the outcomes of specific draws do not depend on preceding draw results.

Next, we generated graphical comparisons of observed and expected frequency distributions for the order statistics of the five main numbers and for the order statistics of the two Lucky Stars. The observed frequencies resemble the expected frequencies in each case, further supporting our formal tests of randomness.

Then we performed several tests for randomness of the five main numbers and two Lucky Stars, based upon the means and standard deviations of the sums of numbers in each draw, and upon the parities of the numbers in each draw. None of these tests was significant at the 5% level, providing further evidence in support of random drawings.
We then investigated whether the first four years of EuroMillions draw history provide any evidence of bias for or against tickets sold in particular countries. By calculating sample correlation coefficients between the sales figures for each country and (i) the numbers of jackpot winners, (ii) the total numbers of winners and (iii) the total prize monies, we were able to measure the associations between these pairs of variables. As illustrated by scatter plots and regression fits, these correlations are all close to one with no clear outliers, providing evidence to support the hypothesis of no bias.

The final stage was to consider whether the frequency of top tier winners (match five main numbers plus two Lucky Stars) agrees with expectation under the assumption that all EuroMillions procedures operate randomly. Knowledge of the coverage for each draw provided sufficient information to analyse the frequency of draws with no top tier winners, separately for the U.K. game only and for the aggregate game across all participating countries. Both chi-square goodness-of-fit tests showed that there were no significant differences between the observed and expected frequencies. These results provide supporting evidence that the process of generating winning numbers is independent of the pattern of numbers chosen by the playing population.

Although we followed the convention of testing for significance at the 5% level throughout this investigation, it is worth noting that none of the results we obtained would be significant even at the 10% level. From all of these results, it is clear that the first four years of draw history provide no evidence that there are any elements of non-randomness within EuroMillions.
Randomness of EuroMillions Draws: Detailed Findings

ABSTRACT

The researchers conducted a variety of hypothesis tests for randomness of the first four years of EuroMillions draws up to and including 8th February 2008. These cover aspects relating to the frequencies of numbers drawn, possible dependencies between draws, comparisons across participating countries and frequencies of top tier winners. Overall, our analysis supports the hypothesis of randomness. That is, there is no statistical evidence of non-randomness in the EuroMillions draws.

1. Introduction

The U.K. National Lottery Commission (N.L.C.) invited the Centre for the Study of Gambling (University of Salford) to establish whether there are any elements of non-randomness within EuroMillions. The EuroMillions draws generally take place in Paris on Friday evenings and there have been 209 draws since the inception of the game on 13th February 2004 up to and including 8th February 2008. The N.L.C. provided the researchers with data relating to these first four years of operation and specific objectives were to test whether:

(a) there is equality of frequency for each EuroMillions number drawn;
(b) each EuroMillions draw is independent of preceding draws;
(c) there is any bias for or against tickets sold in particular countries;
(d) the frequency of top tier winners is what would be expected from random draws.
We begin with an exploratory analysis of the data in order to check that there are no unusual, suspicious or incorrect observations. Then we perform a range of suitable hypothesis tests for randomness to answer the above questions. Some of these procedures are modifications of standard statistical methods described in textbooks such as Miller and Miller (2004) and Rice (2007). Others derive from specific recommendations made by Joe (1993) and Haigh (1997), and from the Royal Statistical Society’s early reports into the randomness of the lottery (2000, 2002). For consistency, we follow the general methodology adopted for previous National Lottery Commission reports on the randomness of Lotto Draws (2004), Lotto Lucky Dip (2005), Thunderball Draws (2005), Lotto Extra Draws (2005) and EuroMillions Lucky Dip (2010) produced by the Centre for the Study of Gambling (University of Salford).

The sample sizes available to us are sufficient to enable standard, powerful tests of all the above hypotheses and so to produce statistically valid and reliable conclusions. We choose to work with the conventional 5% significance level, corresponding with a 95% confidence level, thereby maintaining consistency with most published statistical analyses and ensuring ease of interpretation.

The tests for randomness that we use in this report complement each other and cover the most important features relating to the randomness of EuroMillions draws. Of primary importance, we use a standard chi-square goodness-of-fit test, modified to allow for sampling without replacement, to assess whether the winning balls are equally likely to be chosen and whether any observed variability is acceptable under the assumption of random sampling.

To monitor for any serial dependence among the draws, we use standard chi-square goodness-of-fit tests based on the numbers of draws (gaps) between successive appearances of each number, separately for the main draw numbers and the Lucky Stars.
These tests are capable of identifying any irregularities arising due to time trends. As a separate test is needed for each number, we apply multiplicity adjustments to the test results in order to avoid false alarms.

We then conduct a variety of other analyses and tests that are capable of detecting other patterns of non-randomness. An infinite number of possibilities exist and we select several convenient and complementary techniques that permit a good overall assessment of the data. Based on the results of each draw, these include monitoring and testing: the ordered ball values; the sum of the ball values; the odd/even spread of ball values.

We also check for any bias across the different participating countries. We do this using sample correlation coefficients and fitted regression lines for the numbers of jackpot winners, numbers of other winners and values of prizes. Finally, we use a standard binomial test to assess whether the frequencies of jackpot winners in the U.K. and overall agree with what would be expected under random drawings, using knowledge of the coverage rates for each draw.

2. Exploratory Data Analysis

The EuroMillions game involves nine European countries: Austria; Belgium; France; Ireland; Luxembourg; Portugal; Spain; Switzerland; U.K. (and the Isle of Man). In the U.K., a customer pays an amount in pounds sterling (to date £1.50) that reasonably approximates the equivalent of two Euros, in order to enter the EuroMillions game. Because this is not an exact conversion at the exchange rate prevailing on the day of the draw, any excess payments are returned to (or any deficiencies collected from) U.K. players via an adjustment to the amounts paid out as lower tier prizes in the U.K.
Winning Combination           Approximate Probability   Prize Tier Allocation
Match 5 and 2 Lucky Stars     1 / 76,275,360            32.0%
Match 5 and 1 Lucky Star      1 / 5,448,240              7.4%
Match 5                       1 / 3,632,160              2.1%
Match 4 and 2 Lucky Stars     1 / 339,002                1.5%
Match 4 and 1 Lucky Star      1 / 24,214                 1.0%
Match 4                       1 / 16,143                 0.7%
Match 3 and 2 Lucky Stars     1 / 7,705                  1.0%
Match 3 and 1 Lucky Star      1 / 550                    5.1%
Match 2 and 2 Lucky Stars     1 / 538                    4.4%
Match 3                       1 / 367                    4.7%
Match 1 and 2 Lucky Stars     1 / 102                   10.1%
Match 2 and 1 Lucky Star      1 / 38                    24.0%
Aggregate                     1 / 24                    94%

Table 1: winning combinations, probabilities and prize allocations.

Players in any particular draw select five main numbers from 1 to 50 and two “Lucky Star” numbers from 1 to 9. They may win various prizes by matching their chosen numbers with selected combinations drawn, as shown in Table 1. 50% of ticket sales are allocated to the Common Prize Fund. 6% of the Common Prize Fund is allocated to the Reserve Fund to supplement the jackpot pool. The remainder (94%) of the Common Prize Fund is allocated to different prize tiers according to the percentages set out in Table 1. There are no fixed prize tiers.

The equation for calculating the probability of a “Match x and y Lucky Stars” combination is
$$p(x, y) = \frac{\binom{45}{5-x}\binom{5}{x}}{\binom{50}{5}} \times \frac{\binom{7}{2-y}\binom{2}{y}}{\binom{9}{2}}; \quad x = 0, 1, \ldots, 5; \; y = 0, 1, 2 \tag{1}$$

where

$$\binom{n}{m} = \frac{n!}{m!\,(n-m)!} \tag{2}$$

is the number of combinations of m items that can be selected from a total of n items. Any player who matches all five main numbers and both Lucky Star numbers wins a share of the top-tier jackpot prize.

Table 2 and Figure 1 present a tally chart and bar chart illustrating the observed frequencies of occurrence $f_i$ for each of the numbers i of the five main balls selected during the first 209 draws up to and including 8th February 2008. A cursory glance at these data is sufficient to show that there are no obviously false records in this part of the database: only the natural numbers from one to fifty are recorded, none of the observed frequencies are unreasonable values and the observed frequencies sum to 5 × 209 = 1,045. The expected frequency for each of the numbers is 1,045 ÷ 50 = 20.9 and the observed frequencies in the bar chart display a good scatter about this value. Although only nine of the first 209 draws selected ball 46, we shall later verify that this is in accordance with natural random variation.
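As a quick check on Equation (1), the short Python sketch below evaluates p(x, y) and prints the corresponding odds for a few prize tiers. The function name `match_probability` is illustrative and not part of the report; only the standard library is used.

```python
from math import comb

def match_probability(x: int, y: int) -> float:
    """Probability of matching x main numbers and y Lucky Stars, per Equation (1)."""
    main = comb(45, 5 - x) * comb(5, x) / comb(50, 5)
    stars = comb(7, 2 - y) * comb(2, y) / comb(9, 2)
    return main * stars

# Print the odds ("1 in ...") for a few of the prize tiers listed in Table 1.
for x, y in [(5, 2), (5, 1), (5, 0), (4, 2), (2, 1)]:
    odds = 1 / match_probability(x, y)
    print(f"Match {x} and {y} Lucky Stars: 1 in {odds:,.0f}")
```

The printed odds should agree with the "Approximate Probability" column of Table 1, and summing `match_probability(x, y)` over all x and y gives one, as any probability distribution must.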
 i  f_i     i  f_i     i  f_i     i  f_i     i  f_i
 1   28    11   24    21   25    31   17    41   24
 2   18    12   27    22   17    32   19    42   19
 3   28    13   16    23   25    33   18    43   21
 4   21    14   23    24   17    34   21    44   24
 5   19    15   24    25   22    35   21    45   21
 6   23    16   20    26   22    36   24    46    9
 7   21    17   16    27   17    37   26    47   21
 8   24    18   17    28   14    38   21    48   18
 9   22    19   24    29   20    39   15    49   23
10   23    20   16    30   20    40   20    50   30

Table 2: observed frequencies of the five main numbers drawn.

Figure 1: observed frequencies of the five main numbers drawn. [Bar chart of Frequency against Number.]
Table 3 and Figure 2 present a tally chart and bar chart illustrating the observed frequencies of occurrence $f_i$ for each of the numbers i of the two Lucky Star balls selected during the first 209 draws up to and including 8th February 2008. These frequencies sum to 2 × 209 = 418 and a cursory glance at these data is sufficient to show that there are no obviously false records in this part of the database: only the natural numbers from one to nine are recorded and none of the observed frequencies are unreasonable values. The expected frequency for each of the numbers is 418 ÷ 9 ≈ 46.44 and the observed frequencies in the bar chart display a good scatter about this value.

i      1   2   3   4   5   6   7   8   9
f_i   53  40  50  37  49  52  47  47  43

Table 3: observed frequencies of the two Lucky Star numbers drawn.

Figure 2: observed frequencies of the two Lucky Star numbers drawn. [Bar chart of Frequency against Number.]

The histogram in Figure 3 displays the sales per draw in millions of pounds for the U.K. EuroMillions game. Clearly, a large majority of draws attracted sales of below ten million pounds, though the sales increased to a peak of over fifty million pounds when there were consecutive rollovers and event draws on offer.
This is also apparent from the subsequent scatter plot in Figure 4, which shows some outliers.

Figure 3: sales per draw for the U.K. EuroMillions game. [Histogram of Number of Draws against U.K. Sales (£m).]

Figure 4: the relationship between U.K. sales and consecutive rollovers. [Scatter plot of U.K. Sales (£m) against Number of Consecutive Rollovers.]
The most prominent of these are two particularly large U.K. sales amounts for first time draws (no rollovers) of about £34m on 28th September 2007 and about £17m on 9th February 2007. On further inspection, these both correspond to event draws, with jackpot pools of 130m Euros and 100m Euros respectively guaranteed by the Reserve Fund. An even more outlying U.K. sales amount of about £49m occurred on 8th February 2008 after one rollover. This corresponds to an event draw with a guaranteed jackpot pool of 130m Euros, on a draw that would otherwise have represented a single rollover. Apart from these instances, there are no particularly unusual observations that require closer inspection.

Three graphs follow to display the coverage rates and their relationships to sales. Coverage refers to the proportion of all possible combinations that at least one player selects in any particular draw.

Figure 5: total coverage across all EuroMillions countries. [Histogram of Number of Draws against Total Coverage (%).]

Figure 5 is a histogram, which displays the aggregate coverage rates across all participating countries for the first four years of draws, expressed as percentages. This is useful for assessing whether the frequency of rollovers is consistent with the hypothesis of random drawings. Conscious selection is likely to be present, whereby many players do not select numbers and combinations randomly. However, this does not affect the randomness of the draws and the Lucky Dip facility helps to spread the players’ choices across a large proportion of the possible combinations.

Figure 6: relationship between coverage and sales for U.K. EuroMillions. [Scatter plot of U.K. Coverage (%) against U.K. Sales (£m).]

The expected coverage for any draw correlates positively with the number of tickets sold for that draw. Figure 6 presents a scatter plot of U.K. coverage against U.K. sales. The relationship between the variables is strikingly regular: near linear with a hint of concavity. The close approximation to linearity arises because the number of tickets sold in the U.K. is well below the number of possible combinations.
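The near-linear, slightly concave shape can be illustrated with a simple idealised model that is not taken from the report. Suppose, purely as an assumption, that each of n tickets is an independent and uniformly random choice among the N = 76,275,360 possible combinations; the expected coverage is then 1 − (1 − 1/N)^n. The function name, the use of the £1.50 price for every ticket and the uniform-choice assumption are all illustrative. Since real players select numbers consciously, actual coverage would tend to fall below this idealised curve.

```python
from math import comb, expm1, log1p

N_COMBINATIONS = comb(50, 5) * comb(9, 2)   # 76,275,360 possible tickets
TICKET_PRICE_GBP = 1.50                     # U.K. entry price quoted in the report

def expected_coverage(sales_gbp: float) -> float:
    """Expected fraction of combinations covered, ASSUMING uniformly random tickets."""
    n_tickets = sales_gbp / TICKET_PRICE_GBP
    # 1 - (1 - 1/N)^n, written with log1p/expm1 for numerical accuracy.
    return -expm1(n_tickets * log1p(-1.0 / N_COMBINATIONS))

for sales_m in (5, 10, 25, 50):
    cover = expected_coverage(sales_m * 1e6)
    print(f"Sales of £{sales_m}m -> idealised coverage of about {100 * cover:.1f}%")
```

For small n/N the curve behaves like n/N, which is why the relationship looks almost linear over the range of U.K. sales, with the concavity only becoming visible for the largest draws.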
Figure 7: time trends of sales and coverage for U.K. EuroMillions. [Time series of U.K. Sales (£m) and U.K. Coverage (%) against Draw Number.]

Finally, Figure 7 presents two time series plots, which illustrate how the U.K. sales (£m) and U.K. coverage (%) variables have progressed over time. The obvious peaks coincide with consecutive rollovers and event draws. None of these graphs identifies any unusual observations that cast doubt on the accuracy of the recorded data. It is interesting to note that U.K. sales were relatively low for the first year, as only three countries participated during the first thirty-four weeks and prize values were correspondingly lower.

We conclude this section with another time series plot, in Figure 8, which displays the numbers of Match 5 plus 2 Lucky Stars jackpot winners over the first 209 draws. During these first four years, there were 89 jackpot winners across all nine countries, including 10 from the U.K. Total sales during this period amounted to about 15,066 million Euros, compared with the U.K. sales of about 1,898 million Euros.
Figure 8: time series plots of jackpot winners. [Time series of Number of Jackpot Winners (U.K. and Total) against Draw Number.]

This is a sales ratio of about 8:1, which roughly equates with the ratio of about 9:1 for jackpot winners and so indicates that the U.K. is rewarded in accord with mathematical expectation. We present a formal statistical analysis of the frequency of jackpot winners later in this report. However, the time series plot in Figure 8 appears to be compatible with what one might expect from chance alone. Note that there were no draws with multiple jackpot winners early on, when fewer countries participated.

3. Testing whether there is Equality of Frequency for each EuroMillions Number Drawn

Perhaps the most important tests for the EuroMillions game are whether the observed frequencies of the fifty main numbers and nine Lucky Star numbers accord with what would be expected from random drawings.
We test the equality of marginal frequencies for the five main numbers chosen in EuroMillions draws using a chi-square goodness-of-fit test modified to allow for sampling without replacement. Defining m = 5, M = 50 and D = 209, the test statistic is

$$T = \frac{(M-1)\,M \left( \sum_{i=1}^{M} f_i^2 - \frac{m^2 D^2}{M} \right)}{(M-m)\,D\,m} \tag{3}$$

where ball i is selected $f_i$ times for i = 1, 2, …, M. This is equivalent to (M − 1) ÷ (M − m) multiplied by the usual chi-square goodness-of-fit statistic; the expected frequencies in this case are all m × D ÷ M = 20.9. We compare this test statistic with the critical value for the rejection region in the upper tail of the $\chi^2(M-1)$ distribution. The null hypothesis is that the observed frequencies accord with random drawings and the alternative hypothesis is that they do not.

For the first 209 draws, we have T ≈ 40.7 on 49 degrees of freedom, corresponding to a p-value of p ≈ 0.796. As this is greater than 0.05, the test is not significant and we do not reject the null hypothesis at the 5% level of significance. We conclude that there is no evidence to suggest the possibility of bias in the selection of the five main numbers in the EuroMillions draws.

It is important to realize that the EuroMillions draw operators use a variety of machines and ball sets to avoid systematic bias. Furthermore, p ≈ 0.796 means that when no bias is present, the probability of observing a frequency distribution at least as non-uniform as the one actually observed is approximately four fifths. Even though only nine of the first 209 draws selected ball 46, this modified chi-square goodness-of-fit test clearly demonstrates that this is in accordance with natural random variation.
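The statistic in Equation (3) is straightforward to reproduce from the tabulated counts. The following Python sketch is illustrative only (the function and variable names are not from the report); it applies the formula to the main number frequencies in Table 2 and, with m = 2 and M = 9, to the Lucky Star frequencies in Table 3.

```python
def modified_chi_square(freqs, m, draws):
    """Equation (3): chi-square statistic adjusted for sampling without replacement."""
    M = len(freqs)
    sum_of_squares = sum(f * f for f in freqs)
    numerator = (M - 1) * M * (sum_of_squares - (m * draws) ** 2 / M)
    return numerator / ((M - m) * draws * m)

# Observed frequencies for balls 1..50 (Table 2) and Lucky Stars 1..9 (Table 3).
main_freqs = [28, 18, 28, 21, 19, 23, 21, 24, 22, 23,
              24, 27, 16, 23, 24, 20, 16, 17, 24, 16,
              25, 17, 25, 17, 22, 22, 17, 14, 20, 20,
              17, 19, 18, 21, 21, 24, 26, 21, 15, 20,
              24, 19, 21, 24, 21, 9, 21, 18, 23, 30]
star_freqs = [53, 40, 50, 37, 49, 52, 47, 47, 43]

T_main = modified_chi_square(main_freqs, m=5, draws=209)
T_star = modified_chi_square(star_freqs, m=2, draws=209)
print(f"Main numbers: T = {T_main:.2f} on {len(main_freqs) - 1} degrees of freedom")
print(f"Lucky Stars:  T = {T_star:.2f} on {len(star_freqs) - 1} degrees of freedom")
```

A p-value can then be read from the upper tail of the chi-square distribution with M − 1 degrees of freedom, for example with `scipy.stats.chi2.sf(T_main, 49)` if SciPy is available.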
Next, we consider the Lucky Star numbers chosen in EuroMillions draws and test the null hypothesis that the draw procedures are equally likely to select each of the nine numbers as Lucky Stars. If all balls are equally likely for selection as Lucky Stars, the expected frequency is 2 × 209 ÷ 9 ≈ 46.44 for each of the nine possible numbers. We again compare the observed frequencies with these expected frequencies by means of a modified chi-square goodness-of-fit test, using Equation (3) after re-defining m = 2 and M = 9.

The observed test statistic is T ≈ 5.81 on 8 degrees of freedom, corresponding to a p-value of p ≈ 0.668. Thus when no bias is present, the probability of observing a frequency distribution at least as non-uniform as the one actually observed is more than two thirds. Consequently, the test is not significant at the 5% level and we conclude that there is no evidence to suggest that the selection of the Lucky Star numbers is anything but random.

4. Testing whether each EuroMillions Draw is Independent of Preceding Draws

For any fixed number i = 1, 2, …, M let $g_1$ denote the number of draws until ball i first appears. Similarly, let $g_2, g_3, \ldots$ be the numbers of draws between later successive appearances of ball i. Under the null hypothesis that there is independence between the draws, these gaps are independent geometric random variables, with probability mass function

$$p(g) = \left(1 - \frac{m}{M}\right)^{g-1} \frac{m}{M}; \quad g = 1, 2, 3, \ldots \tag{4}$$
This result enables us to perform a standard chi-square goodness-of-fit test of the null hypothesis for each fixed number. This involves comparing the observed gap frequencies with those expected under the null hypothesis using Equation (4), by evaluating the test statistic

$$T = \sum \frac{(\text{obs.} - \text{exp.})^2}{\text{exp.}} \tag{5}$$

where summation is over the number of gap categories considered. For each of the M possible numbers, we calculate the expected frequencies for the chosen categories by scaling the corresponding probabilities from Equation (4) by the total number of observed gaps for that number, ignoring the final censored gap interval unless it clearly belongs to the uppermost category with an open interval.

Under the null hypothesis of independent draws, the test statistic in Equation (5) asymptotically has a chi-square distribution with degrees of freedom equal to the number of categories minus one. The test is one sided and we reject the null hypothesis if the test statistic lies in the upper tail of this distribution.

Based on the draw history available at this time and to avoid unnecessary complexity, we define only two categories and do so by referring to the median gap length assuming independent draws. The resulting categories correspond to gap lengths in the ranges g = 1, 2, …, G and g = G + 1, G + 2, G + 3, … where we select the value of G to minimise the absolute difference metric

$$d = \left| \sum_{g=1}^{G} p(g) - \frac{1}{2} \right| \tag{6}$$

from Equation (4). This ensures that the two categories are approximately equally likely, which maximises the power of this test.
Furthermore, as there are only two categories, we replace the approximate chi-square goodness-of-fit tests in Equation (5) by exact binomial tests for improved accuracy, based on two-tailed probabilities from the binomial distribution with probability mass function

$$p(y) = \binom{t}{y} q^{y} (1 - q)^{t-y}; \quad y = 0, 1, \ldots, t \tag{7}$$

where y is the observed number of gaps in category 1, t is the total number of gaps observed in categories 1 and 2, and

$$q = \sum_{g=1}^{G} p(g) \tag{8}$$

is the cumulative probability of gap lengths based on Equation (4).

These tests are dependent for i = 1, 2, …, M and there is clear multiplicity. The dependence means that these are only approximate tests of the null hypothesis, though they are still relevant and informative. The multiplicity problem is easily resolved by applying multiple comparisons procedures. We choose the Bonferroni adjustment for this purpose because of its simple interpretation, though other less-conservative procedures are available. To apply this adjustment, a test for independence of draws at the 5% level of significance involves comparing each of the M p-values with 0.05 ÷ M rather than with the unadjusted value of 0.05.

For the m = 5 main number balls selected in any EuroMillions draw, we set M = 50 as before, in which case the median gap length satisfies G = 7 by using Equation (4) to minimise the metric in Equation (6). Consequently, we define our two categories as g = 1, 2, …, 7 and g = 8, 9, 10, …. We present the results of our binomial gap tests for serial dependence of the main numbers drawn in Table 4, which gives details of the observed frequencies $y_i$ in category 1, the total observed frequencies $t_i$ in categories 1 and 2, and the p-value $p_i$ for each fixed number i.

 i  y_i(t_i)  p_i       i  y_i(t_i)  p_i       i  y_i(t_i)  p_i       i  y_i(t_i)  p_i
 1  20(29)   0.102     14  13(23)   0.837     27   7(18)   0.373     40   6(20)   0.077
 2   7(18)   0.373     15  17(25)   0.164     28   7(15)   0.864     41  16(25)   0.325
 3  21(29)   0.043     16   7(20)   0.189     29   7(21)   0.130     42  11(20)   0.979
 4   9(22)   0.399     17   8(16)   1.000     30  10(20)   1.000     43  10(22)   0.676
 5   9(20)   0.675     18   6(18)   0.172     31   8(17)   0.856     44  15(24)   0.420
 6  10(23)   0.531     19  15(24)   0.420     32   9(20)   0.675     45  12(21)   0.815
 7  14(22)   0.389     20   9(16)   0.942     33   9(19)   0.848     46   0(9)    0.003
 8  12(24)   0.991     21  15(25)   0.562     34  11(23)   0.833     47  12(21)   0.815
 9  10(22)   0.676     22   9(17)   1.000     35  10(22)   0.676     48   7(19)   0.268
10  14(23)   0.533     23  13(25)   1.000     36  16(24)   0.222     49  14(23)   0.533
11  14(25)   0.857     24  10(17)   0.763     37  13(27)   0.820     50  19(30)   0.298
12  15(27)   0.876     25  14(22)   0.389     38  10(21)   0.840
13   6(17)   0.250     26  12(23)   1.000     39   6(16)   0.355

Table 4: observed frequencies and p-values of gap tests for main numbers.

To interpret the results following a Bonferroni adjustment for multiple comparisons, we compare the p-values with the adjusted significance level 0.05 ÷ 50 = 0.001 and reject the null hypothesis at the 5% level if any of the p-values is less than this value. From Table 4, we see that none of the p-values is less than 0.001, so the test is not significant at the 5% level and we conclude that there is no evidence of serial dependence among the main numbers drawn in the EuroMillions game.
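The gap test described above can be sketched in a few lines of Python. This is an illustration rather than the researchers' own code: the function names are hypothetical, and the two-tailed p-value is computed by doubling the smaller tail, which is an assumption because the report does not state which two-tailed convention it uses.

```python
from math import comb

def gap_pmf(g, m, M):
    """Equation (4): geometric probability of a gap of length g."""
    p = m / M
    return (1 - p) ** (g - 1) * p

def median_split(m, M, max_gap=500):
    """Return (G, q): the G minimising Equation (6) and the cumulative q of Equation (8)."""
    best = None
    cumulative = 0.0
    for G in range(1, max_gap + 1):
        cumulative += gap_pmf(G, m, M)
        d = abs(cumulative - 0.5)
        if best is None or d < best[1]:
            best = (G, d, cumulative)
    return best[0], best[2]

def binomial_two_tailed(y, t, q):
    """Exact binomial p-value from Equation (7); doubling the smaller tail is an ASSUMPTION."""
    pmf = [comb(t, k) * q ** k * (1 - q) ** (t - k) for k in range(t + 1)]
    lower = sum(pmf[: y + 1])   # P(Y <= y)
    upper = sum(pmf[y:])        # P(Y >= y)
    return min(1.0, 2 * min(lower, upper))

G_main, q_main = median_split(5, 50)
G_star, q_star = median_split(2, 9)
p_ball_46 = binomial_two_tailed(y=0, t=9, q=q_main)   # ball 46 row of Table 4
bonferroni_level = 0.05 / 50

print(f"Main numbers: G = {G_main}, q = {q_main:.4f}")
print(f"Lucky Stars:  G = {G_star}, q = {q_star:.4f}")
print(f"Ball 46: p = {p_ball_46:.4f}, reject = {p_ball_46 < bonferroni_level}")
```

Under this convention the p-value for ball 46 is about 0.0026, consistent with the rounded 0.003 in Table 4 and above the Bonferroni-adjusted level of 0.001.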
For the m = 2 Lucky Star number balls selected in any EuroMillions draw, we set M = 9, in which case the median gap length satisfies G = 3 by using Equation (4) to minimise the metric in Equation (6). Consequently, we define our two categories as g = 1, 2, 3 and g = 4, 5, 6, …. We present the results of our binomial gap tests for serial dependence of the Lucky Star numbers drawn in Table 5, which gives details of the observed frequencies $y_i$ in category 1, the total observed frequencies $t_i$ in categories 1 and 2, and the p-value $p_i$ for each fixed number i.

i           1       2       3       4       5       6       7       8       9
y_i(t_i)  26(53)  20(41)  32(51)  21(37)  24(50)  30(53)  24(47)  28(47)  22(43)
p_i       0.666   0.704   0.206   0.767   0.575   0.695   0.908   0.446   0.932

Table 5: observed frequencies and p-values of gap tests for Lucky Star numbers.

To interpret the results following a Bonferroni adjustment for multiple comparisons, we now compare the p-values with the adjusted significance level 0.05 ÷ 9 ≈ 0.006 and reject the null hypothesis at the 5% level if any of the p-values is less than this value. From Table 5, we see that none of the p-values is less than 0.006, so the test is not significant at the 5% level and we conclude that there is no evidence of serial dependence among the Lucky Star numbers drawn in the EuroMillions game.

We conclude this section with two further tests for serial dependence of EuroMillions draws, based on runs of small and large numbers. One of these is for the main numbers drawn and the other is for the Lucky Star numbers drawn.

Consider first the main numbers. For each draw, we count the number of low valued balls, which we define as 1 to 25. Then we classify each draw as low or high valued, depending on whether or not it comprises at least three low valued balls. Now define L and H to be the total frequencies of low and high valued draws, with L + H = D where D = 209 as before. Then, the total number of runs (successions of identical classifications) R has the asymptotic normal distribution

$$R \;\dot{\sim}\; N\!\left( \frac{2LH}{D} + 1, \; \frac{2LH\,(2LH - D)}{D^2 (D - 1)} \right) \tag{9}$$

and so we can perform a standard normal hypothesis test for trends based on the test statistic

$$z = \frac{R - \mu_R}{\sigma_R} \tag{10}$$

where $\mu_R$ and $\sigma_R$ are the mean and standard deviation of R from Relation (9). For the available draw history, we observe L = 111, H = 98 and R = 103, corresponding to the test statistic z ≈ −0.292 from Equation (10) and a two-tailed p-value of p ≈ 0.770. As the latter exceeds 0.05, the test is not significant at the 5% level and so this test provides no evidence against serial independence of the main numbers drawn.

Now consider the Lucky Stars. For each draw, we count the number of low valued balls, which we define as 1 to 5. Then we classify each draw as low or high valued, depending on whether or not it comprises two low valued balls. Based on the available draw history, we observe L = 58, H = 151 and R = 82, corresponding to the test statistic z ≈ −0.486 from Equation (10) and a two-tailed p-value of p ≈ 0.627. As the latter exceeds 0.05, the test is not significant at the 5% level and so this test provides no evidence against serial independence of the Lucky Star numbers drawn.
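The runs test in Relations (9) and (10) can be checked numerically from the quoted values of L, H and R. The sketch below is illustrative: the function names are not from the report, the p-value is taken to be two-tailed, and the helper `count_runs` shows how R would be obtained from a sequence of low/high classifications (the draw-by-draw sequence itself is not reproduced here).

```python
from math import erfc, sqrt

def count_runs(labels):
    """Number of runs (maximal blocks of identical labels) in a sequence."""
    if not labels:
        return 0
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def runs_test(L, H, R):
    """Return (z, two-tailed p) using the normal approximation of Relation (9)."""
    D = L + H
    mean = 2 * L * H / D + 1
    variance = 2 * L * H * (2 * L * H - D) / (D ** 2 * (D - 1))
    z = (R - mean) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_value

z_main, p_main = runs_test(L=111, H=98, R=103)
z_star, p_star = runs_test(L=58, H=151, R=82)
print(f"Main numbers: z = {z_main:.3f}, p = {p_main:.3f}")
print(f"Lucky Stars:  z = {z_star:.3f}, p = {p_star:.3f}")
print(count_runs("LLHHLH"))   # toy sequence with four runs: LL, HH, L, H
```

The magnitudes of z and the p-values agree with those quoted above; in both cases the observed number of runs is slightly below its expected value, so z is negative.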
5. Other Tests for Randomness of the Five Main Numbers and the Lucky Stars Drawn We now present several further tests for randomness of the five main EuroMillions numbers and the two Lucky Star numbers drawn. We first conduct five complementary analyses of the EuroMillions main number combinations generated, based upon the theoretical probability distributions of the order statistics within each draw under the hypothesis of randomness. That is, we derive the actual distributions for the smallest (first order statistic), the next smallest (second order statistic) and so on, of the five main numbers generated. Then we compare the histograms based on actual observed order statistics with the corresponding hypothetical distributions. Whereas the tests in Section 3 are concerned with frequencies of the individual numbers drawn, the tests in this section are able to detect patterns and clustering within the combinations drawn. In D = 209 random draws of five numbers from 1 to 50, the frequency of appearances X by any particular number has probability mass function D p (x ) = p x (1 − p ) ; x = 0,1, K , D D− x (11) x where p = 5 ÷ 50 = 0 ⋅ 1 . Although this binomial distribution forms the basis of the test constructed in Section 3, it is not suitable for the order statistics, so we derive the corresponding hypothetical distributions from first principles. The first order statistic X (1) in a combination of five main EuroMillions numbers is the smallest of those five numbers. The second order statistic X (2 ) is the next smallest of those five numbers and so on. By considering elementary 24
combinatorics, we find that a general formula for the probability mass function of the $k$th order statistic takes the form

$$P\left(X_{(k)} = x\right) = \frac{\binom{x - 1}{k - 1} \binom{50 - x}{5 - k}}{\binom{50}{5}}; \quad x = k, k + 1, \ldots, k + 45 \tag{12}$$

for each of the values $k = 1, 2, \ldots, 5$. Based upon this probability distribution, we can determine the expected frequencies for all five order statistics under the null hypothesis that the EuroMillions main number combinations are random. We then compare our observed frequencies with these by means of the graphs in Figures 9 to 13, which display the observed frequencies as markers and the expected frequencies as bars. Although we could perform formal chi-square goodness-of-fit tests to assess whether the order statistics accord with the assumption of random drawings, the graphs clearly illustrate that there are no consistent patterns that might indicate unusual behaviour and so we do not proceed with a formal analysis of this aspect.

Figure 9: obs. and exp. frequencies for first order statistic of main numbers. (Vertical axis: number of draws; horizontal axis: first order statistic.)
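The expected frequencies plotted as bars in the order-statistic figures can be reproduced directly from Equation (12). The sketch below is a minimal illustration; the function name `order_stat_pmf` and its `pool` and `drawn` parameters are hypothetical, introduced only so that the same function also covers the Lucky Stars case (`pool=9`, `drawn=2`).

```python
from math import comb


def order_stat_pmf(k: int, x: int, pool: int = 50, drawn: int = 5) -> float:
    """P(X_(k) = x) for the k-th smallest of `drawn` numbers chosen from 1..pool."""
    # The k-th order statistic can only take values k, k + 1, ..., pool - drawn + k.
    if not (k <= x <= pool - drawn + k):
        return 0.0
    # k - 1 numbers lie below x and drawn - k numbers lie above x.
    return comb(x - 1, k - 1) * comb(pool - x, drawn - k) / comb(pool, drawn)


DRAWS = 209

# Expected number of draws in which the smallest main number equals x, for x = 1..50.
expected_first = [DRAWS * order_stat_pmf(1, x) for x in range(1, 51)]

# Each distribution should sum to one over its support.
totals = [sum(order_stat_pmf(k, x) for x in range(1, 51)) for k in range(1, 6)]

print(f"expected draws with smallest number 1: {expected_first[0]:.1f}")
```

For the first order statistic, the probability that the smallest number is 1 equals 5/50, so roughly 20.9 of the 209 draws are expected to contain ball 1 as their smallest number.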
Figure 10: obs. and exp. frequencies for second order statistic of main numbers. (Vertical axis: number of draws; horizontal axis: second order statistic.)

Figure 11: obs. and exp. frequencies for third order statistic of main numbers. (Vertical axis: number of draws; horizontal axis: third order statistic.)

Figure 12: obs. and exp. frequencies for fourth order statistic of main numbers. (Vertical axis: number of draws; horizontal axis: fourth order statistic.)
Figure 13: obs. and exp. frequencies for fifth order statistic of main numbers. (Vertical axis: number of draws; horizontal axis: fifth order statistic.)

We now repeat this analysis for the EuroMillions Lucky Stars. The probability mass function of the $k$th order statistic in a combination of two Lucky Star numbers drawn takes the form

$$P\left(X_{(k)} = x\right) = \frac{\binom{x - 1}{k - 1} \binom{9 - x}{2 - k}}{\binom{9}{2}}; \quad x = k, k + 1, \ldots, k + 7 \tag{13}$$

for $k = 1, 2$. Based upon this probability distribution, we can determine the expected frequencies for both order statistics under the null hypothesis that the EuroMillions Lucky Star number combinations are random. We then compare our observed frequencies with these by means of the graphs in Figures 14 and 15, which display the observed frequencies as markers and the expected frequencies as bars. Although we could perform formal chi-square goodness-of-fit tests to assess whether the order statistics accord with the assumption of random drawings, the graphs clearly illustrate that there are no consistent patterns that might indicate unusual behaviour and so we do not proceed with a formal analysis of this aspect.
Figure 14: obs. and exp. frequencies for first order statistic of Lucky Stars. (Vertical axis: number of draws; horizontal axis: first order statistic.)

Figure 15: obs. and exp. frequencies for second order statistic of Lucky Stars. (Vertical axis: number of draws; horizontal axis: second order statistic.)

We now consider several tests based on the sum of the EuroMillions numbers drawn. Firstly, define $n_{ij}$ to be the main number associated with ball $i$ in draw $j$ and the sum of the numbers selected in draw $j$ by

$$s_j = \sum_{i=1}^{m} n_{ij}. \tag{14}$$

Under the null hypothesis of random draws, the mean of $s_j$ is
$$\mu = \frac{m(M + 1)}{2} \tag{15}$$

and the variance of $s_j$ is

$$\sigma^{2} = \frac{m(M + 1)(M - m)}{12}. \tag{16}$$

We now use the central limit theorem to derive asymptotic sampling distributions for the sample mean $U$ and sample variance $V$ of $s_j$ for $j = 1, 2, \ldots, D$. These sampling distributions provide two two-sided tests of the null hypothesis that the EuroMillions numbers are drawn at random:

$$\frac{U - \mu}{\sigma / \sqrt{D}} \mathrel{\dot\sim} N(0, 1); \tag{17}$$

$$\frac{(D - 1)V}{\sigma^{2}} \mathrel{\dot\sim} \chi^{2}(D - 1). \tag{18}$$

With $D = 209$, $M = 50$ and $m = 5$ for the main numbers drawn, we have $\mu = 127.5$ and $\sigma^{2} = 956.25$, and observed sample statistics $U \approx 125$ and $V \approx 944$, corresponding to p-values of $p \approx 0.300$ and $p \approx 0.923$ from Relations (17) and (18) respectively. Consequently, neither of these tests is significant at the 5% level, again providing evidence in favour of randomness of the main numbers drawn in the EuroMillions game.

Similarly, with $D = 209$, $M = 9$ and $m = 2$ for the Lucky Star numbers drawn, we have $\mu = 10$ and $\sigma^{2} = 11.\dot{6}$, and observed sample statistics $U \approx 9.95$ and $V \approx 11.9$, corresponding to p-values of $p \approx 0.840$ and $p \approx 0.813$ from Relations (17) and (18) respectively. Consequently, neither of these tests is significant at the 5% level, again providing evidence in favour of randomness of the Lucky Star numbers drawn in the EuroMillions game.
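The moments in Equations (15) and (16) and the standardised statistics in Relations (17) and (18) are simple to evaluate. The following sketch uses hypothetical function names (`sum_moments`, `mean_test`, `variance_statistic`) and is only an illustration of the formulas. Because the sample statistics $U$ and $V$ are quoted above in rounded form, p-values recomputed from them will not match the reported ones exactly.

```python
import math


def sum_moments(pool: int, drawn: int) -> tuple[float, float]:
    """Mean and variance of the sum of `drawn` numbers chosen at random from 1..pool."""
    mean = drawn * (pool + 1) / 2                            # Equation (15)
    variance = drawn * (pool + 1) * (pool - drawn) / 12      # Equation (16)
    return mean, variance


def mean_test(sample_mean: float, pool: int, drawn: int, draws: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the sample mean of the sums, Relation (17)."""
    mean, variance = sum_moments(pool, drawn)
    z = (sample_mean - mean) / math.sqrt(variance / draws)
    return z, math.erfc(abs(z) / math.sqrt(2))


def variance_statistic(sample_variance: float, pool: int, drawn: int, draws: int) -> float:
    """Statistic (D - 1) V / sigma^2 of Relation (18), referred to chi-square on D - 1 d.f."""
    _, variance = sum_moments(pool, drawn)
    return (draws - 1) * sample_variance / variance


main_mean, main_var = sum_moments(pool=50, drawn=5)
star_mean, star_var = sum_moments(pool=9, drawn=2)
print(f"main numbers: mu = {main_mean}, sigma^2 = {main_var}")
print(f"Lucky Stars:  mu = {star_mean}, sigma^2 = {star_var:.4f}")
```

Converting the statistic from `variance_statistic` into a p-value requires the chi-square distribution function, which is not in the Python standard library, so that step is left out of this sketch.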
We now consider two tests based upon the observed odd and even combinations that occur in given EuroMillions draws. For the first of these, define $e_j$ to be the number of even numbers among the $m = 5$ main numbers selected in draw $j$ based on $M = 50$ possible numbers. Under the assumption of random draws, the sampling distribution for $e_j$ is hypergeometric with probability mass function

$$p(e) = \frac{\binom{r}{e} \binom{M - r}{m - e}}{\binom{M}{m}}; \quad e = 0, 1, \ldots, m \tag{19}$$

where $r = 25$, the number of even numbers between 1 and 50 inclusive. Consequently, we can perform a chi-square goodness-of-fit test using the test statistic

$$T = \sum \frac{(\text{obs.} - \text{exp.})^{2}}{\text{exp.}} \tag{20}$$

to see whether our observed frequencies of even numbers per draw agree with what is expected by chance alone. This time, there are $m = 5$ degrees of freedom for the test. Based on the first 209 draws, our observed test statistic is $T \approx 3.16$ corresponding to a p-value of $p \approx 0.675$. As this value exceeds 0.05, the result is not significant at the 5% level and this test again supports the null hypothesis of randomness of the main numbers in EuroMillions draws.

Repeating this test for the Lucky Star numbers, we set $M = 9$ and $m = 2$ so $r = 4$ and we refer to the $\chi^{2}(2)$ sampling distribution. Our observed test statistic is now $T \approx 1.09$ corresponding to a p-value of $p \approx 0.580$. As this value exceeds 0.05, the result is not significant at the 5% level and this test again
supports the null hypothesis of randomness of the Lucky Star numbers in EuroMillions draws.

6. Testing whether there is any Bias for or against Tickets Sold in Particular Countries

In Table 6, we present details of the total EuroMillions sales over the first 209 draws in millions of Euros, divided up into totals for each of the nine participating countries. In order to assess whether inhabitants of each country can expect a proportion of winning tickets similar to the proportion of total sales they contribute, we present details of the numbers of jackpot winners, total numbers of winners in millions and total prize monies awarded in millions of Euros.

Country       Total sales   Number of         Total number of      Total value of
              (mEuros)      jackpot winners   winners (millions)   prizes (mEuros)
U.K.           1,898        10                 40                     987
France         3,966        27                 84                   2,067
Spain          3,141        19                 66                   1,617
Belgium          941         4                 20                     563
Ireland          344         1                  7                     257
Portugal       3,302        21                 69                   1,620
Luxembourg        96         0                  2                      27
Austria          520         2                 11                     226
Switzerland      857         5                 18                     476
Total         15,066        89                318                   7,841

Table 6: breakdown of EuroMillions sales, winners and prizes across countries.

Table 7 presents the same information expressed as percentages of the totals across all participating countries, rather than as absolute values. This enables
easier comparisons to assess whether there is any bias for or against particular countries. These percentages are consistent for all countries, providing no evidence of any preferential bias.

Country       Total sales   Number of jackpot   Total number of   Total value of
              (%)           winners (%)         winners (%)       prizes (%)
U.K.           13            11                  13                13
France         26            30                  26                26
Spain          21            21                  21                21
Belgium         6             4                   6                 7
Ireland         2             1                   2                 3
Portugal       22            24                  22                21
Luxembourg      1             0                   1                 0
Austria         3             2                   3                 3
Switzerland     6             6                   6                 6
Total         100           100                 100               100

Table 7: percentages of EuroMillions sales, winners and prizes across countries.

Our statistical analysis involves calculating the sample correlation coefficients between each of the three outcome measures and the sales figures by country in Table 6. Correlations must lie in the closed interval $[-1, 1]$ and, if the outcomes of EuroMillions draws treat tickets randomly (irrespective of where they were bought), these correlations should all be close to unity. In the order presented above, these correlations are 0.996, 1.000 and 0.998 respectively, which clearly support the claims of no bias.

Figures 16, 17 and 18 display these close relationships graphically, together with superimposed regression lines. All regression fits are highly significant, which we would expect if all participating countries are treated randomly. Consequently, we conclude that the first four years of EuroMillions draw history
provide no evidence of any bias for or against tickets sold in particular countries. Specifically, the EuroMillions game procedures appear to treat the U.K. on the same basis as any other participating country.

Figure 16: scatter plot of jackpot winners against sales across countries. (Vertical axis: jackpot winners; horizontal axis: sales in mEuros; the U.K. point is labelled.)

Figure 17: scatter plot of total winners against sales across countries. (Vertical axis: total winners in millions; horizontal axis: sales in mEuros; the U.K. point is labelled.)
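The correlations between sales and the three outcome measures can be recomputed approximately from the country-level figures in Table 6. The sketch below is an illustration only: the `pearson` helper is a hypothetical name, and because the tabulated values are rounded, the results may differ slightly in the third decimal place from the correlations reported in the text.

```python
import math

# Table 6 columns, one entry per country in the order listed there
# (U.K., France, Spain, Belgium, Ireland, Portugal, Luxembourg, Austria, Switzerland).
sales = [1898, 3966, 3141, 941, 344, 3302, 96, 520, 857]               # mEuros
jackpot_winners = [10, 27, 19, 4, 1, 21, 0, 2, 5]
total_winners = [40, 84, 66, 20, 7, 69, 2, 11, 18]                     # millions
total_prizes = [987, 2067, 1617, 563, 257, 1620, 27, 226, 476]         # mEuros


def pearson(xs: list[float], ys: list[float]) -> float:
    """Sample (Pearson) correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


correlations = {
    "jackpot winners": pearson(sales, jackpot_winners),
    "total winners": pearson(sales, total_winners),
    "total prizes": pearson(sales, total_prizes),
}

for name, r in correlations.items():
    print(f"sales vs {name}: r = {r:.3f}")
```

With only nine countries, a single unusual country would noticeably lower these coefficients, which is why the scatter plots are a useful complement to the summary figures.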
Figure 18: scatter plot of total prizes against sales across countries. (Vertical axis: total prizes in mEuros; horizontal axis: sales in mEuros; the U.K. point is labelled.)

7. Testing whether the Frequency of Top Tier Winners is as Expected

In a draw with $N$ random entries, the number of top tier winners $w$ has a binomial distribution with probability mass function

$$p(w) = \binom{N}{w} p^{w} (1 - p)^{N - w}; \quad w = 0, 1, \ldots, N \tag{21}$$

where $p = 1 \div 76{,}275{,}360$ from Section 2. From this, the probability that a given draw has no top tier winners is $p(0) = (1 - p)^{N}$.

For illustration, the mean sales per draw of U.K. EuroMillions over the first four years is about £6,809,653, which equates to about $N = 4{,}539{,}769$ entries per draw. This suggests that the probability of no U.K. top tier winners should be roughly 0.940, equivalent to about 94% of draws on average, except that some players might possibly select the same number combinations. Over the course of the first 209 draws, there was at least one U.K. jackpot winner on eight occasions.
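The probability of a draw with no top tier winner follows directly from Equation (21) with $w = 0$. The sketch below evaluates it for the U.K. figures quoted above, assuming independent random entries (that is, ignoring conscious selection). The function name `prob_no_winner` is hypothetical. It also prints the first-order approximation $1 - Np$, which rounds to the figure of 0.940 quoted in the text, whereas the exact expression $(1 - p)^{N}$ is very slightly larger.

```python
import math

# Probability that a single random entry matches all five main numbers and both Lucky Stars.
P_JACKPOT = 1 / 76_275_360


def prob_no_winner(entries: int, p: float = P_JACKPOT) -> float:
    """p(0) = (1 - p)^N for N independent random entries."""
    # log1p keeps the calculation accurate when p is tiny and N is large.
    return math.exp(entries * math.log1p(-p))


uk_entries = 4_539_769                     # approximate mean U.K. entries per draw
exact = prob_no_winner(uk_entries)         # (1 - p)^N
linear = 1 - uk_entries * P_JACKPOT        # first-order approximation 1 - N p

print(f"exact (1 - p)^N : {exact:.3f}")
print(f"approx 1 - N p  : {linear:.3f}")
```

The two values are close here because $Np$ is small for the U.K. alone; the approximation becomes less accurate as the number of entries grows.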
This corresponds to about 96% of draws having no U.K. top tier winners, which would be perfectly acceptable under the assumption that the spatial distribution of jackpot winners is random.

Similarly, the mean aggregate sales per draw of EuroMillions over the first four years is about 72,084,098 Euros, which equates to about $N = 36{,}042{,}049$ entries per draw. This suggests that the probability of no top tier winners from any country should be roughly 0.527 on the first-order approximation $1 - Np$, equivalent to about 53% of draws on average (evaluating $(1 - p)^{N}$ directly gives about 0.623), except that some players would likely select the same number combinations. Over the course of the first 209 draws, there was at least one aggregate jackpot winner on sixty-four occasions. This corresponds to about 69% of draws having no aggregate top tier winners, which would be acceptable if the distribution of jackpot winners were random.

Figure 19: observed and expected frequencies of aggregate jackpot winners. (Vertical axis: number of draws; horizontal axis: number of top tier winners.)

However, it is well known that most players do not in fact select their lottery numbers at random and this “conscious selection” has the effect of clustering their entries, thereby resulting in more frequent draws with no top tier winners than would otherwise be expected. We illustrate this effect in Figure 19,
which plots the number of draws on the vertical axis against the aggregate number of top tier winners on the horizontal axis. The bars indicate the expected frequencies assuming the binomial distribution in Equation (21) without conscious selection, whereas the markers represent the observed frequencies.

In order to assess whether the observed frequencies of draws with no top tier winners are what one would expect given the tendency of players to cluster in particular combinations of numbers, thereby providing evidence for or against randomness, we need to take account of conscious selection. We are able to do this by exploiting information on the coverage rate for each draw. This is the proportion of combinations selected by at least one player and is sometimes expressed as a percentage.

The coverage values $c_i$ are available for all $i = 1, 2, \ldots, D$ draws from 13th February 2004 to 8th February 2008 inclusive, where $D = 209$ as before. From this information, we can calculate the expected number of draws that should produce at least one top tier winner as

$$\text{exp.}(\geq 1) = \sum_{i=1}^{D} c_i \tag{22}$$

and the expected number of draws that should produce no top tier winners as

$$\text{exp.}(0) = \sum_{i=1}^{D} (1 - c_i) = D - \sum_{i=1}^{D} c_i. \tag{23}$$

Consequently, we can perform a chi-square goodness-of-fit test to assess whether our observed frequencies of jackpot winners agree with the expected frequencies by chance alone. As there are only two categories, we incorporate a continuity correction for improved accuracy and the test statistic becomes
$$T = \sum \frac{\left(\left|\text{obs.} - \text{exp.}\right| - \frac{1}{2}\right)^{2}}{\text{exp.}} \tag{24}$$

on one degree of freedom. The binomial test does not apply here, as the coverage varies across draws.

According to the draw history available to us, the expected frequencies for U.K. EuroMillions coverage only are $\text{exp.}(0) \approx 197.5$ and $\text{exp.}(\geq 1) \approx 11.5$ draws, compared with the observed frequencies of $\text{obs.}(0) = 201$ and $\text{obs.}(\geq 1) = 8$ draws respectively. The corresponding chi-square test statistic and p-value are $T \approx 0.831$ and $p \approx 0.362$. As the latter exceeds 0.05, the test is not significant at the 5% level. We conclude that the number of draws generating at least one U.K. jackpot winner is consistent with mathematical expectation.

Similarly, the expected frequencies for the aggregate EuroMillions coverage of all participating countries are $\text{exp.}(0) \approx 139.4$ and $\text{exp.}(\geq 1) \approx 69.6$ draws, compared with the observed frequencies of $\text{obs.}(0) = 145$ and $\text{obs.}(\geq 1) = 64$ draws respectively. The corresponding chi-square test statistic and p-value are $T \approx 0.564$ and $p \approx 0.452$. As the latter exceeds 0.05, the test is not significant at the 5% level. We conclude that the number of draws generating at least one jackpot winner across all participating countries is consistent with mathematical expectation.

8. Conclusions

The National Lottery Commission invited the Centre for the Study of Gambling (University of Salford) to establish whether there are any elements of non-randomness within EuroMillions. Concentrating on the first four years of
EuroMillions draws, from 13th February 2004 to 8th February 2008 inclusive, specific objectives were to test whether:

(a) there is equality of frequency for each EuroMillions number drawn;
(b) each EuroMillions draw is independent of preceding draws;
(c) there is any bias for or against tickets sold in particular countries;
(d) the frequency of top tier winners is what would be expected from random draws.

Players in any particular draw select five main numbers from 1 to 50 and two “Lucky Star” numbers from 1 to 9. They may win various prizes by matching their chosen numbers with selected combinations drawn. 50% of ticket sales are allocated to the Common Prize Fund, which is distributed among different prize tiers according to pre-specified percentages. There are no fixed prize tiers.

We present the results of our investigations in this report, beginning with an exploratory data analysis in Section 2. The findings here are that there appear to be no obvious discrepancies and all entries in the database appear to be correct.

In Section 3, we perform two important significance tests. One tests whether the frequencies of the five main numbers drawn agree with the null hypothesis that the five main balls are drawn at random and another tests whether the frequencies of the Lucky Star numbers drawn agree with the null hypothesis that the Lucky Star balls are drawn at random. Neither of our modified chi-square goodness-of-fit tests is significant at the 5% level, providing evidence in support of random EuroMillions drawings.

Section 4 presents a set of tests for the sequential independence of EuroMillions draws. Based on the gaps between successive selections of the specific main numbers 1 to 50, we employ chi-square goodness-of-fit tests to compare observed and expected frequencies of possible gap sizes. Using the Bonferroni method for multiple comparisons to adjust the critical p-value and
allow for multiplicity, the overall test is not significant at the 5% level and we conclude that successive EuroMillions main number draws appear to be independent of one another. We perform a similar analysis for the nine Lucky Star numbers and this overall test is not significant at the 5% level, so we conclude that successive EuroMillions Lucky Star number draws appear to be independent of one another. As a further check for the possibility of serial dependence, we perform runs tests for low valued balls and neither of the runs tests for main numbers and Lucky Stars is significant. This again provides evidence that the outcomes of specific draws do not depend on preceding draw results.

Section 5 presents graphical comparisons of observed and expected frequency distributions for the order statistics of the five main numbers and for the order statistics of the two Lucky Stars. The observed frequencies resemble the expected frequencies in each case, further supporting our formal tests of randomness. Then we perform several tests for randomness of the five main numbers and two Lucky Stars, based upon the means and standard deviations of the sums of numbers in each draw, and upon the parities of the numbers in each draw. None of these tests is significant at the 5% level, providing further evidence in support of random drawings.

Section 6 investigates whether the first four years of EuroMillions draw history provide any evidence of bias for or against tickets sold in particular countries. By calculating sample correlation coefficients between the sales figures for each country and (i) the numbers of jackpot winners, (ii) the total numbers of winners and (iii) the total prize monies, we are able to measure the association between these pairs of variables. As illustrated by scatter plots and regression fits, these correlations are all close to one with no clear outliers, providing evidence to support the claims of no bias.
In Section 7, we consider whether the frequency of top tier winners (match five main numbers plus two Lucky Stars) agrees with expectation under the assumption that all EuroMillions procedures operate randomly. Knowledge of the coverage for each draw provides sufficient information to analyse the frequency of draws with no top tier winners, separately for the U.K. game only and for the aggregate game across all participating countries. Both chi-square goodness-of-fit tests show that there are no significant differences between the observed and expected frequencies. These results provide supporting evidence that the process of generating winning numbers is independent of the pattern of numbers chosen by the playing population.

Although we have followed the convention of testing for significance at the 5% level throughout this investigation, it is worth noting that none of the tests we conducted would be significant even at the 10% level. From all of these results, it is clear that the first four years of draw history provide no evidence that there are any elements of non-randomness within EuroMillions.

REFERENCES

Haigh, J. (1997) The statistics of the National Lottery. Journal of the Royal Statistical Society series A, 160, 187-206.

Joe, H. (1993) Tests of uniformity for sets of lotto numbers. Statistics & Probability Letters, 16, 181-188.

Miller, I. and Miller, M. (2004) John E. Freund’s Mathematical Statistics with Applications (seventh edition). Prentice Hall.

Rice, J.A. (2007) Mathematical Statistics and Data Analysis (third edition). Duxbury Press.

Royal Statistical Society (2000, 2002) Reports on the randomness of the lottery. National Lottery Commission website: http://www.natlotcomm.gov.uk/

University of Salford (2004, 2005, 2010) Reports on the randomness of lottery games. National Lottery Commission website: http://www.natlotcomm.gov.uk/