Estimating the IPL Winner using Machine Learning - Ijaresm
International Journal of All Research Education and Scientific Methods (IJARESM), ISSN: 2455-6211 Volume 9, Issue 6, June -2021, Impact Factor: 7.429, Available online at: www.ijaresm.com

Estimating the IPL Winner using Machine Learning

Dr. Vanitha.K1, Bhargav Reddy.K2, Govardhan Reddy.N3, Jameel Basha.S4, Chinmay Sai.Y5
1 Assistant Professor, Department of Computer Science and Engineering, Madanapalle Institute of Technology and Science, Madanapalle
2,3,4,5 B. Tech IV Year, Department of Computer Science and Engineering, Madanapalle Institute of Technology and Science, Madanapalle

ABSTRACT

Cricket is a popular sport all around the world, notably in India. Cricket tournaments and competitions such as the Indian Premier League (IPL) require a large amount of resources, effort, and time to execute. With such high stakes, players, coaches, and club management are under tremendous pressure to perform well. In this research we therefore worked on developing a victory prediction system, which estimates the probability of a team winning a match from various parameters (features) based on its past matches. The input parameters are varied so that the best probability of winning can be attained in a given match. This helps team captains, coaches, and management to choose those parameters (players) for that match so as to increase their win probability. In addition, the strengths and weaknesses of the team's bowling and batting orders are identified in order to improve team performance. Predictive analytics is one of the domains of machine learning in which the probability of a particular outcome is forecast based on historical data.
Before making any predictions, we need to thoroughly investigate and examine the data.

Keywords: Machine Learning, IPL, Data Analysis, Model Classifiers, Prediction, Prediction Models

INTRODUCTION

Machine Learning is a subdivision of Artificial Intelligence that is used to solve real-world engineering problems. The approach does not rely on explicitly programmed rules; instead, the machine learns from past data and predicts the result accordingly. Machine Learning approaches have the advantage of using decision trees, heuristic learning, knowledge acquisition, and mathematical models. The IPL is a Twenty20 cricket league played in India that aims to inspire young and dynamic players. Since technology is improving at a fast rate, and because there is such a vast market for betting and such a great demand for cricket, the general public has been drawn to using machine learning to predict the outcomes of cricket matches. Machine learning and data science are useful in many ways; for example, forecasting the outcome before a match can assist players and coaches in identifying weak areas. Machine learning has strong ties to mathematical optimization, which contributes methods, theory, and application domains to the field. Machine learning and data mining are sometimes confused; however, the latter focuses more on exploratory data analysis and is associated with unsupervised learning. Predicting the outcome of a match has become more feasible thanks to advancements in technology and, more recently, in sports analytics. To train the algorithms, we use career statistics as well as team performances such as batting and bowling. We use supervised learning algorithms to forecast the outcome of the match. Machine learning offers a variety of approaches, each of which is tailored to the datasets and parameters employed and predicts the outcomes accordingly.
The 756 match records that were suitable have been taken into account and fitted with appropriate modelling techniques. The noisy data is separated out, and the data is pre-processed before the models are trained. Part of the data is used as a training set on which the models are trained, while the remaining data is used to test the models. Accuracy is one statistic that can be used to evaluate whether a model has produced effective predictions.

LITERATURE SURVEY

Because the Indian Premier League is hugely popular, a lot of related work has been done on estimating the outcome of a match. Random Forest Classifier, Support Vector Machine, KNN, Logistic Regression, Gaussian NB, Gradient Boosting Classifier, Decision Tree Classifier, and other models have been used in these articles. Although there are various research papers related to the IPL, a precise outcome was not produced because of discrepancies in the data.

In paper [2], the complete weight of a team is measured by taking each player's performance into account. Seven types of machine learning models were trained and used for predicting the result. Among them, Decision Tree Classifier and Random Forest gave the highest accuracy. The paper focuses on analysing and predicting the winner using machine learning techniques.

In [3], existing data mining algorithms are used to measure the outcome of an IPL match on both balanced and imbalanced datasets. For the imbalanced datasets an oversampling technique is used before the procedures are applied. The precision of the predicted outcome is used as the performance metric. The previous IPL information [3] is taken, analysed, and classified accordingly. By using a larger dataset, the model efficiency can be increased. The probabilities derived from the last few years' matches indicate which team is going to win an upcoming match. Seven variables of the datasets were taken to fit the model, and results are predicted accordingly.

Different machine learning models are used in this paper. The research is based on previous match information that we have taken from the Kaggle resource. Decision Tree Classifier and Random Forest gave accurate values in this research.

Problem Statement

A cricket match has two outcomes: either the team wins or loses. However, focusing solely on winning or losing does not provide an exact assessment. Other elements to consider include home grounds, venue, toss decision, toss winner, city, and so on. Considering these elements aids in deciding the predicted outcome of the match and indicates the strength of the evidence that supports the prediction.
In general, a T20 match has a variety of characteristics that influence the game's outcome. In this project we have focused on the characteristics that have a chance of becoming the deciding feature of a match; by including such characteristics, we have increased the efficiency of our analysis. If there are two teams, X and Y, the result is not merely a statement that X or Y will win the match: the analysis provides the predicted winner together with an accuracy figure that indicates the strength of that prediction.

Process Flow

The process flow contains the following steps:
Data Collection
Data Cleaning
Data Visualization
Fig 1: Process of Predicting the Winning Team

Data Collection

Data from the years 2008-2017 has been taken into consideration for analysis, and the variables are selected from this data. The data is taken from the Kaggle repository. The pandas library is used to transform the data into numerical form for prediction. A total of 756 match records were taken into consideration for analysing and estimating the result of a match. The collected data contains some attributes that are irrelevant for predicting the outcome of the match. These attributes must be removed before prediction in order to maximize performance.

Fig 2: Collection of data with various attributes for prediction
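The collection step described above can be sketched with pandas as follows. This is a minimal illustration, not the authors' code: the inline CSV excerpt, its values, and the column names (`toss_winner`, `toss_decision`, `umpire1`, and so on) are assumptions standing in for the Kaggle file, which the paper does not reproduce.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for the Kaggle match file; the real file
# and its exact column names are not shown in the paper.
SAMPLE_CSV = """id,season,city,team1,team2,toss_winner,toss_decision,winner,venue,umpire1
1,2017,City A,Mumbai Indians,Chennai Super Kings,Mumbai Indians,bat,Mumbai Indians,Stadium A,U1
2,2017,City B,Rajasthan Royals,Gujarat Lions,Gujarat Lions,field,Gujarat Lions,Stadium B,U2
3,2017,City C,Deccan Chargers,Pune Warriors,Pune Warriors,field,Deccan Chargers,Stadium C,U3
"""

# Attributes the paper reports using as model inputs, plus the target column.
FEATURES = ["team1", "team2", "city", "toss_decision", "toss_winner", "venue"]
TARGET = "winner"


def load_matches(source):
    """Read match records and keep only the columns used for prediction."""
    frame = pd.read_csv(source)
    return frame[FEATURES + [TARGET]]


matches = load_matches(io.StringIO(SAMPLE_CSV))
print(matches.shape)  # (rows, columns) after dropping irrelevant attributes
```

With the real dataset, `load_matches` would be given the path of the downloaded file instead of the in-memory sample.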
Data Cleaning

The practice of removing inconsistencies and replacing them with genuine values is known as data cleaning. The collected datasets contain noisy data, consisting of null values and irrelevant values in some rows, that must be handled. The null values are replaced with 0 and irrelevant values with appropriate values so that the analysis can be carried out efficiently. Removing null values, replacing them with correct values, and removing irrelevant attributes raises the precision of the predicted match outcome.

Data Cleaning Steps
Removing Unwanted Observations
Missing Data Handling
Structural error solving
Outliers management

Fig 3: Removal of Null values from collected Data

Fig 4: Removal of Irrelevant attributes for prediction (figure shows relevant Attributes)
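A minimal pandas sketch of these cleaning steps is given below. The sample rows, the `win_by_runs` and `umpire3` column names, and the choice to fill only numeric columns with 0 are assumptions made for illustration; the paper does not list which columns were filled or dropped.

```python
import pandas as pd

# Hypothetical raw records with a missing value, a duplicate row, and an
# attribute (umpire3) that carries no information for prediction.
raw = pd.DataFrame({
    "team1": ["Mumbai Indians", "Chennai Super Kings",
              "Chennai Super Kings", "Mumbai Indians"],
    "team2": ["Chennai Super Kings", "Rajasthan Royals",
              "Rajasthan Royals", "Rajasthan Royals"],
    "win_by_runs": [10, None, None, 4],
    "umpire3": [None, None, None, None],
})


def clean(frame, irrelevant):
    """Drop irrelevant attributes, fill numeric nulls with 0, remove duplicates."""
    cleaned = frame.drop(columns=irrelevant).copy()
    numeric_cols = cleaned.select_dtypes(include="number").columns
    cleaned[numeric_cols] = cleaned[numeric_cols].fillna(0)
    return cleaned.drop_duplicates().reset_index(drop=True)


cleaned = clean(raw, irrelevant=["umpire3"])
print(cleaned)
```

Filling before de-duplicating means that two rows differing only by a missing value in the same place are treated as the same observation.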
Data Visualization

The collected data is visualized for a better understanding of the information. The Python library Matplotlib is used for plotting the graphs.

Fig 5: Team winning the toss and winning the match

Fig 5 shows that a team that wins the toss is more likely to win the match.

Fig 6: Distribution of Runs

Fig 6 shows that there were more than 120 instances in which a team won the match by a margin of fewer than about 15 runs.

Fig 7: Team batting First and Winning the Match
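The counts behind charts such as Fig 5 and Fig 7 can be derived from the match records before plotting. The sketch below uses only the Python standard library; the four records and their field names are hypothetical, and the resulting tallies would then be passed to a plotting library such as Matplotlib.

```python
from collections import Counter

# Hypothetical match records; the field names are assumptions.
matches = [
    {"toss_winner": "MI", "toss_decision": "bat", "winner": "MI"},
    {"toss_winner": "CSK", "toss_decision": "field", "winner": "CSK"},
    {"toss_winner": "RCB", "toss_decision": "bat", "winner": "KKR"},
    {"toss_winner": "KKR", "toss_decision": "field", "winner": "KKR"},
]


def toss_vs_match(records):
    """Count matches won and lost by the team that won the toss."""
    return Counter(
        "toss winner won" if r["toss_winner"] == r["winner"] else "toss winner lost"
        for r in records
    )


def batting_order_of_winner(records):
    """Count wins by the side batting first versus the side batting second."""
    tally = Counter()
    for r in records:
        toss_winner_bats_first = r["toss_decision"] == "bat"
        winner_won_toss = r["winner"] == r["toss_winner"]
        # The winner batted first if it won the toss and chose to bat,
        # or lost the toss and was put in to bat.
        batted_first = toss_winner_bats_first == winner_won_toss
        tally["batted first" if batted_first else "batted second"] += 1
    return tally


toss_tally = toss_vs_match(matches)
order_tally = batting_order_of_winner(matches)
print(toss_tally)
print(order_tally)
```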
Fig 8: Team batting Second and Winning the Match

Predicting the results with the help of models

The outcome of a match is predicted after fitting the appropriate data to the models. The models used here are Random Forest, Support Vector Machine, Decision Tree Classifier, KNN, and Naive Bayes, of which Random Forest and Decision Tree Classifier gave better results than the Support Vector Machine.

Random Forest Classifier

Random Forest is a supervised learning procedure used for both regression and classification, in which the model learns from past information and predicts the outcome of the match. The Random Forest classifier builds decision trees on samples of the data and takes the majority vote among the trees' predictions as the final result. In this project Random Forest gave the best accuracy for the variables that were used.

Support Vector Machine

The Support Vector Machine (SVM) is a supervised learning algorithm that can be used for both regression and classification problems. It is mostly used in Machine Learning to solve classification problems. In the SVM algorithm every data item is plotted as a point in n-dimensional space, where n is the number of features and the value of each feature is the value of a coordinate. Classification is then performed by finding the hyperplane that best separates the two classes.

Decision Tree Classifier

The Decision Tree is a supervised learning technique applied to classification and regression problems, though it is most commonly employed for classification. It is a tree-structured predictor in which interior nodes represent features of the data set, branches represent decision rules, and each leaf node represents an outcome. The decision node and the leaf node are the two types of node in a decision tree. Decision nodes are used to make a decision and have multiple branches, while leaf nodes are the results of those decisions and do not contain any further branches.

KNN

The full training dataset is the model representation for KNN. No training is required, since KNN has no model other than the stored dataset. To make the lookup and matching of patterns efficient during prediction, implementations can store the data in advanced data structures such as k-d trees.
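The KNN prediction step described above can be sketched in a few lines of plain Python. The points, labels, and the value k=3 are hypothetical; this version scans the whole stored dataset for every query, which is the simple baseline that a k-d tree would speed up.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # KNN keeps the whole training set; "fitting" is just storing it.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature training data with two outcome classes.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["team1", "team1", "team1", "team2", "team2", "team2"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (7, 8)))
```

Euclidean distance is used here as an assumption; the paper does not state which distance measure or value of k was chosen.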
Since the entire training data is stored, the consistency of the training samples should be considered carefully. Curating the data, updating it regularly as new data arrives, and eliminating erroneous and anomalous data is a good idea.

Naive Bayes

The Naive Bayes model is easy to build and is especially useful for very large data sets. Despite its simplicity, Naive Bayes is known to sometimes outperform more sophisticated classification methods. Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x) and P(x|c):

$$P(c|x) = \frac{P(x|c)\,P(c)}{P(x)}$$

Fig 9: Comparison of Naive Bayes vs SVC vs Random Forest vs KNN vs Decision Tree

RESULTS

The algorithms that we used are Random Forest, Support Vector Machine, Naive Bayes, KNN, and Decision Tree. Among them, Random Forest and Decision Tree gave the best results for the parameters taken into consideration. Random Forest achieved an accuracy of 86.640%, the Support Vector Machine 86.508%, KNN 66.66%, and Decision Tree 86.640%. The variables used are team1, team2, city, toss decision, toss winner, and venue.

Id for Each Team   Team Name                      Short Form
1                  Mumbai Indians                 MI
2                  Kolkata Knight Riders          KKR
3                  Royal Challengers Bangalore    RCB
4                  Deccan Chargers                DC
5                  Chennai Super Kings            CSK
6                  Rajasthan Royals               RR
7                  Delhi Daredevils               DD
8                  Gujarat Lions                  GL
9                  Kings XI Punjab                KXIP
10                 Sunrisers Hyderabad            SRH
11                 Rising Pune Supergiants        RPS
12                 Kochi Tuskers Kerala           KTK
13                 Pune Warriors                  PW
14                 Draw

Each team name, which is a categorical value, is encoded as a numeric value so that the models can process it; the teams and their ids are listed in the table. The venue parameter was also encoded in numerical format, with each venue assigned a different value.

Fig 10: The data encoded in the numerical Format

Fig 11: Result of Actual Winner vs Predicted Winner

CONCLUSION

The result of a match mainly depends on the selection of the team and the player performances in the match. It also depends on other factors such as the toss winner, toss decision, venue, team1, team2, and the city where the match is played. Predicting the IPL is not easy because the game depends on so many factors. The main contribution of this paper is predicting the winner from past data from 2008 to 2017. In this paper five classification algorithms were used to predict the results. The implementation was carried out in Python. Among the classification algorithms, Random Forest gave the highest accuracy of 86.640%, and the Support Vector Machine gave an accuracy of 66.667%. This information can be used in future for predicting winners and selecting teams accordingly, so that there is a better chance of winning the next match.
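The overall workflow reported in this paper (numerically encoded match attributes, a train/test split, five classifiers, and accuracy as the metric) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the authors' implementation: the matches are synthetic, the 70% toss-winner rule, the 75/25 split, and all hyperparameters are made up for the example, and the printed accuracies will not reproduce the figures in the Results section.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# A few team ids taken from the encoding table in the Results section.
TEAM_IDS = {"MI": 1, "KKR": 2, "RCB": 3, "CSK": 5}


def make_synthetic_matches(n, seed=0):
    """Generate hypothetical encoded rows: [team1, team2, toss_winner, toss_decision]."""
    rng = random.Random(seed)
    teams = list(TEAM_IDS.values())
    rows, winners = [], []
    for _ in range(n):
        team1, team2 = rng.sample(teams, 2)
        toss_winner = rng.choice([team1, team2])
        toss_decision = rng.choice([0, 1])  # 0 = bat, 1 = field (assumed encoding)
        other = team2 if toss_winner == team1 else team1
        # Synthetic rule, not a finding: the toss winner wins 70% of the time.
        winner = toss_winner if rng.random() < 0.7 else other
        rows.append([team1, team2, toss_winner, toss_decision])
        winners.append(winner)
    return rows, winners


def compare_classifiers(features, labels, seed=0):
    """Fit the five model types named in the paper and return test accuracy for each."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, random_state=seed
    )
    models = {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "SVM": SVC(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "Naive Bayes": GaussianNB(),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


X, y = make_synthetic_matches(400)
scores = compare_classifiers(X, y)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```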
REFERENCES

[1] R. P. Schumaker, O. K. Solieman and H. Chen, "Predictive Modeling for Sports and Gaming," in Sports Data Mining, vol. 26, Boston, Massachusetts: Springer, 2010.
[2] Bunker, Rory & Thabtah, Fadi, "A Machine Learning Framework for Sport Result Prediction," Applied Computing and Informatics, 15, 2017. doi:10.1016/j.aci.2017.09.005.
[6] Ramon Diaz-Uriarte and Sara, "Gene selection and classification of microarray data using random forest," BMC Bioinformatics, doi:10.1186/1471-2105-7-3.
[3] Akhil Nimmagadda et al., "Cricket score and winning prediction using data mining," IJARnD, Vol. 3, Issue 3.
[4] A. L. Samuel, "Some studies in machine learning using the game of checkers. II. Recent progress," in Computer Games I, pp. 366-400, Springer, 1988.
[5] S. Kampakis and W. Thomas, "Using machine learning to predict the outcome of English county twenty over cricket matches," arXiv preprint arXiv:1511.05837, 2015.
[6] Bandulasiri, "Predicting the winner in one day international cricket," Journal of Mathematical Sciences & Mathematics Education, vol. 3, no. 1, pp. 6-17, 2008.
[7] Vistro, Daniel, Leo Gertrude David, "The cricket winner prediction with application of machine learning and data analytics," International Journal of Scientific and Technology Research, Volume 8, Issue 09, 2019.
[8] Jhanwar, Vikram, "Predicting the Outcome of ODI Cricket Matches: A Team Composition Based Approach," International Institute of Information Technology, Hyderabad, 2016.
[9] Lokhande, Chawan, Pramila, "Prediction of Live Cricket Score and Winning," International Journal of Trend in Research and Development, Volume 5(1), 2018.
[10] Jaishankar, Rajkumar, "A review paper on cricket predictions using various machine learning algorithms and comparisons among them," International Journal for Research in Applied Science and Engineering Technology, 2018.
[11] Rory, Fadi, "A Machine Learning Framework for Sport Result Prediction," Applied Computing and Informatics, Volume 15, Issue 1, 2017.
[12] Jayanth, Sandesh Bananki, Akas Anthony, Gududuru Abhilasha, Noorni, Gowri Srinivasa, "A team recommendation system and outcome prediction for the game of cricket," Journal of Sports Analytics, vol. 4, pp. 263-273, 2018.
[13] Akhil, Venkata, Venkatesh, Sai, Chavali, "Cricket score and winning prediction using data mining," International Journal of Advance Research, Ideas and Innovations in Technology, 2018.
[14] Tejinder, Vishal, Parteek, "Score and Winning Prediction in Cricket through Data Mining," International Conference on Soft Computing Techniques and Implementations, 2015.
[15] Stylianos, William, "Using machine learning to predict the outcome of English county twenty over cricket matches," Cornell University, 2015.

IJARESM Publication, India, www.ijaresm.com, Pages 1725-1734