Prediction of Stock Market Index based on Neural Networks, Genetic Algorithms, and Data Mining Using SVD
The Proceedings of the International Conference on Digital Information Processing, Data Mining, and Wireless Communications, Dubai, UAE, 2015
ISBN: 978-1-941968-05-5 ©2015 SDIWC

Dr. Mohammad V. Malakooti
Faculty and Head of Department of Computer Engineering
Islamic Azad University, UAE branch, Dubai, UAE
malakooti@iau.ae

Amir AghaSharif
Student of Department of Computer Engineering
Islamic Azad University, UAE branch, Dubai, UAE
Agha.sharif@gmail.com

ABSTRACT

Nowadays, most investors are interested in using prediction tools to obtain accurate information about stock market indices and to make wise decisions based on the precise market price. The prediction of the stock market index is an attractive research area that requires special tools and accurate algorithms. In this research we have used a Neural Network (NN) for the learning and curve-fitting process, Genetic Algorithms (GA) for the path search and optimization process, and Decision Tree and Data Mining techniques, using SVD, to obtain the maximum prediction accuracy. The maximum prediction accuracy obtained for the DJIA by using machine learning techniques is about 77.8%. Our focus in this research is to improve the decision tree, data mining, and neural network techniques by using Eigen System Analysis, Mean value, and SVD.

KEYWORDS

stock trading, risk, decision tree, machine learning, neural networks, genetic algorithms, data mining, data classification, future stock, SVM, Eigen value and SVD.

I. INTRODUCTION

In the recent decade, many researchers have focused on stock market prediction, in which the future of a stock market price index is predicted from previous information and the relationships that exist within it. In this research we want to develop new software based on mathematical rules and prediction algorithms to help affiliates make better decisions. They can obtain the predicted values of the stock market price indices and buy or sell the stock with more confidence.

Stock market price indices are unpredictable: they depend not only on economic events but are also affected by political events. Thus, we cannot easily fit a mathematical model to this unpredictable, non-linear, and non-parametric time series. The main concern of the broker is to get into the market at the right time and either buy or sell the stock based on reliable information. We have followed the work of researchers [2], [6], [13], and have used fundamental analysis, data mining, machine learning, decision trees, and neural networks to reach our prediction goals. Fundamental analysis can be used to obtain the price of a stock by using natural values and the expected return on buying or selling the share [12], [7]. There are two kinds of analysis on the stock market:

1) Technical Analysis:
We have not focused on technical analysis because it is used for short-term strategy on the market. In some cases, researchers may have used technical analysis for the stock market
based on the historical data of volume and trading price. We can use the past values of the stock market price information and predict its future value based on historical market information and volume [9][12]. Therefore, with machine learning and the analysis of charts and models, we can show the direction of the market.

2) Fundamental Analysis:
We have focused on fundamental analysis because it is used for long-term strategy on the market and concentrates on the mathematical model. Artificial Intelligence (AI) and Data Mining (DM) techniques, which are very tough approaches, are similar to the decision tree. One can use an artificial neural network to perform fundamental analysis in this scope [14]. Data Mining, an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets, involving methods at the intersection of Artificial Intelligence. We need to know all possible outcomes, the chart representation, and the directions in order to make a good decision; these come from the decision tree, one of the best methods of data classification. As machine learning algorithms we use the decision tree and the Artificial Neural Network method. In this decade, researchers have focused on predicting stocks from historical data and on finding useful rules in the raw data of investor databases. These rules cannot be extracted from raw data easily. In general, in the real world, it is impossible to draw conclusions directly from the data of huge databases. As we mentioned before, data mining helps investors to classify the historical data and predict the future of the market for any possible action, such as buying or selling a share, to achieve more benefit [8].

Six Major Risks in the Stock Market for Traders:

1) Trade Risk: what you put on a trade. For example, if you put one thousand dollars in a trade, that is your trade risk.

2) Market Risk: what can happen in the market, that is, something that happens to the global economy, possibly to your country or to the place where you are trading.

3) Margin Risk: if you are borrowing money on margin. For example, if you borrow money from a broker and do not pay that money back within a certain amount of time, or do not close out some positions, the margin risk will eventually catch up with you: you have to pay that money back or close out your position, otherwise you will be forced to close those positions.

4) Liquidity Risk: if you cannot get out of the stock market quickly. Typically, you do not have liquidity issues because you trade a big amount of stock.

5) Overnight Risk: if you hold the position overnight or for multiple days, because you do not know what is going to happen overnight. You do not know what will happen to the company, what news will come out, or whether something overseas may happen to the company. You
do not really know what is going to happen.

6) Volatility Risk: the range of the magnitude by which the stock is moving. Think of volatility as the range: the stock could be up ten dollars one day and down ten dollars another day; those are the more volatile stocks [15].

II. RELATED WORKS

Decision Tree:
One of the best methodologies to use with the Decision Tree is Data Mining: it is used to collect data from the stock market and to find a firm model that extracts issues as well as related solutions. There are different Data Mining methodologies that show how to manage collecting data, analyzing data, issuing information, executing information, and finally controlling the progress of the result [5]. To build the model for analyzing the stock market, we use CRISP-DM (Cross-Industry Standard Process for Data Mining) in the Decision Tree method. This method is the result of a European consortium of companies in the mid-1990s that aimed at a non-proprietary standard process model for Data Mining methodology. This model involves 7 steps:

1) Comprehend the goals of extracting stock prices.
2) Understand the collected data and its format.
3) Prepare the data that is placed in the classification model.
4) Choose the technique for building the model.
5) Appraise the model by using a well-known method (evaluation method).
6) Deploy the model in the market to predict the suitable action, such as buying or selling a share.
7) Realize the reason and goals of the model.

After collecting data, we use a decision tree for classification. The decision tree has three main advantages: it is fast, simple, and accurate. The parameters in this model are previous, open, max, min, last, and action.

Genetic Algorithm:
Another algorithm used for stock market prediction is the Genetic Algorithm. One of the reasons that we choose this technique is to find accurate solutions for our issues. This algorithm borrows concepts from evolutionary biology such as inheritance, mutation, selection, and crossover. In the Genetic Algorithm, the first step is to choose a set of chromosomes, each of which is a possible solution for the issue in different situations. After that, each solution is tested and improved. Finally, the better solution has more chance to solve the problem. These steps are repeated until we get the optimal solution [4,10].

Evolution Strategies:
For continuous parameter optimization, there is the Evolution Strategy. The gene can be represented as a vector, and this algorithm uses the intermediate recombination strategy. In other words, the child is the average of the selected parent values, and the other parents are selected randomly. At the end, two individuals can go to the next generation. We have to follow 5 steps in this algorithm:

1) Build an initial population of individuals randomly.
2) Use the reproduction operator to make children from the current population.
3) Determine the suitability of each individual.
4) Choose the best individuals and ignore the other ones.
5) Return to step 2 until the number of generations is exhausted.

The parameters of the genetic algorithm are population size, crossover probability, selection, and stopping criteria. The parameters of the evolutionary strategy are population size, crossover probability, mutation probability, selection, and stopping criteria [4].

Neural Network:
Because it learns from training and experience, Machine Learning is one of the suitable methods in the Artificial Intelligence field. An ANN is a connectionist model that can improve the network by adjusting the weights. This model includes nodes, directed arcs, and weights [1]. Rosenblatt created the feed-forward networks [9]. This model is represented by three layers: input layer, hidden layer, and output layer. In the feed-forward model the arcs are unidirectional. In the financial field there are different problems, and an important one is predicting the stock market. As we mentioned before, ANN models are used to predict the stock market with the following parameters: the previous day's index value, the previous day's TL/USD exchange rate, the previous day's overnight interest rate, and 5 dummy variables, each representing one working day of the week [3].

Machine Learning:
As we mentioned in the introduction, one of the methods for predicting the stock market is Machine Learning, which is used in various fields. In particular, we use methods such as the Support Vector Machine (SVM) and reinforcement learning. To reach the goal of the SVM method, we collect data on global stock markets and various financial products to predict the future stock trend. With the SVM method, we obtain a prediction accuracy of 74.4% for NASDAQ, 77.6% for DJIA, and 76.0% for S&P 500. In machine learning, we use these formulas:

First we define $x_i(t)$, where $i \in \{1, 2, \ldots\}$, to be feature $i$ at time $t$.

$F = (X_1, X_2, \ldots, X_n)^T$   (1)

where

$X_t = (x_1(t), x_2(t), \ldots, x_n(t))$   (2)

$\nabla_\delta x_i(t) = x_i(t) - x_i(t - \delta)$
$\nabla_\delta X(t) = X(t) - X(t - \delta) = (\nabla_\delta x_1(t), \nabla_\delta x_2(t), \cdots, \nabla_\delta x_{16}(t))^T$
$\nabla_\delta F = (\nabla_\delta X(\delta + 1), \nabla_\delta X(\delta + 2), \ldots, \nabla_\delta X(n))$   (3)

Experimental Results of this Algorithm:

A) Trend Prediction:

1) Single Feature Prediction: Using cross-correlation to estimate the importance of each data source in the algorithm, we can predict the daily NASDAQ index trend, as shown below:

Figure 1: Prediction accuracy by single
As you can see, the best result, 70.8%, belongs to DAX.

2) Long Term Prediction: To reach higher accuracy in long-term prediction, we use the formula below:

$\Pr\{v_{t+1} - v_t > c_t\}$, where $c_t = -(v_{t-ts} - v_t)$.   (4)

Based on this formula, we reach 85.0% accuracy when the time period is longer than 30 days.

Figure 2: Decision Tree for the MECE
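The differencing features of equations (1) to (3) can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the cited authors' implementation: the function name `temporal_difference` and the toy series are hypothetical, and each time step is assumed to be a plain list of feature values.

```python
from typing import List, Sequence


def temporal_difference(series: Sequence[Sequence[float]], delta: int) -> List[List[float]]:
    """Return X(t) - X(t - delta) for every t that has a value delta steps earlier.

    `series[t]` is the feature vector X(t); the result corresponds to the
    columns of the differenced feature matrix in equation (3).
    """
    if delta < 1:
        raise ValueError("delta must be a positive number of time steps")
    return [
        [now - before for now, before in zip(series[t], series[t - delta])]
        for t in range(delta, len(series))
    ]


# Hypothetical toy data: 4 time steps, 2 features per step.
toy_series = [[1.0, 10.0], [2.0, 12.0], [4.0, 15.0], [8.0, 19.0]]
one_step = temporal_difference(toy_series, delta=1)
two_step = temporal_difference(toy_series, delta=2)
```

A larger `delta` yields fewer differenced vectors, because the first `delta` time steps have no earlier value to subtract.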
Evaluation:
The Root Mean Square Error (RMSE), equation (5), is used for the evaluation of this model.

Based on various algorithms such as baseline, SVM, linear, and GLM, we can predict the value of the daily NASDAQ index.

Table 1: Stock Index Regression Accuracy

        Baseline   SVM    Linear   GLM
RMSE    40.4       21.6   24.8     28.7

B) Multiclass Classification:
To minimize trading risk and maximize benefit, we use the SVM model and start from the fundamental view of the SVM algorithm. To reach this goal, we classify the raw data into at least three categories: positive, negative, and neutral. We can select these risky points and reject their prediction results. To build the multi-class classifier, we first need to define the width of the central area. Equations (6) and (7) use the following terms:

tp: true positive
fp: false positive
fn: false negative

As a result of using Machine Learning algorithms to predict the stock market, we can summarize the findings in three parts:

1) There is a strong relation between the US stock market and the global stock markets that close right before or at the very beginning of US trading time.
2) The different Machine Learning based models mentioned in this paper predict the daily trend with high numerical accuracy.
3) A useful trading model based on a well-trained predictor can create high benefit [10].

III. PROPOSED MODEL:

As we mentioned in section 2, one of the methods for prediction is the Decision Tree. In this paper we want to improve the accuracy of other methods by using SVD, Eigen values, and the average of features.
In the Decision Tree method, we collect the data with 6 attributes: previous, open, min, max, last, action.

Table 2: Attribute Description

Attribute   Description                                       Value
Previous    Previous day close price of the stock             Positive, Negative, Equal
Open        Current day open price of the stock               Positive, Negative, Equal
Min         Current day minimum price of the stock            Positive, Negative, Equal
Max         Current day maximum price of the stock            Positive, Negative, Equal
Last        Current day close price of the stock              Positive, Negative, Equal
Action      The action taken by the investor on this stock    Buy, Sell
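Table 2 assigns each price attribute one of the labels Positive, Negative, or Equal. A minimal sketch of such a labeling follows, assuming each of the current day's prices is compared with the previous day's close. The function names and the sample prices are illustrative only, and the Previous attribute is left unlabeled here because its reference value is not specified.

```python
from typing import Dict


def label_against_previous(value: float, previous_close: float) -> str:
    """Map a continuous price to a discrete label relative to the previous close."""
    if value > previous_close:
        return "Positive"
    if value < previous_close:
        return "Negative"
    return "Equal"


def discretize_day(previous_close: float, open_: float, max_: float,
                   min_: float, last: float) -> Dict[str, str]:
    """Label the Open, Max, Min, and Last prices of one trading day."""
    prices = {"Open": open_, "Max": max_, "Min": min_, "Last": last}
    return {name: label_against_previous(value, previous_close)
            for name, value in prices.items()}


# Illustrative prices for a single trading day.
example = discretize_day(previous_close=25.82, open_=25.99, max_=26.0,
                         min_=25.41, last=25.67)
```

The resulting dictionary can be paired with the Buy or Sell action to form one training row for the decision tree.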
First of all, to use the Decision Tree, the continuous collected data must be converted to discrete values. For converting continuous data to discrete data there is one useful criterion, which is based on the closing market price. When the value of open, max, min, or last is greater than the previous attribute in the same trading day, it is replaced by Positive. When it is lower, it is replaced by Negative, and when the values are equal, it is replaced by Equal. Table 3 shows the continuous numerical values before the six attributes are selected manually and before they are generalized to discrete values.

Table 3: Sample of historical data before selecting relevant attributes and before generalization

Previous   Open    Max     Min     Last    Action
25.82      25.99   26      25.41   25.67   Sell
25.67      25.68   25.68   25.2    25.3    Buy
25.3       24.8    25.3    24.41   24.9    Buy
24.9       24.8    24.9    24.3    24.87   Sell
24.87      24.87   25.55   24.85   25.3    Buy
25.3       25.25   26      25.25   25.82   Buy
25.82      25.99   26.4    25.99   26.3    Buy
26.3       26.3    26.3    26      26.02   Buy
26.02      26.09   26.09   25.55   25.63   Sell

Table 4 shows the same sample after selecting the six attributes and transforming them into discrete values.

Table 4: Sample of historical data after selecting attributes

Previous   Open       Max        Min        Last       Action
Positive   Positive   Positive   Negative   Negative   Sell
Negative   Positive   Positive   Negative   Negative   Buy
Negative   Negative   Equal      Negative   Negative   Buy
Negative   Negative   Equal      Negative   Negative   Sell
Negative   Equal      Positive   Negative   Positive   Buy
Positive   Negative   Positive   Negative   Positive   Buy
Positive   Positive   Positive   Positive   Positive   Buy
Positive   Equal      Positive   Negative   Negative   Buy
Negative   Positive   Positive   Negative   Negative   Sell

The next step, after obtaining the discrete values, is to build the classification model using the Decision Tree.

In this paper we assume two different scenarios:

Scenario 1:
The steps are as follows:
1) Collect stock market data for 30 days.
2) Extract the features for each day at 9 different times: previous, open, max, min, last, and volume.
3) For each feature, form the matrix.
4) Calculate $XX^T$ and apply SVD to it to generate the Eigen values.
5) Calculate the average of the sell volume and the buy volume.
6) Calculate the average of each feature.
7) Assign different weights to the first day, the 7th day, the 30th day, and the one-month average.
8) Finally, to predict the action, compare the present feature with the first
day, the 7th day, the 30th day, and the monthly average, and make the best decision.
9) If the present information matches all 4 of them, we buy. If it matches 3 of them, we can buy with 25% risk, and if it matches 2 of them, we can buy with 50% risk.

The formulas and the simulation of the scenario follow.

$X = \begin{pmatrix} X_1 & X_2 & X_3 \\ X_4 & X_5 & X_6 \\ X_7 & X_8 & X_9 \end{pmatrix}, \qquad X^T = \begin{pmatrix} X_1 & X_4 & X_7 \\ X_2 & X_5 & X_8 \\ X_3 & X_6 & X_9 \end{pmatrix}$

We generate this matrix for each feature over the 30 days, where $X_i$ represents the 9 different times on the same day.
Then $R = XX^T$, which means that each matrix is multiplied by its transpose.

Calculate the SVD and Eigen values with the following formulas.

Calculate the Eigen values:
$|R - \lambda I| = 0$   (8)
Eigen values $= \lambda_1, \lambda_2, \ldots, \lambda_n$

Calculate the Eigen vectors:
$(R - \lambda I)\,Y = 0$   (9)
Eigen vectors $= Y_1, Y_2, \ldots, Y_n$

Calculate the SVD:
$\mathrm{SVD}(R) = U \Sigma V^T$   (10)
where
U: left singular vectors
V: right singular vectors
and
δ =   (11)
(12)
$U = R V \Sigma^{-1}$   (13)

Scenario 2:
In this scenario we follow the same steps, but instead of applying SVD to the raw data, we first compute the autocorrelation and then apply SVD to that matrix.
That is, for each feature we generate the autocorrelation matrix as follows:

$C = \begin{array}{ccccccccc|c}
X_1 & X_2 & X_3 & X_4 & X_5 & X_6 & X_7 & X_8 & X_9 & \\
\hline
X_1 & X_2 & X_3 & X_4 & X_5 & X_6 & X_7 & X_8 & X_9 & R_0 \\
0 & X_1 & X_2 & X_3 & X_4 & X_5 & X_6 & X_7 & X_8 & R_1 \\
0 & 0 & X_1 & X_2 & X_3 & X_4 & X_5 & X_6 & X_7 & R_2 \\
0 & 0 & 0 & X_1 & X_2 & X_3 & X_4 & X_5 & X_6 & R_3 \\
0 & 0 & 0 & 0 & X_1 & X_2 & X_3 & X_4 & X_5 & R_4 \\
0 & 0 & 0 & 0 & 0 & X_1 & X_2 & X_3 & X_4 & R_5 \\
0 & 0 & 0 & 0 & 0 & 0 & X_1 & X_2 & X_3 & R_6 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & X_1 & X_2 & R_7 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & X_1 & R_8
\end{array}$
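The shifted rows of matrix C suggest that each lag R_k is the inner product of the series with a copy of itself shifted by k positions. The sketch below works under that assumption (the paper's own equation may apply a different normalization), and it also arranges the lags symmetrically so that entry (i, j) holds the lag for |i - j|. The function names and the three-value series are hypothetical.

```python
from typing import List, Sequence


def autocorrelation_lags(x: Sequence[float]) -> List[float]:
    """Unnormalized lags: R_k = sum over i of x[i] * x[i + k], for k = 0..n-1."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(n)]


def symmetric_lag_matrix(lags: Sequence[float]) -> List[List[float]]:
    """Build the symmetric matrix whose (i, j) entry is the lag for |i - j|."""
    n = len(lags)
    return [[lags[abs(i - j)] for j in range(n)] for i in range(n)]


# Hypothetical short series standing in for X1..X9.
series = [1.0, 2.0, 3.0]
lags = autocorrelation_lags(series)
lag_matrix = symmetric_lag_matrix(lags)
```

Every descending diagonal of the resulting matrix is constant, and it is symmetric, so its singular values are the absolute values of its eigenvalues.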
(14)

We use the autocorrelation lags to form a new autocorrelation matrix, called the Toeplitz matrix, which may contain accurate information about our raw data.

$CM = \begin{pmatrix}
R_0 & R_1 & R_2 & R_3 & R_4 & R_5 & R_6 & R_7 & R_8 \\
R_1 & R_0 & R_1 & R_2 & R_3 & R_4 & R_5 & R_6 & R_7 \\
R_2 & R_1 & R_0 & R_1 & R_2 & R_3 & R_4 & R_5 & R_6 \\
R_3 & R_2 & R_1 & R_0 & R_1 & R_2 & R_3 & R_4 & R_5 \\
R_4 & R_3 & R_2 & R_1 & R_0 & R_1 & R_2 & R_3 & R_4 \\
R_5 & R_4 & R_3 & R_2 & R_1 & R_0 & R_1 & R_2 & R_3 \\
R_6 & R_5 & R_4 & R_3 & R_2 & R_1 & R_0 & R_1 & R_2 \\
R_7 & R_6 & R_5 & R_4 & R_3 & R_2 & R_1 & R_0 & R_1 \\
R_8 & R_7 & R_6 & R_5 & R_4 & R_3 & R_2 & R_1 & R_0
\end{pmatrix}$

Again, we repeat the calculation of SVD and Eigen values for this matrix.

To compare the deviation from the mean value among different numbers, we calculate the average, variance, and standard deviation of each attribute and store them in a vector.

1) Calculate the average
$\bar{A} = \frac{1}{M} \sum_{i=1}^{M} A_i$   (15)
where M is the number of samples.
2) Calculate the variance
$\mathrm{Var} = \frac{1}{M} \sum_{i=1}^{M} (A_i - \bar{A})^2$   (16)
3) Calculate the standard deviation
$\sigma = \sqrt{\mathrm{Var}}$   (17)

Groups of numbers with the same average should be compared based on their standard deviation, and groups with a smaller standard deviation are preferred.
So for each day we keep the following information for prediction:

1) σ previous
2) σ open
3) σ Max
4) σ Min
5) σ Last
6) Volume sell
7) Volume buy
8) Avg previous
9) Avg open
10) Avg Max
11) Avg Min
12) Avg last
13) Min Eigen Value
14) Max Eigen Value
15) Avg Eigen Value

Simulation of 2 scenarios:

Scenario 1:
Here, we apply Scenario 1 to the data of the previous feature as an example, but in the real world we have to use this algorithm for all six attributes.

X =
X^T =
XX^T =
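The Scenario 1 quantities can be illustrated numerically. The sketch below is a hedged example rather than the authors' simulation: it assumes NumPy is available, arranges nine illustrative Previous values into a hypothetical 3×3 matrix X, and uses the population (1/M) form of the standard deviation. The dictionary keys are made-up names for a few of the per-day features listed above.

```python
from statistics import mean, pstdev

import numpy as np

# Nine illustrative Previous prices, arranged row by row as X1..X9.
previous = [25.82, 25.67, 25.3, 24.9, 24.87, 25.3, 25.82, 26.3, 26.02]
X = np.array(previous).reshape(3, 3)

# R = X X^T is symmetric and positive semi-definite.
R = X @ X.T

# Singular value decomposition R = U diag(S) V^T (S is sorted descending).
U, S, Vt = np.linalg.svd(R)

# Eigenvalues of the symmetric matrix R (eigvalsh returns them ascending).
eigenvalues = np.linalg.eigvalsh(R)

# A few of the per-day summary features.
features = {
    "avg_previous": mean(previous),
    "sigma_previous": pstdev(previous),  # population standard deviation
    "min_eigenvalue": float(eigenvalues.min()),
    "max_eigenvalue": float(eigenvalues.max()),
    "avg_eigenvalue": float(eigenvalues.mean()),
}
```

Because R is symmetric and positive semi-definite, its singular values coincide with its eigenvalues, and the relation U = R V Σ^{-1} can be checked directly whenever all singular values are non-zero.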
SVD:
U =
S =
V^T =

Matrix S contains the Eigen values, and these Eigen values carry the main data that help in stock prediction.

Scenario 2:
C =
CM =

SVD:
U =
S =
V^T =

Finally, to get the best result we use all the algorithms in addition to our scenarios, as follows:
Figure 3: Cycle of stock market prediction

IV. CONCLUSION

We have proposed a model that uses the Neural Network (NN) for the learning and curve-fitting process and the Genetic Algorithm (GA) for the path search and optimization process. We also used the Decision Tree and Data Mining, using SVD, to obtain the maximum prediction accuracy. By applying the decision tree classifier to the historical prices of the stock market, we have obtained decision rules that advise our investors to buy or sell the stock with more confidence.

In the real world, more attributes can have major effects on the stock market price index. Since these events are out of our hands, we are not able to put them into a modular format or a neat mathematical formula. Examples are political events, natural events such as earthquakes and tsunamis, the general economic condition, and investors' expectations.

In this research, we have focused on improving the decision tree, data mining, and neural network techniques by using Eigen System Analysis, Mean value, and SVD to increase the prediction rate. However, we cannot reach a 100% prediction rate. We have used the Eigen value analysis and SVD of the time series related to the stock market index and compared the results with older models. The simulation results indicate that predicting the price of the stock market index based on SVD can provide a wider range of prediction.

REFERENCES

[1] A. F. Shapiro, "Capital Market Applications of Neural Networks, Fuzzy Logic and Genetic Algorithms", Penn State University, April 2003.
[2] M. Al-Debie, M. Walker, "Fundamental Information analysis: An extension and UK evidence", Journal of Accounting Research, 31(3), pp. 261-280, 1999.
[3] B. Egeli, M. Ozturan, B. Badur, "Stock Market Prediction Using Artificial Neural Networks".
[4] G. Bonde, R. Khaled, "Stock price prediction using genetic algorithms and evolution strategies".
[5] J. Kamber, M. Jian, "Data Mining Concepts and Techniques", San Francisco, CA: Morgan Kaufmann Publishers, 2011.
[6] B. Lev, R. Thiagarajan, "Fundamental information analysis", Journal of Accounting Research, 31(2), pp. 190-215, 1993.
[7] J. J. Murphy, "Technical Analysis of the Financial Markets: A Comprehensive Guide to Trading Methods and Applications", New York Institute of Finance, 1999.
[8] Q. A. Al-Radaideh, Adel Abu Assaf, E. Alnagi, "Predicting Stock Prices Using Data Mining Techniques", The International Arab Conference on Information Technology (ACIT'2013), pp. 1, 2, 5.
[9] J. C. Ritchie, "Fundamental Analysis: A Back-To-The-Basics Investment Guide to Selecting Quality Stocks", Irwin Professional Publishing, 1996.
[10] F. Rosenblatt, "Principles of neuro dynamics: perceptron and the theory of brain mechanisms", Spartan Press, Washington, DC, 1961.
[11] S. Shen, H. Jiang, T. Zhang, "Stock Market Forecasting Using Machine Learning Algorithms".
[12] P. M. Tsang, P. Kwok, S. O. Choy, R. Kwan, S. C. Ng, J. Mak, J. Tsang, K. Koong, T. Lam Wong, "Design and implementation of NN5 for Hong Kong stock price forecasting", Engineering Applications of Artificial Intelligence, 20, pp. 453-461, 2007.
[13] M. C. Wu, S. Y. Lin, C. H. Lin, "An effective application of decision tree to stock trading", Expert Systems with Applications, 31, pp. 270-274, 2006.
[14] Y. F. Wang, "Predicting stock price using fuzzy grey prediction system", Expert Systems with Applications, 22, pp. 33-39, 2002.
[15] www.tradersfly.com