ANN-Based Forecasting of Foreign Currency Exchange Rates
Neural Information Processing - Letters and Reviews, Vol. 3, No. 2, May 2004

LETTER

ANN-Based Forecasting of Foreign Currency Exchange Rates

Joarder Kamruzzaman
Gippsland School of Computing and Information Technology
Monash University, Churchill, Victoria 3842, Australia
E-mail: Joarder.Kamruzzaman@infotech.monash.edu.au

Ruhul A. Sarker
School of Information Technology and Electrical Engineering
University of NSW, ADFA Campus, Northcott Drive, Canberra 2600, Australia
E-mail: ruhul@cs.adfa.edu.au

(Submitted on March 24, 2004)

Abstract - In this paper we investigate artificial neural network (ANN) based prediction models of foreign currency rates using three learning algorithms, namely standard Backpropagation (SBP), Scaled Conjugate Gradient (SCG) and Backpropagation with Bayesian Regularization (BPR). The models were trained on historical data, using five technical indicators, to predict six currency rates against the Australian dollar. The forecasting performance of the models was evaluated and compared using a number of widely used statistical metrics. The results show that close predictions can be made using simple technical indicators, without extensive knowledge of market data. Among the three models, the SCG-based model outperforms the others on two commonly used metrics and attains results comparable to the BPR-based model on the other three. The effect of network architecture on the performance of the forecasting model is also presented, and future research directions for further improving the model are discussed.

Keywords: Neural network, ARIMA, financial forecasting, foreign exchange

1. Introduction

In the past, foreign exchange rates were determined solely by the balance of payments, which was merely a way of listing receipts and payments in international transactions for a country. Payments involve a supply of the domestic currency and a demand for foreign currencies.
Receipts involve a demand for the domestic currency and a supply of foreign currencies. The balance was determined mainly by the import and export of goods, so predicting exchange rates was not a complex issue. Later, however, interest rates and local and international supply-demand factors became increasingly relevant to each currency. On top of this, fixed foreign exchange rates were abandoned and a floating exchange rate system was adopted by the industrialized nations in 1973. More recently, further liberalization of trade has been discussed under the General Agreement on Tariffs and Trade. Due to the introduction of floating exchange rates and the rapid expansion of global trading markets over the last few decades, the foreign currency exchange market has experienced unprecedented growth. Exchange rates play an important role in controlling the dynamics of the exchange market as well as the trading of goods in import-export markets. For example, if the Australian currency is weaker than the US currency, US traders would prefer to import certain Australian goods, and Australian producers and traders would find the US an attractive export market. On the other hand, if Australia depends on the US for importing certain goods, those goods become too costly for Australian consumers under such exchange rates, and Australia may look for a better source, shifting from the US to a new import market. Trade relations and the cost of exported and imported goods are thus directly dependent on the currency exchange rates of the trading partners. The duration of an international trade agreement between two partners may vary from the short term to many years, yet exchange rates vary continuously during trading hours. As a result, accurate prediction of exchange rates is a crucial factor for the success of many businesses and
financial institutions. Although the market is well known for its unpredictability and volatility, a number of interested groups attempt to predict exchange rates using numerous conventional and modern techniques. Exchange rate prediction is one of the most challenging applications of modern time series forecasting. The rates are inherently noisy, non-stationary and chaotic [5, 27]. These characteristics suggest that no complete information can be obtained from the past behaviour of such markets to fully capture the dependency between future exchange rates and those of the past. One general assumption made in such cases is that the historical data incorporate all of this behaviour; as a result, the historical data are the main input to the prediction process. Although well-known conventional forecasting techniques provide predictions of acceptable quality for many stable systems, they seem inappropriate for non-stationary and chaotic systems such as currency exchange rates, interest rates and share prices. The purpose of this paper is to investigate the use of artificial neural network based techniques for the prediction of foreign currency exchange rates. We experiment with three different algorithms under different architectures for several exchange rates. For more than two decades, Box and Jenkins' Auto-Regressive Integrated Moving Average (ARIMA) technique [2] has been widely used for time series forecasting. Because of its popularity, the ARIMA model has been used as a benchmark to evaluate many new modelling approaches [9]. However, ARIMA is a general univariate model developed on the assumption that the time series being forecast is linear and stationary [3]. As indicated earlier, ARIMA is therefore not the right technique for predicting exchange rates.
Artificial neural networks (ANNs), well-known function approximators for prediction and system modelling, have recently shown great applicability in time-series analysis and forecasting [25-28]. ANNs support multivariate analysis. Multivariate models can draw on greater information, in which not only the lagged time series being forecast but also other indicators (technical, fundamental, inter-market, etc. for financial markets) are combined to act as predictors. In addition, ANNs are more effective in describing the dynamics of non-stationary time series due to their non-parametric, assumption-free, noise-tolerant and adaptive properties. ANNs are universal function approximators that can map any nonlinear function without a priori assumptions about the data [3]. In several applications, Tang and Fishwick [22], Jhee and Lee [10], Kamruzzaman and Sarker [12-13], Wang and Leu [23], Hill et al. [8], and many other researchers have shown that ANNs perform better than ARIMA models, specifically for more irregular series and for multiple-period-ahead forecasting. Kaastra and Boyd [11] provided a general introduction to how a neural network model should be developed to model financial and economic time series, presenting many useful practical considerations. Zhang and Hu [28] analysed backpropagation neural networks' ability to forecast an exchange rate. Wang [24] cautioned against the dangers of one-shot analysis, since the inherent nature of the data may vary. Klein and Rossin [14] showed that data quality also affects the predictive accuracy of a model. More recently, Yao et al. [25] evaluated the capability of a backpropagation neural network model as an option price forecasting tool. They also recognised that neural network models are context sensitive, and that studies of this type should be as comprehensive as possible across different markets and different neural network models.
In this paper, we apply ANNs to predict the exchange rates of the Australian dollar against six other currencies: the US Dollar (USD), Great British Pound (GBP), Japanese Yen (JPY), Singapore Dollar (SGD), New Zealand Dollar (NZD) and Swiss Franc (CHF), using their historical exchange rates. Three ANN-based models using existing learning algorithms, namely standard backpropagation, scaled conjugate gradient and Bayesian regularization, were considered. For each of the six currency rates, 500 historical exchange rates (the closing rate of each week) were collected and used to build the prediction models, and an additional 65 rates were used to evaluate the models. The prediction results of all models, over 35 and 65 weeks, were compared on five evaluation criteria: Normalized Mean Square Error (NMSE), Mean Absolute Error (MAE), Directional Symmetry (DS), Correct Up trend (CU) and Correct Down trend (CD). The results show that the scaled conjugate gradient and Bayesian regularization based models produce competitive results and forecast more accurately than standard backpropagation, which has been studied considerably elsewhere. Section 2 defines the ANN forecasting model and the performance metrics. Sections 3 and 4 present the experimental results and the conclusions, respectively.

2. Neural Network Forecasting Model

Neural networks are a class of nonlinear models that can approximate any nonlinear function to an arbitrary degree of accuracy and have the potential to be used as forecasting tools in many areas. The most commonly used neural network architecture is the multilayer feedforward network. It consists of an input layer, an
output layer and one or more intermediate layers, called hidden layers. The nodes in each layer are connected to each node in the layer above by interconnection strengths called weights. A training algorithm is used to attain a set of weights that minimizes the difference between the target output and the actual output produced by the network. Many different neural network learning algorithms can be found in the literature, and no study has analytically determined the generalization performance of each. In this study, we experimented with three learning algorithms, namely standard Backpropagation (SBP), the Scaled Conjugate Gradient algorithm (SCG) and Backpropagation with Bayesian Regularization (BPR), in order to evaluate which predicts the exchange rates of the Australian dollar most accurately. We briefly describe the algorithms below.

2.1 Learning Algorithms

Standard Backpropagation (SBP): Backpropagation [21] updates the weights iteratively to map a set of input vectors (x_1, x_2, ..., x_p) to a set of corresponding output vectors (y_1, y_2, ..., y_p). The input x_p is presented to the network and multiplied by the weights. All the weighted inputs to each unit of the upper layer are summed, and the output is governed by the following equations:

    y_p = f(W_o h_p + θ_o),    (1)
    h_p = f(W_h x_p + θ_h),    (2)

where W_o and W_h are the output- and hidden-layer weight matrices, h_p is the vector denoting the response of the hidden layer to pattern p, θ_o and θ_h are the output- and hidden-layer bias vectors, respectively, and f(.) is the sigmoid activation function. The cost function to be minimized in standard backpropagation is the sum of squared errors, defined as

    E = (1/2) Σ_p (t_p − y_p)^T (t_p − y_p),    (3)

where t_p is the target output vector for pattern p. The algorithm uses the gradient descent technique to adjust the connection weights between neurons.
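As an illustration only (not the authors' code), the forward pass of Eqs. (1)-(2) and the cost of Eq. (3) can be sketched in NumPy; the layer sizes, random weights and dummy data below are placeholders:

```python
# Minimal sketch of the forward pass (Eqs. 1-2) and sum-of-squared-error
# cost (Eq. 3). All data here are illustrative placeholders.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 6, 3, 1          # 6 indicators, 3 hidden units, 1 output
W_h = rng.normal(scale=0.1, size=(n_hidden, n_in))    # hidden-layer weights
W_o = rng.normal(scale=0.1, size=(n_out, n_hidden))   # output-layer weights
theta_h = np.zeros(n_hidden)             # hidden bias vector
theta_o = np.zeros(n_out)                # output bias vector

def forward(x):
    h = sigmoid(W_h @ x + theta_h)       # Eq. (2): hidden-layer response
    y = sigmoid(W_o @ h + theta_o)       # Eq. (1): network output
    return y

def sse(X, T):
    # Eq. (3): E = 1/2 * sum_p (t_p - y_p)^T (t_p - y_p)
    return 0.5 * sum(float((t - forward(x)) @ (t - forward(x))) for x, t in zip(X, T))

X = rng.normal(size=(5, n_in))           # five dummy input patterns
T = rng.uniform(size=(5, n_out))         # dummy targets
print(sse(X, T))                         # a non-negative scalar error
```

Training then consists of lowering this error by adjusting W_h, W_o and the biases, which is what the three algorithms below do in different ways.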
Denoting the fan-in weights to a single neuron by a weight vector w, its update in the t-th epoch is governed by

    Δw_t = −η ∇E(w)|_{w = w(t)} + α Δw_{t−1},    (4)

where the parameters η and α are the learning rate and the momentum factor, respectively. The learning rate controls the step size in each iteration. For large-scale problems backpropagation learns very slowly, and its convergence depends largely on the user choosing suitable values of η and α.

Scaled Conjugate Gradient (SCG): The error surface in backpropagation may contain long ravines with sharp curvature and a gently sloping floor, which cause slow convergence. In conjugate gradient methods, a search is performed along conjugate directions, which generally produces faster convergence than steepest descent [7]. In steepest descent, each new direction is perpendicular to the previous one; the approach to the minimum therefore follows a zigzag path, and one step can be largely undone by the next. In conjugate gradient methods, a new search direction spoils as little as possible of the minimization achieved by the previous one, and the step size is adjusted in each iteration. The general procedure for determining the new search direction is to combine the new steepest descent direction with the previous search direction so that the current and previous search directions are conjugate. Conjugate gradient techniques are based on the assumption that, for a general non-quadratic error function, the error in the neighborhood of a given point is locally quadratic. The weight changes in successive steps are given by

    w_{t+1} = w_t + α_t d_t,    (5)
    d_t = −g_t + β_t d_{t−1},    (6)

with

    g_t ≡ ∇E(w)|_{w = w_t},    (7)
    β_t = (Δg_{t−1}^T g_t) / (g_{t−1}^T g_{t−1})  or  β_t = (Δg_{t−1}^T g_t) / (g_{t−1}^T d_{t−1})  or  β_t = (g_t^T g_t) / (g_{t−1}^T g_{t−1}),    (8)

where d_t and d_{t−1} are the conjugate directions in successive iterations. The step size is governed by the coefficient α_t, and the search direction is determined by β_t.
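To make Eqs. (5)-(8) concrete, the sketch below (our illustration, not from the paper) runs the conjugate-direction update with the Fletcher-Reeves choice of β_t on a simple quadratic error E(w) = 1/2 w^T A w − b^T w, where the optimal step size along a direction has a closed form:

```python
# Conjugate gradient with Fletcher-Reeves beta (last form in Eq. 8) on a
# 2-D quadratic error surface; A and b are illustrative placeholders.
import numpy as np

A = np.array([[3.0, 0.5], [0.5, 2.0]])   # positive definite quadratic "error surface"
b = np.array([1.0, -1.0])

def grad(w):                              # g_t = grad E(w_t)
    return A @ w - b

w = np.zeros(2)
g = grad(w)
d = -g                                    # first direction: steepest descent
for t in range(10):
    alpha = -(d @ g) / (d @ (A @ d))      # exact line search along d (quadratic case)
    w = w + alpha * d                     # Eq. (5)
    g_new = grad(w)
    if np.linalg.norm(g_new) < 1e-10:     # converged
        break
    beta = (g_new @ g_new) / (g @ g)      # Fletcher-Reeves beta_t, Eq. (8)
    d = -g_new + beta * d                 # Eq. (6): new conjugate direction
    g = g_new

print(w)  # in exact arithmetic, reaches the minimizer A^{-1} b in two steps
```

For a neural network the error is only locally quadratic, so SCG replaces the exact line search with the scaled step size of Eqs. (9)-(10) below.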
In scaled conjugate gradient the step size α_t is calculated by

    α_t = −(d_t^T g_t) / δ_t,    (9)
    δ_t = d_t^T H_t d_t + λ_t ||d_t||^2,    (10)

where λ_t is the scaling coefficient and H_t is the Hessian matrix at iteration t. The term λ is introduced because, for a non-quadratic error function, the Hessian matrix need not be positive definite. In that case, without λ, δ may become negative and the weight update may increase the error function. With a sufficiently large λ, the modified δ is guaranteed to be positive (δ > 0); however, for large values of λ the step size becomes small. If δ_t ≤ 0, λ_t is increased so that the modified δ_t becomes positive. The quality of the local quadratic approximation is measured by a comparison parameter Δ_t [19], and λ is adapted accordingly: for Δ_t > 0.75, λ_{t+1} = λ_t / 2; for Δ_t < 0.25, λ_{t+1} = 4λ_t; otherwise, λ_{t+1} = λ_t.

Bayesian Regularization (BPR): A desirable neural network model should produce small errors not only on the sample data but also on out-of-sample data. To produce a network with better generalization ability, MacKay [17] proposed a method to constrain the size of the network parameters by regularization. Regularization forces the network to settle to a set of weights and biases with smaller values, which makes the network response smoother and less likely to overfit [7] and capture noise. In the regularization technique, the cost function F is defined as

    F = γ E_D + (1 − γ) E_W,    (14)

where E_D is the same as E defined in Eq. (3), E_W = ||w||^2 / 2 is the sum of squares of the network parameters, and γ (< 1) is the performance ratio that governs the trade-off between reducing the data error and reducing the size of the weights. In MacKay's Bayesian framework, the optimum value of γ is the one that maximizes the posterior probability p(γ|D,M), where D denotes the data set and M the particular neural network model.
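The effect of the regularized objective in Eq. (14) can be seen in a toy setting (our illustration, not the paper's implementation): for a linear model, the minimizer of γE_D + (1−γ)E_W has a closed form, which shows how a larger weight penalty (smaller γ) shrinks the weights:

```python
# Toy sketch of Eq. (14): F = gamma*E_D + (1-gamma)*E_W with E_W = ||w||^2/2.
# For a linear model, setting grad F = 0 gives
# (gamma*X^T X + (1-gamma)*I) w = gamma*X^T y. Data are placeholders.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
w_true = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=50)

def regularized_fit(gamma):
    # minimize gamma * 1/2 ||y - Xw||^2 + (1 - gamma) * 1/2 ||w||^2
    n = X.shape[1]
    return np.linalg.solve(gamma * X.T @ X + (1 - gamma) * np.eye(n),
                           gamma * X.T @ y)

w_weak = regularized_fit(gamma=0.999)    # almost pure data fit
w_strong = regularized_fit(gamma=0.5)    # heavier weight penalty
print(np.linalg.norm(w_strong) < np.linalg.norm(w_weak))  # smaller weights
```

BPR applies the same trade-off to a nonlinear network, with γ itself chosen automatically by the Bayesian evidence procedure described next.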
If we assume a uniform prior distribution p(γ|M) for the regularization parameter γ, then maximizing the posterior probability is achieved by maximizing the likelihood function p(D|γ,M). Since all the probabilities have a Gaussian form, it can be expressed as

    p(D|γ,M) = (π/γ)^{−N/2} [π/(1−γ)]^{−L/2} Z_F(γ),    (17)

where N is the number of training samples and L is the total number of parameters in the network. Supposing that F has a single minimum as a function of w at w*, and that it has the shape of a quadratic function in a small area surrounding that point, Z_F is approximated as [17]

    Z_F ≈ (2π)^{L/2} (det H*)^{−1/2} exp(−F(w*)),    (18)

where H = γ∇²E_D + (1−γ)∇²E_W is the Hessian matrix of the objective function. Substituting Eq. (18) into Eq. (17), the optimum value of γ at the minimum point can be determined. Foresee and Hagan [6] proposed applying the Gauss-Newton approximation to the Hessian matrix, which can be conveniently implemented if the Levenberg-Marquardt optimization algorithm [20] is used to locate the minimum point. This minimizes the additional computation required for regularization.

2.2 Forecasting Model

Technical and fundamental analyses are the two major financial forecasting methodologies. In recent times, technical analysis has drawn particular academic interest due to increasing evidence that markets are less efficient than was originally thought [15]. Like many other economic time series, exchange rates exhibit their own trend, cycle, season and irregularity. In this study, we used time-delayed moving averages as technical data. The advantage of the moving average is its tendency to smooth out some of the irregularity that exists between market days [26]. In our model, we used moving average values of past weeks as inputs to the neural network to predict the following week's rate.
The indicators are MA5, MA10, MA20, MA60, MA120 and X_i, namely the moving averages over one week, two weeks, one month, one quarter and half a year, and last week's closing rate, respectively. The predicted value is X_{i+1}. The neural network model therefore has six inputs for the six indicators, one hidden layer and one output unit to predict the exchange rate. Yao et al. [26] have reported that increasing the number of inputs does not necessarily improve performance. Historical data are used to train the model; once trained, the model is used for forecasting.

3. Experimental Results and Discussion

In the following, we describe the forex data collection, the performance metrics used to evaluate and compare the predictive power of the models, and the simulation results.

3.1 Data Collection

The data used in this study are the foreign exchange rates of six currencies against the Australian dollar from January 1991 to July 2002, made available by the Reserve Bank of Australia. We considered the exchange rates of the US dollar (USD), British Pound (GBP), Japanese Yen (JPY), Singapore dollar (SGD), New Zealand dollar (NZD) and Swiss Franc (CHF). As outlined in Section 2.2, 565 weekly observations were considered, of which the first 500 were used for training and the remaining 65 for evaluating the model. The plots of the historical rates for USD, GBP, SGD, NZD and CHF are shown in Fig. 1(a), and for JPY in Fig. 1(b).

Figure 1. Historical exchange rates against the Australian dollar over the 565 weeks for (a) USD, GBP, SGD, NZD and CHF, and (b) JPY.
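A sketch of the preprocessing implied by Section 2.2 (assumed, mirroring the indicator names; whether the windows count days or weeks depends on the sampling, which the text leaves implicit) builds the six inputs from a series of closing rates, with the next rate as the target:

```python
# Build the six inputs -- MA5, MA10, MA20, MA60, MA120 and the last close
# X_i -- from a series of closing rates; the target is X_{i+1}.
import numpy as np

def make_dataset(rates, windows=(5, 10, 20, 60, 120)):
    """rates: 1-D array of closing rates, oldest first."""
    X, y = [], []
    start = max(windows)                   # need 120 past values for MA120
    for i in range(start - 1, len(rates) - 1):
        feats = [rates[i - w + 1 : i + 1].mean() for w in windows]  # MA5..MA120
        feats.append(rates[i])             # X_i: the most recent close
        X.append(feats)
        y.append(rates[i + 1])             # target: the next close X_{i+1}
    return np.array(X), np.array(y)

rates = np.linspace(0.50, 0.60, 565)       # dummy 565-point series
X, y = make_dataset(rates)
X_train, y_train = X[:-65], y[:-65]        # earlier part for training
X_test, y_test = X[-65:], y[-65:]          # last 65 weeks held out, as in the paper
print(X.shape[1])                          # 6 inputs per pattern
```

Note that the longest window consumes the first 119 observations, so the usable number of patterns is slightly smaller than the raw series length.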
3.2 Performance Metrics

The forecasting performance of the above models is evaluated using a number of widely used statistical metrics, namely Normalized Mean Square Error (NMSE), Mean Absolute Error (MAE), Directional Symmetry (DS), Correct Up trend (CU) and Correct Down trend (CD). These criteria are defined in Table 1, in which x_k and x̂_k are the actual and predicted values, respectively. NMSE and MAE measure the deviation between actual and forecast values; smaller values of these metrics indicate higher forecasting accuracy. The remaining measures count correct matches between the actual and predicted values with respect to sign and directional change: DS measures the correctness of the predicted directions, while CU and CD measure the correctness of the predicted up and down trends, respectively.

Table 1: Performance Metrics to Evaluate the Forecasting Accuracy of the Model.

    NMSE = Σ_k (x_k − x̂_k)² / Σ_k (x_k − x̄)² = (1 / (σ² N)) Σ_k (x_k − x̂_k)²

    MAE = (1/N) Σ_k |x_k − x̂_k|

    DS = (100/N) Σ_k d_k,
        d_k = 1 if (x_k − x_{k−1})(x̂_k − x̂_{k−1}) ≥ 0, and 0 otherwise

    CU = 100 Σ_k d_k / Σ_k t_k,
        d_k = 1 if (x̂_k − x̂_{k−1}) > 0 and (x_k − x_{k−1})(x̂_k − x̂_{k−1}) ≥ 0, and 0 otherwise;
        t_k = 1 if (x_k − x_{k−1}) > 0, and 0 otherwise

    CD = 100 Σ_k d_k / Σ_k t_k,
        d_k = 1 if (x̂_k − x̂_{k−1}) < 0 and (x_k − x_{k−1})(x̂_k − x̂_{k−1}) ≥ 0, and 0 otherwise;
        t_k = 1 if (x_k − x_{k−1}) < 0, and 0 otherwise

3.3 Simulation Results

A neural network model was trained with six inputs representing the six technical indicators, a hidden layer and an output unit to predict the exchange rate. The final set of weights to which a network settles (and hence its performance) depends on a number of factors, e.g., the initial weights chosen, the learning parameters used during training (described in Section 2.1) and the number of hidden units.
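The five metrics of Table 1 are straightforward to compute; the sketch below is our reading of the definitions, not the authors' code, with a small made-up series for illustration:

```python
# NMSE, MAE, directional symmetry (DS), correct up-trend (CU) and
# correct down-trend (CD), following the definitions in Table 1.
import numpy as np

def metrics(x, xh):
    """x: actual series, xh: predicted series (same length)."""
    nmse = np.sum((x - xh) ** 2) / np.sum((x - x.mean()) ** 2)
    mae = np.mean(np.abs(x - xh))
    dx, dxh = np.diff(x), np.diff(xh)          # actual / predicted changes
    same_dir = dx * dxh >= 0                   # direction predicted correctly
    ds = 100.0 * np.mean(same_dir)
    cu = 100.0 * np.sum(same_dir & (dxh > 0)) / np.sum(dx > 0)
    cd = 100.0 * np.sum(same_dir & (dxh < 0)) / np.sum(dx < 0)
    return nmse, mae, ds, cu, cd

x = np.array([1.0, 1.1, 1.05, 1.2, 1.15, 1.3])     # dummy actual rates
xh = np.array([1.0, 1.08, 1.07, 1.18, 1.16, 1.25])  # dummy forecasts
print(metrics(x, xh))
```

NMSE below 1 means the forecast beats the trivial predictor that always outputs the series mean, which is why small NMSE values in the tables below indicate close tracking of the actual rate.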
For each algorithm, we trained 30 different networks with different initial weights and learning parameters. The number of hidden units was varied between 3 and 7, and training was terminated after between 5,000 and 10,000 iterations. The network yielding the best result out of the 30 trials for each algorithm is presented here. We measured the performance metrics on the test data to investigate how well the neural network forecasting model captured the underlying trend of each currency's movement against the Australian dollar. Table 2 shows the performance metrics achieved by each model over a forecasting period of 35 weeks, and Table 3 shows the same over 65 weeks (the previous 35 weeks plus an additional 30 weeks). The results show that the SCG and BPR models consistently perform better than the SBP model on all performance metrics for almost all of the currency exchange rates. For example, when forecasting the US dollar rate over 35 weeks, the NMSE achieved by SCG and BPR is quite low, almost half that achieved by SBP; these models are therefore capable of predicting the exchange rate more closely than SBP. In predicting trend directions, SCG and BPR are also almost 10% more accurate than SBP. The better performance of the SCG and BPR algorithms stems from their improved learning techniques, which allow them to search the weight space more efficiently for a solution. A similar trend is observed for the other currencies.
Table 2: Measurement of Prediction Performance over 35-Week Prediction.

    Currency    Model   NMSE    MAE     DS      CU      CD
    US Dollar   SBP     0.5041  0.0047  71.42   76.47   70.58
                SCG     0.2366  0.0033  82.85   82.35   88.23
                BPR     0.2787  0.0036  82.85   82.35   88.23
    B. Pound    SBP     0.5388  0.0053  77.14   75.00   78.94
                SCG     0.1578  0.0030  77.14   81.25   73.68
                BPR     0.1724  0.0031  82.85   93.75   73.68
    J. Yen      SBP     0.1530  0.6372  74.28   72.72   76.92
                SCG     0.1264  0.6243  80.00   81.81   76.92
                BPR     0.1091  0.5806  80.00   81.81   76.92
    S. Dollar   SBP     0.2950  0.0094  85.71   82.35   88.88
                SCG     0.2321  0.0076  82.85   82.35   83.33
                BPR     0.2495  0.0080  82.85   82.35   83.33
    NZ Dollar   SBP     0.1200  0.0046  77.14   75.00   78.94
                SCG     0.0878  0.0038  85.71   87.50   84.21
                BPR     0.0898  0.0039  85.71   87.50   84.21
    S. Franc    SBP     0.1316  0.0101  80.00   75.00   86.66
                SCG     0.0485  0.0059  82.85   80.00   86.66
                BPR     0.0496  0.0057  80.00   75.00   86.66

Table 3: Measurement of Prediction Performance over 65-Week Prediction.

    Currency    Model   NMSE    MAE     DS      CU      CD
    US Dollar   SBP     0.0937  0.0043  75.38   81.57   69.23
                SCG     0.0437  0.0031  83.07   78.94   92.30
                BPR     0.0441  0.0030  83.07   78.94   92.30
    B. Pound    SBP     0.2231  0.0038  80.00   75.75   87.09
                SCG     0.0729  0.0023  84.61   87.87   83.87
                BPR     0.0790  0.0024  87.69   93.93   83.87
    J. Yen      SBP     0.0502  0.5603  76.92   75.67   78.57
                SCG     0.0411  0.5188  81.53   83.78   78.57
                BPR     0.0367  0.5043  81.53   83.78   78.57
    S. Dollar   SBP     0.0935  0.0069  83.07   82.35   83.87
                SCG     0.0760  0.0060  86.15   88.23   83.87
                BPR     0.0827  0.0063  86.15   88.23   83.87
    NZ Dollar   SBP     0.0342  0.0042  78.46   71.42   86.11
                SCG     0.0217  0.0033  84.61   82.14   88.88
                BPR     0.0221  0.0033  84.61   82.14   88.88
    S. Franc    SBP     0.1266  0.0098  83.07   80.00   86.66
                SCG     0.0389  0.0052  84.61   84.61   86.66
                BPR     0.0413  0.0051  81.53   77.14   86.66

Between the SCG and BPR models, the former performs better for all currencies except the Japanese Yen in terms of the two most commonly used criteria, i.e., NMSE and MAE.
In terms of the other metrics, SCG yields slightly better performance for the Swiss Franc, BPR slightly better for the US Dollar and British Pound, and both perform equally for the Japanese Yen, Singapore Dollar and New Zealand Dollar. For both algorithms, the directional change prediction accuracy is above 80%, a considerable improvement over the roughly 70% accuracy achieved in a similar study [26]. Comparative diagrams showing the forecasts produced by the neural network models and the actual time series over 65 weeks for the six currencies are given in Fig. 2(a)~(f). Fig. 2(a) shows the forecasting of USD by all three models; the plots show that the SCG and BPR forecasts follow the actual rate more closely. For the other currencies, the SCG model's prediction is very close to the actual exchange rate. In this study, we also investigated the influence of the neural network architecture on prediction performance. Using the SCG learning algorithm, the model was trained with different numbers of hidden units to predict USD against AUD. Out of 30 successful trials, the prediction performance of the best trial is reported in Table 4. The architecture of a network is denoted by 'i-h-o', indicating i neurons in the input layer, h neurons in the hidden layer and o neurons in the output layer. The performance varies with the number of hidden units, but in all cases it is better than that of the SBP model. The best performance is achieved with three hidden units; increasing the number of hidden nodes beyond three deteriorates the performance.

Table 4: Effect of hidden unit number on prediction performance.
    Architecture    35-week prediction                       65-week prediction
                    NMSE    MAE     DS     CU     CD         NMSE    MAE     DS     CU     CD
    6-2-1           0.3228  0.0036  80.00  82.35  82.35      0.0596  0.0033  81.53  78.94  88.46
    6-3-1           0.2366  0.0033  82.85  82.35  88.23      0.0437  0.0031  83.07  78.94  92.30
    6-4-1           0.3349  0.0039  77.14  76.47  82.35      0.0548  0.0034  80.00  76.31  88.46
    6-5-1           0.3401  0.0041  74.28  76.47  76.47      0.0564  0.0034  78.46  78.94  80.76
    6-6-1           0.2975  0.0036  77.14  76.47  82.35      0.0512  0.0033  80.00  76.31  88.46
    6-7-1           0.3517  0.0041  74.28  76.47  76.47      0.0808  0.0043  78.46  76.31  84.61

Figure 2. Forecasting of the different currencies by the SCG-based neural network model over 65 weeks: (a) USD/AUD (with SBP, SCG and BPR forecasts), (b) GBP/AUD, (c) JPY/AUD, (d) SGD/AUD, (e) NZD/AUD, (f) CHF/AUD.

The generalization ability of a neural network, i.e., its ability to produce correct output in response to an unseen input, is influenced by a number of factors: 1) the size of the training set, 2) the degrees of freedom of the network, related to its architecture, and 3) the physical complexity of the problem at hand. In practice we have no control over the problem complexity, and in our simulations the size of the training set is fixed. This leaves the generalization ability, i.e., the performance of the model, dependent on the architecture of the corresponding neural network. Generalization performance can also be related to the complexity of the model in the sense that, to achieve the best generalization, it is important to optimize the complexity of the prediction model [1]. In neural networks, the complexity can be varied by changing the number of adaptive parameters in the network; a network with fewer weights is less complex than one with more weights. It is well known that the simplest hypothesis or model is the least likely to overfit: a network that uses the fewest weights and biases to achieve a given mapping is least likely to overfit the data and most likely to generalize well on unseen data. If redundancy is added in the form of extra hidden units or additional parameters, it is likely to degrade performance, because more than the necessary number of parameters is used to achieve the same mapping. In the case of nonlinear regression, two extreme solutions should be avoided: underfitting (too few hidden neurons, filtering out part of the underlying function) and overfitting (too many hidden neurons, modeling the noise in the data). This situation is also known as the bias-variance dilemma. One practical way of controlling the effective complexity of the model is to compare a range of models with different architectures and select the one that yields the best performance on the test data. As seen in our simulations, the network with three hidden units is optimal, yielding the best performance; increasing the number of hidden units adds parameters, introduces redundancy and deteriorates the performance.

4. Conclusion and Further Research

This paper has presented and compared three different neural network models for foreign currency exchange forecasting using simple technical indicators. The scaled conjugate gradient based model achieves closer predictions for all six currencies than the other models. As Medeiros et al. argue in [18], there is evidence favouring linear and nonlinear models over a random walk, and nonlinear models stand a better chance when nonlinearity is spread through the time series. A neural network model with an improved learning technique is thus a promising candidate for forex prediction. The results of this study show that the SCG neural network model achieves very close predictions in terms of the NMSE and MAE metrics. Several authors have argued that directional change metrics may be a better standard for judging the quality of forecasting; in terms of these metrics, the SCG model achieves results comparable to the BPR model. However, which metric should be given more importance depends on the trading strategy and on how the prediction is used for trading advantage [25].
Both the SCG and BPR based models attain a significantly high rate of correct directional change prediction (above 80%). In this study, we considered six technical indicators as inputs. Other technical indicators, such as RSI (relative strength index) and momentum, could also be considered; in that case, a sensitivity analysis may be performed to eliminate the less sensitive variables. We investigated the effect of the network architecture on the performance metrics. The performance varies slightly with the number of units in the hidden layer, and the optimum number of hidden units needs to be determined by trial and error. In [26], Yao et al. outlined the need for an automatic model building facility to avoid the tedious trial-and-error method. This can be achieved by employing a constructive algorithm [16] that starts with a minimal architecture and grows automatically by incrementally adding hidden units and layers one by one. It is claimed that this type of algorithm is likely to build an optimum or near-optimum architecture automatically. This would further add to the flexibility of the model, but its performance in forex prediction needs to be investigated. Chen et al. [4] applied a PNN (Probabilistic Neural Network) to forecasting and trading a stock index, noting that PNN training is fast, which enables the user to adopt a frequently updated training scheme. In actual applications, retraining a forecasting model with the most recent data may be necessary to increase the chance of achieving a better forecast. Fast learning may be an advantage, but it does not always guarantee improved performance. A comparison between a PNN-based forecasting model and our model, with retraining at some interval, would assess the suitability of the models in such applications. Further investigation in this direction will focus on formulating an objective function for the ANN model that takes other goodness measures, e.g., trading strategies and profit, into consideration.
Recently, several applications combining wavelet transformation with financial time series forecasting have been proposed. A hybrid system using wavelet techniques in neural networks looks promising [29]. However, the best way of combining wavelet techniques and neural networks for forex prediction needs to be investigated and remains the focus of future research.

References

[1] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, NY, 1995.
[2] G.E.P. Box and G.M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, CA, 1990.
[3] L. Cao and F. Tay, “Financial forecasting using support vector machines,” Neural Comput. & Applic., vol. 10, pp. 184-192, 2001.
[4] A. Chen, M. Leung and H. Daouk, “Application of neural networks to an emerging financial market: forecasting and trading the Taiwan Stock Index,” Computers & Operations Research, vol. 30, pp. 901-923, 2003.
[5] G. Deboeck, Trading on the Edge: Neural, Genetic and Fuzzy Systems for Chaotic Financial Markets, Wiley, New York, 1994.
[6] F.D. Foresee and M.T. Hagan, “Gauss-Newton approximation to Bayesian regularization,” Proc. IJCNN 1997, pp. 1930-1935, 1997.
[7] M.T. Hagan, H.B. Demuth and M.H. Beale, Neural Network Design, PWS Publishing, Boston, MA, 1996.
[8] T. Hill, M. O’Connor and W. Remus, “Neural network models for time series forecasts,” Management Science, vol. 42, pp. 1082-1092, 1996.
[9] H.B. Hwarng and H.T. Ang, “A simple neural network for ARMA(p,q) time series,” OMEGA: Int. Journal of Management Science, vol. 29, pp. 319-333, 2002.
[10] W.C. Jhee and J.K. Lee, “Performance of neural networks in managerial forecasting,” Intelligent Systems in Accounting, Finance and Management, vol. 2, pp. 55-71, 1993.
[11] I. Kaastra and M. Boyd, “Designing a neural network for forecasting financial and economic time series,” Neurocomputing, vol. 10, pp. 215-236, 1996.
[12] J. Kamruzzaman and R. Sarker, “Comparing ANN based models with ARIMA for prediction of forex rates,” ASOR Bulletin, vol. 22, pp. 2-11, 2003.
[13] J. Kamruzzaman and R. Sarker, “Forecasting of currency exchange rates using ANN: a case study,” Proc. of IEEE Int. Conf. on Neural Networks & Signal Processing (ICNNSP03), pp. 793-797, 2003.
[14] B.D. Klein and D.F. Rossin, “Data quality in neural network models: effect of error rate and magnitude of error on predictive accuracy,” OMEGA: Int. Journal of Management Science, vol. 27, pp. 569-582, 1999.
[15] B. LeBaron, “Technical trading rule profitability and foreign exchange intervention,” Journal of Int. Economics, vol. 49, pp. 124-143, 1999.
[16] M. Lehtokangas, “Modified constructive backpropagation for regression,” Neurocomputing, vol. 35, pp. 113-122, 2000.
[17] D.J.C. MacKay, “Bayesian interpolation,” Neural Computation, vol. 4, pp. 415-447, 1992.
[18] M.C. Medeiros, A. Veiga and C.E. Pedreira, “Modeling exchange rates: smooth transitions, neural networks and linear models,” IEEE Trans. Neural Networks, vol. 12, no. 4, 2001.
[19] M.F. Møller, “A scaled conjugate gradient algorithm for fast supervised learning,” Neural Networks, vol. 6, pp. 525-533, 1993.
[20] J.J. Moré, “The Levenberg-Marquardt algorithm: implementation and theory,” in G.A. Watson, ed., Numerical Analysis, Lecture Notes in Mathematics 630, pp. 105-116, Springer-Verlag, London, 1977.
[21] D.E. Rumelhart, J.L. McClelland and the PDP Research Group, Parallel Distributed Processing, vol. 1, MIT Press, 1986.
[22] Z. Tang and P.A. Fishwick, “Backpropagation neural nets as models for time series forecasting,” ORSA Journal on Computing, vol. 5, no. 4, pp. 374-385, 1993.
[23] J.H. Wang and J.Y. Leu, “Stock market trend prediction using ARIMA-based neural networks,” Proc. of IEEE Int. Conf. on Neural Networks, vol. 4, pp. 2160-2165, 1996.
[24] S. Wang, “An insight into the standard back-propagation neural network model for regression analysis,” OMEGA: Int. Journal of Management Science, vol. 26, pp. 133-140, 1998.
[25] J. Yao, Y. Li and C.L. Tan, “Option price forecasting using neural networks,” OMEGA: Int. Journal of Management Science, vol. 28, pp. 455-466, 2000.
[26] J. Yao and C.L. Tan, “A case study on using neural networks to perform technical forecasting of forex,” Neurocomputing, vol. 34, pp. 79-98, 2000.
[27] S. Yaser and A. Atiya, “Introduction to financial forecasting,” Applied Intelligence, vol. 6, pp. 205-213, 1996.
[28] G. Zhang and M.Y. Hu, “Neural network forecasting of the British Pound/US dollar exchange rate,” OMEGA: Int. Journal of Management Science, vol. 26, pp. 495-506, 1998.
[29] B. Zhang, R. Coggins, M. Jabri, D. Dersch and B. Flower, “Multiresolution forecasting for futures trading using wavelet decompositions,” IEEE Trans. Neural Networks, special issue on Financial Engineering, vol. 12, no. 4, pp. 765-775, 2001.

Joarder Kamruzzaman received his B.Sc. and M.Sc. in Electrical Engineering from Bangladesh University of Engineering and Technology, Dhaka, Bangladesh in 1986 and 1989 respectively, and his Ph.D. in Information Systems Engineering from Muroran Institute of Technology, Japan in 1993.
He is currently a faculty member in the Faculty of Information Technology, Monash University, Australia. His research interests include neural networks, fuzzy systems, genetic algorithms, computer networks and bioinformatics.

Ruhul Sarker received his Ph.D. in 1991 from DalTech, Dalhousie University, Halifax, Canada, and is currently a senior faculty member in the School of Information Technology and Electrical Engineering, University of New South Wales, ADFA Campus, Canberra, Australia. His main research interests are evolutionary optimization, neural networks and applied operations research. He has recently edited four books and has published more than 90 refereed papers in international journals and conference proceedings. Dr. Sarker is actively involved with a number of national and international conferences and workshops as chair, co-chair or program committee member. He recently served as a technical co-chair for IEEE-CEC2003. He is also a member of the task force on evolutionary multi-objective optimization operated by the IEEE Neural Networks Society.