BIGDATA PROCESSING USING MAPREDUCE FOREIGN EXCHANGE (EUR/USD CURRENCY PAIR)
Say Er Lim
University of Malaya, Selangor, Malaysia
say2@siswa.um.edu.my

Hui Kim Law, Saeed Aghabozorgi, Ying Wah Teh, and Tutut Herawan
University of Malaya, Selangor, Malaysia
{stephyi_hk.qin@siswa.um.edu.my, saeed@um.edu.my, tehyw@um.edu.my, tutut@um.edu.my}

Abstract - This paper describes how Hadoop MapReduce can be used to process big data. The big data used in this project is the foreign exchange rate of the EUR/USD currency pair, recorded every minute of each day. First, the foreign exchange data is loaded, using Hadoop MapReduce, into a Linux environment simulated with Ubuntu on a desktop computer. Next, the required data is extracted from Hadoop once it has been loaded successfully. Finally, the extracted data is used to plot a time series and to predict the future foreign exchange rate (e.g. for the next day).

Index Terms - foreign exchange rate, big data, Hadoop, MapReduce, predict, moving average.

I. INTRODUCTION

Advances in technology and social networks have produced large amounts of data. The volume of data keeps increasing, and the data is becoming more complex, arriving at higher velocity, and more varied in type. Big data can reach petabytes in size, generated by millions of people and consisting of billions to trillions of records. Furthermore, big data comes from a variety of sources such as social media, the web, sales, and customer information. Such large and complex data sets are difficult and slow to process with traditional data processing applications; the challenges include capturing, storing, transferring, processing, and analyzing the data.

The big data used in this project is EUR/USD foreign exchange data. Foreign exchange is the conversion of one currency into another. The Cambridge Advanced Learner's Dictionary & Thesaurus defines foreign exchange as the system by which the type of money used in one country is exchanged for another country's money, making international trade easier. The foreign exchange market enables currency conversion to assist international trade and investment. The US dollar (USD), euro (EUR), Japanese yen (JPY), British pound (GBP), and Australian dollar (AUD) are the major currencies in the foreign exchange market. EUR/USD is a widely traded currency pair (Bekiros & Diks, 2008) [1].

The foreign exchange market represents the largest asset class in the world, which leads to high liquidity; it is unique and its trading volume is huge. The market operates continuously, 24 hours a day. As a result, exchange rates are not constant: they may rise or fall every minute of every day. The foreign exchange rate is among the most important economic indices in the international monetary market.

Foreign exchange markets normally quote two sets of price data: the bid price and the ask price. The ask is the price at which a broker will sell you the position you require, while the bid is the price at which a broker will buy your current trading position from you. Brokers use the bid and ask prices to buy current trading positions or to sell positions to intended buyers. In addition, a foreign exchange chart carries two further sets of data, the opening price and the closing price of each period. Many factors can affect the ask and bid prices, such as the volatility of the trading market, interest rate differentials, and inflation differentials. Foreign exchange fluctuations can affect the profitability of an organization's business and expose the organization to exchange risk. Because the foreign exchange market trades every day, its data is large and fluctuates rapidly. These data therefore need to be stored, processed, and analyzed, and future rates predicted, in order to see the trend of the foreign exchange rate and help buyers and sellers identify profitable trades.

In this paper, we explain the installation of Ubuntu and the configuration of Hadoop to store and retrieve data. We then explain the Moving Average approach used to predict the foreign exchange rate.

The rest of this paper is organized as follows. Section II describes related work. The installation of Ubuntu and the configuration of Hadoop to simulate a Linux environment for processing big data are briefly discussed in Sections III and IV. In Section V, we outline the Moving Average algorithm applied to the foreign exchange time series data sets and the system
architecture. The Graphical User Interface (GUI) of the user module is described in Section VI. Section VII draws conclusions and outlines future perspectives.

II. RELATED WORKS

Meese and Rogoff showed that a naïve random walk benchmark model forecasts future exchange rates better than conventional linear models. Lye, Chan, and Hooy employ artificial neural networks (ANNs) and an unconditional Vector Autoregressive (VAR) model to predict Yuan/USD exchange rates from monetary fundamentals. Their results show that ANNs perform better in market rate forecasts and that the forecasts are supported by monetary fundamentals [2]. Other researchers have used order flow in exchange rate prediction and found that order flow provides powerful information that allows the public to forecast the daily exchange rate. Mahdavi used the loss function approach of Bayesian statistics to forecast foreign exchange rates; with a loss function built into the forecasting model, the Bayesian forecasts slightly outperformed the classical forecasts [3]. The paper "Forecasting of foreign exchange rates of Taiwan's major trading partners by novel nonlinear Grey Bernoulli model NGBM" studies the feasibility and effectiveness of a novel Grey model based on the Bernoulli differential equation for foreign exchange prediction. In its preliminary results, the Nonlinear Grey Bernoulli Model (NGBM) improved on the precision of the traditional Grey forecasting model, and it was successfully applied to forecasting the annual foreign exchange rates of 13 countries in 2005 [4]. In another study, the authors use a relative purchasing power parity (PPP) model based on the consumer price index (CPI) or the traded-goods price index (TPI), together with a linear forecasting technique, to determine Yen/US Dollar exchange rates over a short-term horizon. The TPI-based PPP model outperforms the pure random walk by a wider margin than the CPI-based PPP model [5]. Nevertheless, the CPI-based PPP model still produced a lower forecast error than a random walk model. Other researchers studied an adaptive autoregressive moving average (ARMA) forecasting model trained with differential evolution (DE) and showed that the proposed ARMA-DE exchange rate prediction model has superior prediction potential over both short and long ranges compared with other models [6].

A. Forecasting Techniques

Neural networks are one of the forecasting models for the foreign exchange market. Yao and Tan state that neural network techniques are prime candidates for prediction in market environments with high volatility, complexity, and noise (Yao & Tan, 2000). Neural network models can take fundamental and technical indicators as input to simulate fundamental and technical analysis, and can also decrease prediction risk [7]. In addition, an adaptive fuzzy network with a parallel genetic algorithm is also a good choice for predicting foreign exchange rates. A fuzzy inference system has the ability to approximate any non-linear mapping (Kosko, 1993). The genetic algorithm and the adaptive fuzzy network system optimize the network to approximate the mapping. The AutoRegressive Integrated Moving-Average (ARIMA) model is another forecasting model used by many researchers in the foreign exchange market. ARIMA models are often referred to as Box-Jenkins models, having first been popularized by Box and Jenkins. An ARIMA model combines a series' own past values, past errors, and the current and past values of other time series to predict a value in the time series. ARIMA modeling consists of three stages: identification, estimation and diagnostic checking, and forecasting.

III. HADOOP MAPREDUCE

MapReduce is a computing model for efficiently processing large data sets distributed over a cluster of computers. Hadoop is an open-source Java programming framework that implements the MapReduce computational paradigm for processing large data sets in a distributed computing environment. MapReduce is a programming model and software framework proposed by Google (Dean & Ghemawat, 2008) [8]. Hadoop MapReduce is inspired by Google's MapReduce, introduced in 2004, in which an application can be broken down into numerous small parts. Hadoop MapReduce is a popular big data processing engine dedicated to scalable, distributed, data-intensive computing. A MapReduce program in Hadoop consists of two separate user-defined functions, map and reduce. First, the data set is split into smaller chunks, which are distributed as input to the map process. The map process breaks the individual elements down into tuples (key/value pairs). The Hadoop MapReduce framework then sorts the outputs of the maps, which become the input to the reduce process. The reduce job combines those data tuples into a smaller set of tuples to form the output.

IV. INSTALLATION OF UBUNTU

Before the foreign exchange data can be stored and processed, Hadoop MapReduce must be installed and configured on a personal computer (stand-alone system). According to the literature (Daneshyar & Patel, 2012), Hadoop MapReduce is better suited to installation in a Linux environment than in a Windows environment, because the Windows environment had problems connecting to the distributed cluster [9]. Since personal computers run Windows by default, it is highly recommended to install the Ubuntu operating system on the personal laptop in order to run Hadoop MapReduce. Ubuntu is a complete Linux-based desktop operating system that allows Linux applications to be compiled
and run on a Windows machine securely: files and data stay protected, and the system loads quickly on any computer. Installing Ubuntu enables Hadoop MapReduce to run on the Windows laptop on top of Ubuntu. After Ubuntu is installed, Hadoop MapReduce must be configured within it before it can be used by executing commands. The foreign exchange rate data for the EUR/USD currency pair can then be loaded into Hadoop MapReduce, and the user writes Java code to extract the desired fields, such as the date, time, and closing ask of the EUR/USD foreign exchange rate, as the output.

V. MOVING AVERAGE TECHNIQUES

Time series data is data ordered by time. The exchange rate is time series data, collected at specific points in time, and the quantity being measured (the exchange rate) is referred to as the variable. Time series data is commonly observed at annual, quarterly, monthly, weekly, or daily frequency. In this project, the exchange rate is observed daily, at one-minute intervals. Time series analysis comprises methods for analyzing time series data in order to extract useful and meaningful statistics and other characteristics of the data. These methods may be parametric or non-parametric. Time series prediction is the use of a model to predict future values based on previously observed values. Many time series prediction techniques use previously observed values as the basis for estimating future outcomes, including the moving average, weighted moving average, exponential smoothing, autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), linear prediction, trend estimation, and growth curves.

In this paper, the Moving Average technique is used to analyze the data and perform the prediction. The output extracted from Hadoop MapReduce is passed to the Moving Average model, which performs a series of calculations on the closing ask of the foreign exchange rate in order to predict the future exchange rate. In statistics, the moving average is also called the rolling average or running average. The moving average model is a simple and common technique for analyzing a set of data points in a time series; it smooths out fluctuations and highlights longer-term trends. It is often used in the technical analysis of financial data such as stock prices, exchange rates, or trading volumes, and can also be used in economics to examine microeconomic time series. Moreover, the moving average is one of the most widely used indicators in the Foreign Exchange Market (FOREX). The moving average formula is applied to predict the foreign exchange rate after the necessary data has been identified and extracted from Hadoop MapReduce.

The following example illustrates Moving Average modeling and prediction using a simulated data set containing time series data. The Moving Average model was chosen for big data analytics and the prediction of the foreign exchange rate because the analysis covers the EUR/USD exchange rate as a per-minute time series within one day and focuses only on the closing ask. The closing ask was chosen because closing asks are the most realistic data of the day and carry over to the next day's opening asks; furthermore, people mostly use this ask rate to buy a current trading position from a broker or to exchange another country's currency. In addition, the moving average suits the analysis and prediction of the foreign exchange rate because it relies on previously observed exchange rates to perform further forecasting.

The analysis performed by Moving Average modeling is divided into two stages, "Identification" and "Prediction", which are summarized below.

A. Identification Stage

The first step in the identification stage is to specify the input data set, which is the foreign exchange rate of the EUR/USD currency pair. An identify statement is then used to read the EUR/USD foreign exchange rate data. After that, the wanted parameters are extracted from Hadoop MapReduce as output to plot a time series graph for the date entered by the user as input. Table 1 shows an example of the EUR/USD foreign exchange rate data set, and the plotted time series of the EUR/USD foreign exchange rate is shown in Fig. 1. The system architecture is shown in Fig. 2.

TABLE 1. EUR/USD Foreign Exchange Rate Data Sets

Date        Time      EUR/USD (Close, Ask)
12-09-2012  00:09:00  1.28617
12-09-2012  00:08:00  1.28617
12-09-2012  00:07:00  1.28620
12-09-2012  00:06:00  1.28618
12-09-2012  00:05:00  1.28616
12-09-2012  00:04:00  1.28627
12-09-2012  00:03:00  1.28622
12-09-2012  00:02:00  1.28625
12-09-2012  00:01:00  1.28625
12-09-2012  00:00:00  1.28625
11-09-2012  23:59:00  1.28620
11-09-2012  23:58:00  1.28616
11-09-2012  23:57:00  1.28615
11-09-2012  23:56:00  1.28632
11-09-2012  23:55:00  1.28607
11-09-2012  23:54:00  1.28611
11-09-2012  23:53:00  1.28604
11-09-2012  23:52:00  1.28602
11-09-2012  23:51:00  1.28625
11-09-2012  23:50:00  1.28619
11-09-2012  23:49:00  1.28624
11-09-2012  23:48:00  1.28625
11-09-2012  23:47:00  1.28626
11-09-2012  23:46:00  1.28621
11-09-2012  23:45:00  1.28624
11-09-2012  23:44:00  1.28622
11-09-2012  23:43:00  1.28613
11-09-2012  23:42:00  1.28605
11-09-2012  23:41:00  1.28622
11-09-2012  23:40:00  1.28633

Figure 1. Time Series of EUR/USD Foreign Exchange Rate (from Sept 11, 2012 to Sept 12, 2012).

Figure 2. System Architecture

B. Prediction Stage

Once the outputs have been extracted and the time series plotted, the next step is to apply the moving average formula to predict the future exchange rate. If the exchange rates for the last N periods are $R_t, R_{t-1}, R_{t-2}, \ldots, R_{t-(N-1)}$, then the formula is:

$$R_{t+1} = \frac{R_t + R_{t-1} + R_{t-2} + \cdots + R_{t-(N-1)}}{N}$$

where
$R_{t+1}$ = predicted closing ask rate for period t+1
$R_{t-1}$ = closing ask rate for period t-1
$N$ = number of periods in the moving average

For example, a ten-period moving average would be:

$$R_{t+1} = \frac{R_t + R_{t-1} + R_{t-2} + \cdots + R_{t-9}}{10}$$

With t = 10, the prediction for period 11 is:

$$R_{11} = \frac{1.28629 + 1.28615 + 1.28610 + 1.28611 + 1.28610 + 1.28610 + 1.28608 + 1.28609 + 1.28626 + 1.28609}{10} \approx 1.28614$$

VI. GRAPHICAL USER INTERFACE (GUI)

The user module in this paper is a Java Graphical User Interface (GUI). It provides an interface through which the user selects the date of the exchange rate graph they want to see and then obtains a prediction of the next closing ask exchange rate. The GUI is shown in Fig. 3, Fig. 4 and Fig. 5.

Figure 3. The user interface of the EUR/USD Currency Prediction System

Figure 4. The user interface that lets the user make a selection based on their desired date
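The workflow behind the interface and the Prediction Stage (pick a date, collect that date's closing asks, average the latest N of them) can be sketched in a few lines. The sketch below is illustrative only: it is written in Python rather than the Java used for the paper's user module, and it does not use Hadoop. The function names `map_record`, `reduce_by_date`, and `moving_average_forecast` are hypothetical, the record layout is assumed from Table 1, and the map/reduce split merely mimics the key/value flow described in Section III.

```python
from collections import defaultdict

# Ten records copied from Table 1 (date, time, closing ask), newest first.
# The whitespace-separated layout is an assumption based on that table.
RECORDS = """\
12-09-2012 00:09:00 1.28617
12-09-2012 00:08:00 1.28617
12-09-2012 00:07:00 1.28620
12-09-2012 00:06:00 1.28618
12-09-2012 00:05:00 1.28616
12-09-2012 00:04:00 1.28627
12-09-2012 00:03:00 1.28622
12-09-2012 00:02:00 1.28625
12-09-2012 00:01:00 1.28625
12-09-2012 00:00:00 1.28625
"""


def map_record(line):
    """Map step (hypothetical): turn one text record into a (key, value) pair.

    The key is the date; the value is a (time, closing_ask) tuple.
    """
    date, time, rate = line.split()
    return date, (time, float(rate))


def reduce_by_date(pairs):
    """Reduce step (hypothetical): group values by date, oldest rate first."""
    grouped = defaultdict(list)
    for date, value in pairs:
        grouped[date].append(value)
    # HH:MM:SS strings sort chronologically, so sorting the tuples orders by time.
    return {date: [rate for _, rate in sorted(values)]
            for date, values in grouped.items()}


def moving_average_forecast(rates, n=10):
    """Predict the next closing ask as the mean of the last n observed rates."""
    if n <= 0 or len(rates) < n:
        raise ValueError("need at least n observations and n > 0")
    return sum(rates[-n:]) / n


if __name__ == "__main__":
    lines = [line for line in RECORDS.splitlines() if line.strip()]
    series = reduce_by_date(map_record(line) for line in lines)
    # The date is what the user would select in the interface.
    print(round(moving_average_forecast(series["12-09-2012"]), 5))  # prints 1.28621
```

With the ten closing asks above, the forecast is their plain average, 1.286212. Passing the ten rates from the R11 example to `moving_average_forecast` gives 1.286137, which rounds to the 1.28614 reported in the Prediction Stage.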
Figure 5. Time Series of EUR/USD Foreign Exchange Rate generated based on the user's selection.

VII. CONCLUSION

In this paper, we have proposed using Hadoop MapReduce to process foreign exchange data. The programming language used in the user module is Java. A simple and clear technique, the Moving Average, is used to forecast the exchange rate of the EUR/USD currency pair. We also found that Hadoop MapReduce is suitable for processing a variety of big data sets: it can minimize processing time and deliver accurate output in the shortest time. Using another algorithm to predict the exchange rate, and processing the big data in the least time required, are opportunities for further work.

VIII. ACKNOWLEDGMENT

The authors would like to thank the reviewers for their comments on earlier versions of this paper. This research is funded by University of Malaya Research Grant (UM.C/625/1/HIR/MOHE/SC/13/2).

REFERENCES

[1] Bekiros, S. D., & Diks, C. G. (2008). The nonlinear dynamic relationship of exchange rates: Parametric and nonparametric causality testing. Journal of Macroeconomics, 30(4), 1641-1650.
[2] Lye, C. T., Chan, T. H., & Hooy, C. W. (2011). Forecasting Chinese Foreign Exchange with Monetary Fundamentals using Artificial Neural Networks. In 3rd Int Conf Inf Finance Eng (Vol. 12, pp. 560-564).
[3] Mahdavi, M. (1997). A Bayesian approach to foreign exchange forecasting. Global Finance Journal, 8(1), 15-31.
[4] Chen, C. I., Chen, H. L., & Chen, S. P. (2008). Forecasting of foreign exchange rates of Taiwan's major trading partners by novel nonlinear Grey Bernoulli model NGBM(1, 1). Communications in Nonlinear Science and Numerical Simulation, 13(6), 1194-1204.
[5] Grossmann, A., & Simpson, M. W. (2010). Forecasting the Yen/US Dollar exchange rate: Empirical evidence from a capital enhanced relative PPP-based model. Journal of Asian Economics, 21(5), 476-484.
[6] Rout, M., Majhi, B., Majhi, R., & Panda, G. (2013). Forecasting of currency exchange rates using an adaptive ARMA model with differential evolution based training. Journal of King Saud University-Computer and Information Sciences.
[7] Yao, J., & Tan, C. L. (2000). A case study on using neural networks to perform technical forecasting of forex. Neurocomputing, 34(1), 79-98.
[8] Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM, Vol. 51, 107-113.
[9] Daneshyar, S., & Patel, A. (2012). Evaluation of Data Processing Using MapReduce Framework in Cloud and Stand-Alone Computing. International Journal, 3.
[10] Muhammad, A., & King, G. A. (1997, March). Foreign exchange market forecasting using evolutionary fuzzy networks. In Computational Intelligence for Financial Engineering (CIFEr), 1997., Proceedings of the IEEE/IAFE 1997 (pp. 213-219). IEEE.
[11] Iokibe, T., Murata, S., & Koyama, M. (1995, October). Prediction of foreign exchange rate by local fuzzy reconstruction method. In Systems, Man and Cybernetics, 1995. Intelligent Systems for the 21st Century., IEEE International Conference on (Vol. 5, pp. 4051-4054). IEEE.
[12] Gutjahr, S., Riedmiller, M., & Klingemann, J. (1997). Daily prediction of the foreign exchange rate between the US dollar and the German mark using neural networks. Proc. of SPICES, 492-498.
[13] Dittrich, J., & Quiané-Ruiz, J. A. (2012). Efficient big data processing in Hadoop MapReduce. Proceedings of the VLDB Endowment, 5(12), 2014-2015.
[14] Narayan, S., Bailey, S., & Daga, A. (2012, November). Hadoop Acceleration in an OpenFlow-based cluster. In High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion: (pp. 535-538). IEEE.
[15] Schultz, J., Vierya, J., & Lu, E. (2012, November). Analyzing Patterns in Large-Scale Graphs Using MapReduce in Hadoop. In High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion: (pp. 1457-1458). IEEE.
[16] Saeed Reza Aghabozorgi and Teh Ying Wah. "Shape-based Clustering of Time Series Data", Journal of Intelligent Data Analysis 18(5) (ISI/SCOPUS Cited Publication / Accepted).
[17] Saeed Reza Aghabozorgi and Teh Ying Wah. "Incremental Clustering of Time Series Data by Fuzzy Clustering", Journal of Information Science and Engineering 28(4), 671-688 (ISI/SCOPUS Cited Publication / Published).
[18] Saeed Reza Aghabozorgi and Teh Ying Wah. "Stock Market Co-movement Assessment using a Three-Phase Clustering Method", Expert Systems With Applications, DOI:10.1016/j.eswa.2013.08.028 (ISI/SCOPUS Cited Publication).
[19] Saeed Reza Aghabozorgi, Teh Ying Wah, Amineh Amini, and Mahmoud Reza Saybani. "A New Approach to Present Prototypes in Clustering of Time Series", in Proceedings of The 7th International Conference of Data Mining, Las Vegas, USA, July 2011, pp. 214-220.

Say Er Lim was born in Muar, Johor, Malaysia, on 27 July 1990. She gained her Bachelor of Information Technology (IT), majoring in management, at the University of Malaya, Malaysia (2010-2014).
Saeed Aghabozorgi received his B.Sc. in Computer Engineering and Software Discipline from the University of Isfahan, Iran, in 2002. He received his M.Sc. from Islamic Azad University, Iran, in 2005, and his Ph.D. from the University of Malaya in 2013. He is currently a lecturer at the Department of Information System, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia. His current research area is data mining.

Law Hui Kim was born in the city of Malacca, Malaysia, on July 26, 1990. She gained her Bachelor of Information Technology (IT), majoring in management, at the University of Malaya (UM), Kuala Lumpur, Malaysia (2010-2014).

Ying-Wah Teh received his B.Sc. and M.Sc. from Oklahoma City University and his Ph.D. from the University of Malaya. He is currently an Associate Professor at the Information Science Department, Faculty of Computer Science and Information Technology, University of Malaya. His research interests include data mining, text mining, document mining, cloud computing and big data.

Tutut Herawan received his PhD degree in computer science in 2010 from Universiti Tun Hussein Onn Malaysia. He is currently a senior lecturer at the Department of Information System, University of Malaya. His research areas include rough and soft set theory, DMKDD, and decision support in information systems. He is an editorial board member and acts as a reviewer for various journals. He has also served as a program committee member and co-organizer for numerous international conferences and workshops.