BigData Processing Using MapReduce
Foreign Exchange (EUR/USD Currency Pair)
Say Er Lim
University Malaya, Selangor, Malaysia
say2@siswa.um.edu.my
Hui Kim Law, Saeed Aghabozorgi, Ying Wah Teh, and Tutut Herawan
University Malaya, Selangor, Malaysia
{stephyi_hk.qin@siswa.um.edu.my, saeed@um.edu.my, tehyw@um.edu.my, tutut@um.edu.my}
Abstract - This paper describes how Hadoop MapReduce can be used to process big data. The big data used in this project is the foreign exchange rate of the EUR/USD currency pair, recorded every minute of each day. First, the foreign exchange data are loaded, using Hadoop MapReduce, into a Linux environment simulated by Ubuntu that has been set up on a desktop computer. After that, we extract the required data from Hadoop once it has been successfully loaded. Those data are then used to plot a time series and to predict the foreign exchange rate for the future (e.g. the next day).

Index Terms - foreign exchange rate, big data, Hadoop, MapReduce, predict, moving average.

I. INTRODUCTION

Advances in technology and social networks have brought a lot of data. The volume of data is increasing, and the data are becoming more complex, arriving at higher velocity and in more varied types. Big data may reach petabytes in size, collected from millions of people and consisting of billions to trillions of records. Furthermore, big data comes from a variety of sources such as social media, the web, sales, customer information and others. Such large and complex data sets are difficult and slow to process with traditional data processing applications, which struggle to capture, store, transfer, process and analyze them.

The big data used in this project is EUR/USD foreign exchange data. Foreign exchange is the conversion of one currency into another. The Cambridge Advanced Learner's Dictionary & Thesaurus defines foreign exchange as the system by which the type of money used in one country is exchanged for another country's money, making international trade easier. The foreign exchange market enables currency conversion to assist international trade and investment. The US dollar (USD), euro (EUR), Japanese yen (JPY), British pound (GBP) and Australian dollar (AUD) are the major currencies in the foreign exchange market. EUR/USD is a widely traded currency pair (Bekiros & Diks, 2008) [1].

The foreign exchange market is the largest asset class in the world, which leads to high liquidity; it is unique, and its trading volume is huge. The market operates continuously, 24 hours per day. The exchange rates are therefore not constant: they may rise or fall every minute of every day. The foreign exchange rate is among the most important economic indices in the international monetary market.

In foreign exchange markets there are normally two sets of price data, the bid price and the ask price. The ask price is the price at which a broker will sell you the position you require, while the bid price is the price at which a broker will buy your current trading position from you. Brokers use the bid and ask prices to buy a current trading position or to sell a trading position to an intended buyer. In addition, a foreign exchange chart has two further sets of data: the opening price and the closing price of each period. Many factors can affect the ask and bid prices in the foreign exchange market, such as the volatility of the trading market, interest rate differentials, inflation differentials and others. Foreign exchange fluctuations can affect the profitability of an organization's business and expose the organization to exchange risk. Because the foreign exchange market trades every day, its data are large and fluctuate at a high rate. These data therefore need to be processed, stored, analyzed and used for prediction in order to see the trend of the foreign exchange rate and help buyers and sellers identify and make profitable trades.

In this paper, we explain the installation of Ubuntu and the configuration of Hadoop to store and retrieve data. We then explain the Moving Average approach, which is used to predict the foreign exchange rate.

The rest of this paper is organized as follows. In Section II, the related works are described. The installation of Ubuntu and the configuration of Hadoop to simulate a Linux environment for processing big data are briefly discussed in Sections III and IV. In Section V, we outline the Moving Average algorithm applied to foreign exchange time series data sets and the system
architecture. The Graphical User Interface (GUI) for this user module is described in Section VI. In Section VII, the conclusion and future perspectives are drawn.

II. RELATED WORKS

Meese and Rogoff showed that a naïve random walk benchmark model is better than conventional linear models at forecasting future exchange rates. Lye, Chan and Hooy employ artificial neural networks (ANNs) and an unconditional Vector Autoregressive (VAR) model to predict Yuan/USD exchange rates using monetary fundamentals. Their results show that ANNs outperformed in market rate forecasts and are supported by monetary fundamentals [2]. Besides that, some researchers have used order flow in exchange rate prediction. They found that order flow can provide powerful information that allows the public to forecast the daily exchange rate. Mahnaz Mahdavi used the loss function approach of Bayesian statistics to forecast the foreign exchange rate. He proposes a loss function in his forecasting model, and the Bayesian forecasts slightly outperformed the classical forecasts of foreign exchange [3]. In the paper "Forecasting of foreign exchange rates of Taiwan's major trading partners by novel nonlinear Grey Bernoulli model NGBM", the authors study the feasibility and effectiveness of a novel Grey model based on the Bernoulli differential equation for foreign exchange prediction. In the preliminary results of that paper, the Novel Nonlinear Grey Bernoulli Model (NGBM) improved on the precision of the traditional Grey forecasting model, and it was successfully applied to forecasting the annual foreign exchange rates of 13 countries in 2005 [4]. Furthermore, Grossmann and Simpson use a relative purchasing power parity (PPP) model based on the consumer price index (CPI) or the traded-goods price index (TPI), together with a linear forecasting technique, to determine Yen/US Dollar exchange rates over a short-term horizon. The TPI-based PPP model outperforms the pure random walk by more than the CPI-based PPP model does [5]. However, the CPI-based PPP model also produced a lower forecast error than a random walk model. Some researchers have studied a forecasting model that combines an adaptive autoregressive moving average (ARMA) model with differential evolution (DE) based training, and showed that the proposed ARMA-DE exchange rate prediction model has superior prediction potential over both short and long ranges compared to other models [6].

A. Forecasting Techniques

Neural networks are one of the forecasting models for the foreign exchange market. Yao and Tan state that neural network techniques are prime candidates for prediction in market environments with high volatility, complexity and noise (Yao & Tan, 2000). Neural network models can use fundamental and technical indicators as inputs to simulate fundamental and technical analysis, and can also decrease prediction risks [7]. In addition, an adaptive fuzzy network with a parallel genetic algorithm is also a good choice for predicting the foreign exchange rate. A fuzzy inference system has the ability to approximate any non-linear mapping (Kosko, 1993). The genetic algorithm and the adaptive fuzzy network system optimize the network to approximate the mapping. AutoRegressive Integrated Moving-Average (ARIMA) is another foreign exchange forecasting model used by many researchers in the foreign exchange market. ARIMA models are often referred to as Box-Jenkins models and were first popularized by Box and Jenkins. An ARIMA model combines a series' own past values, past errors, and current and past values of other time series to predict a value in the time series. ARIMA modeling consists of three stages: identification, estimation and diagnostic checking, and forecasting.

III. HADOOP MAPREDUCE

MapReduce is a computing model used for efficiently processing large data sets distributed over a cluster of computers. Hadoop is an open source Java programming framework; it implements the computational paradigm named MapReduce for processing large data sets in a distributed computing environment. MapReduce is a programming model and software framework proposed by Google (Dean & Ghemawat, 2008) [8]. Hadoop MapReduce is inspired by Google's MapReduce, invented in 2004, in which a software application can be broken down into numerous small parts. Hadoop MapReduce is a popular big data processing engine dedicated to scalable, distributed, data-intensive computing. A Hadoop MapReduce program consists of two separate user-defined functions, map and reduce. First, the data sets are split into smaller chunks and distributed as input to the map process. The map process breaks down the individual elements into tuples (key/value pairs). After that, the Hadoop MapReduce framework sorts the outputs of the maps, which are then input to the reduce process. The reduce job combines those data tuples into a smaller set of tuples to form the output.

IV. INSTALLATION OF UBUNTU

Before the foreign exchange data can be stored and processed, Hadoop MapReduce must be installed and configured on a personal computer (stand-alone system). From the literature reviewed (Daneshyar & Patel, 2012), Hadoop MapReduce is more suitable to install in a Linux environment than in a Windows environment, because the Windows environment had problems connecting to the distributed cluster [9]. By default the personal computer uses the Windows environment, so it is highly recommended to install the Ubuntu operating system on the personal laptop in order to run Hadoop MapReduce. The Ubuntu operating system is a complete desktop Linux-based operating system that allows Linux applications to be compiled
and run on a Windows operating system securely: files and data stay protected, and it loads quickly on any computer. Installing the Ubuntu operating system enables Hadoop MapReduce to run on the Windows laptop on top of Ubuntu. After Ubuntu is installed, Hadoop MapReduce must be configured in Ubuntu before it can be used by executing commands. Then the foreign exchange rate data for the EUR/USD currency pair can be loaded into Hadoop MapReduce, and the user needs to write Java code to extract the desired data, such as the date, time and closing ask of the EUR/USD foreign exchange rate, as the output.

V. MOVING AVERAGE TECHNIQUES

Time series data is ordered by time; the exchange rate is time series data, collected at specific points in time. The quantity being measured (the exchange rate) is referred to as the variable. Commonly, time series data are observed at annual, quarterly, monthly, weekly or daily frequency. In this project, the exchange rate is observed daily, at one-minute intervals. Time series analysis comprises methods for analyzing time series data in order to extract useful and meaningful statistics and other characteristics of the data. The techniques of time series analysis may be parametric or non-parametric. Time series prediction is the use of a model to predict future values based on previously observed values. There exist many time series prediction techniques that use previously observed values as the basis for estimating future outcomes, such as moving average, weighted moving average, exponential smoothing, autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), linear prediction, trend estimation, growth curve and other techniques.

In this paper, the Moving Average technique is used to analyze the data and perform prediction. The output extracted from Hadoop MapReduce is passed to the Moving Average model for further analysis, which performs a series of calculations on the closing ask of the foreign exchange rate in order to predict the future exchange rate. The moving average is also called the rolling average or running average in statistics. The moving average model is a simple and common technique used with time series data to analyze a set of data points; it smooths out fluctuations and highlights longer-term trends. It is often used in technical analysis of financial data such as stock prices, exchange rates or trading volumes, and can also be used in economics to examine macroeconomic time series. Moreover, the moving average is one of the most used indicators in the Foreign Exchange Market (FOREX). A moving average formula is used to predict the foreign exchange rate after the necessary data have been identified and extracted from Hadoop MapReduce.

The following example illustrates Moving Average modeling and prediction using a simulated data set containing time series data. The Moving Average model was chosen for big data analytics and the prediction of the foreign exchange rate because the EUR/USD exchange rate data analyzed form a per-minute time series within one day, and the focus is only on the closing ask. The focus is on the closing ask because the closing asks are the most real data of the day and this ask rate is carried over to the next day's opening ask; furthermore, people mostly use this ask rate to buy a current trading position from a broker or to change money into another country's currency. In addition, the moving average is used for analyzing and predicting the foreign exchange rate because forecasting needs to rely on previously observed exchange rates.

Essentially, the analysis performed by Moving Average modeling is divided into two stages. The "Identification" and "Prediction" stages are summarized below.

A. Identification Stage

The first step in the identification stage is to specify the input data set, which is the foreign exchange rate of the EUR/USD currency pair. An identify statement is then used to read the EUR/USD foreign exchange rate data. After that, the wanted parameters are extracted from Hadoop MapReduce as output and used to plot a time series graph for the date entered by the user (as input). Table 1 shows an example of the EUR/USD foreign exchange rate data set, and the plotted time series of the EUR/USD foreign exchange rate is shown in Fig. 1 below. The system architecture is shown in Fig. 2 below.

TABLE 1. EUR/USD Foreign Exchange Rate Data Sets

Date         Time       EUR/USD (Close, Ask)
12-09-2012   00:09:00   1.28617
12-09-2012   00:08:00   1.28617
12-09-2012   00:07:00   1.28620
12-09-2012   00:06:00   1.28618
12-09-2012   00:05:00   1.28616
12-09-2012   00:04:00   1.28627
12-09-2012   00:03:00   1.28622
12-09-2012   00:02:00   1.28625
12-09-2012   00:01:00   1.28625
12-09-2012   00:00:00   1.28625
11-09-2012   23:59:00   1.28620
11-09-2012   23:58:00   1.28616
11-09-2012   23:57:00   1.28615
11-09-2012   23:56:00   1.28632
11-09-2012   23:55:00   1.28607
11-09-2012   23:54:00   1.28611
11-09-2012   23:53:00   1.28604
11-09-2012   23:52:00   1.28602
11-09-2012   23:51:00   1.28625
11-09-2012   23:50:00   1.28619
11-09-2012   23:49:00   1.28624
11-09-2012   23:48:00   1.28625
11-09-2012   23:47:00   1.28626
11-09-2012   23:46:00   1.28621
11-09-2012   23:45:00   1.28624
11-09-2012   23:44:00   1.28622
11-09-2012   23:43:00   1.28613
11-09-2012   23:42:00   1.28605
11-09-2012   23:41:00   1.28622
11-09-2012   23:40:00   1.28633

Figure 1. Time Series of EUR/USD Foreign Exchange Rate (From Sept 11, 2012 to Sept 12, 2012).

Figure 2. System Architecture

B. Prediction Stage

Once the outputs are extracted and the time series is plotted, the next step is to use a formula to predict the future exchange rate. For example, if the exchange rates are $R_t, R_{t-1}, R_{t-2}, \dots, R_{t-(N-1)}$ for N periods (for example, N days), then the formula is:

$$R_{t+1} = \frac{R_t + R_{t-1} + R_{t-2} + \dots + R_{t-(N-1)}}{N}$$

where $R_{t+1}$ = Predicted Closing Ask Rate for Period t+1
$R_{t-1}$ = Closing Ask Rate for Period t-1
N = Number of Periods in the Moving Average

For example, a ten-period moving average would be:

$$R_{t+1} = \frac{R_t + R_{t-1} + R_{t-2} + \dots + R_{t-9}}{10}$$

With ten observed closing ask rates (t = 10), the prediction for period 11 is:

$$R_{11} = \frac{1.28629 + 1.28615 + 1.28610 + 1.28611 + 1.28610 + 1.28610 + 1.28608 + 1.28609 + 1.28626 + 1.28609}{10} = 1.28614$$

VI. GRAPHICAL USER INTERFACE (GUI)

The user module used in this paper is a Java Graphical User Interface (GUI). This module provides an interface for the user to select the preferred date of the exchange rate graph and then predict the next closing ask exchange rate accordingly. The GUI is shown in Fig. 3, Fig. 4 and Fig. 5 below.

Figure 3. The user interface of EUR/USD Currency Prediction System

Figure 4. The user interface that lets the user make a selection based on their desired date
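The flow behind the user module (extract the closing asks through map, sort and reduce steps, then apply the moving average of Section V-B) can be sketched as follows. This is a minimal illustration only, written in plain Python rather than with the Hadoop Java API used in this work. All function names are hypothetical, and the choice to keep the last one-minute record of each date as that day's closing ask is an assumption made for the example. The ten rates in PAPER_RATES are those of the ten-period worked example.

```python
from collections import defaultdict
from datetime import datetime

# Sample records laid out like Table 1: date, time, closing ask.
RAW_LINES = [
    "11-09-2012 23:57:00 1.28615",
    "11-09-2012 23:58:00 1.28616",
    "11-09-2012 23:59:00 1.28620",
    "12-09-2012 00:00:00 1.28625",
    "12-09-2012 00:01:00 1.28625",
]

# The ten closing ask rates used in the ten-period worked example.
PAPER_RATES = [1.28629, 1.28615, 1.28610, 1.28611, 1.28610,
               1.28610, 1.28608, 1.28609, 1.28626, 1.28609]


def map_record(line):
    """Map step: turn one text record into a (date, (time, ask)) pair."""
    date, time, ask = line.split()
    return date, (time, float(ask))


def shuffle(pairs):
    """Sort step: group values by key and order the keys by calendar date."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    by_date = lambda item: datetime.strptime(item[0], "%d-%m-%Y")
    return dict(sorted(groups.items(), key=by_date))


def reduce_day(date, values):
    """Reduce step (assumption): keep the ask of the latest record of the day."""
    _, last_ask = max(values)  # HH:MM:SS strings sort chronologically
    return date, last_ask


def run_job(lines):
    """Chain map, sort and reduce the way the framework would."""
    pairs = [map_record(line) for line in lines if line.strip()]
    return [reduce_day(date, values) for date, values in shuffle(pairs).items()]


def moving_average_forecast(rates, n):
    """Predict R_{t+1} as the mean of the latest n observed rates."""
    if n <= 0 or len(rates) < n:
        raise ValueError("need n > 0 and at least n observed rates")
    return sum(rates[-n:]) / n


if __name__ == "__main__":
    print(run_job(RAW_LINES))
    print(round(moving_average_forecast(PAPER_RATES, 10), 5))
```

Keeping the three steps as separate functions mirrors the map, sort and reduce phases described in Section III, so each one could be replaced by its Hadoop counterpart without changing the prediction step.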
Figure 5. Time Series of EUR/USD Foreign Exchange Rate generated based on the user selection.

VII. CONCLUSION

We have proposed using Hadoop MapReduce for processing foreign exchange data in this paper. The programming language used in the user module is Java. A simple and clear technique (Moving Average) is used to forecast the exchange rate for the EUR/USD currency pair. Besides that, we found that Hadoop MapReduce is suitable for processing a variety of big data sets: it can minimize the processing time and produce accurate output in the shortest time. Using another algorithm to predict the exchange rate, and processing the big data in the least time required, are opportunities for further work.

VIII. ACKNOWLEDGMENT

The authors would like to thank the reviewers for their comments on earlier versions of this paper. This research is funded by University of Malaya Research Grant (UM.C/625/1/HIR/MOHE/SC/13/2).

REFERENCES

[1] Bekiros, S. D., & Diks, C. G. (2008). The nonlinear dynamic relationship of exchange rates: Parametric and nonparametric causality testing. Journal of Macroeconomics, 30(4), 1641-1650.
[2] Lye, C. T., Chan, T. H., & Hooy, C. W. (2011). Forecasting Chinese Foreign Exchange with Monetary Fundamentals using Artificial Neural Networks. In 3rd Int Conf Inf Finance Eng (Vol. 12, pp. 560-564).
[3] Mahdavi, M. (1997). A Bayesian approach to foreign exchange forecasting. Global Finance Journal, 8(1), 15-31.
[4] Chen, C. I., Chen, H. L., & Chen, S. P. (2008). Forecasting of foreign exchange rates of Taiwan's major trading partners by novel nonlinear Grey Bernoulli model NGBM (1, 1). Communications in Nonlinear Science and Numerical Simulation, 13(6), 1194-1204.
[5] Grossmann, A., & Simpson, M. W. (2010). Forecasting the Yen/US Dollar exchange rate: Empirical evidence from a capital enhanced relative PPP-based model. Journal of Asian Economics, 21(5), 476-484.
[6] Rout, M., Majhi, B., Majhi, R., & Panda, G. (2013). Forecasting of currency exchange rates using an adaptive ARMA model with differential evolution based training. Journal of King Saud University-Computer and Information Sciences.
[7] Yao, J., & Tan, C. L. (2000). A case study on using neural networks to perform technical forecasting of forex. Neurocomputing, 34(1), 79-98.
[8] Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM, Vol. 51, 107-113.
[9] Daneshyar, S., & Patel, A. (2012). Evaluation of Data Processing Using MapReduce Framework in Cloud and Stand-Alone Computing. International Journal, 3.
[10] Muhammad, A., & King, G. A. (1997, March). Foreign exchange market forecasting using evolutionary fuzzy networks. In Computational Intelligence for Financial Engineering (CIFEr), 1997, Proceedings of the IEEE/IAFE 1997 (pp. 213-219). IEEE.
[11] Iokibe, T., Murata, S., & Koyama, M. (1995, October). Prediction of foreign exchange rate by local fuzzy reconstruction method. In Systems, Man and Cybernetics, 1995. Intelligent Systems for the 21st Century, IEEE International Conference on (Vol. 5, pp. 4051-4054). IEEE.
[12] Gutjahr, S., Riedmiller, M., & Klingemann, J. (1997). Daily prediction of the foreign exchange rate between the US dollar and the German mark using neural networks. Proc. of SPICES, 492-498.
[13] Dittrich, J., & Quiané-Ruiz, J. A. (2012). Efficient big data processing in Hadoop MapReduce. Proceedings of the VLDB Endowment, 5(12), 2014-2015.
[14] Narayan, S., Bailey, S., & Daga, A. (2012, November). Hadoop Acceleration in an OpenFlow-based cluster. In High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion (pp. 535-538). IEEE.
[15] Schultz, J., Vierya, J., & Lu, E. (2012, November). Analyzing Patterns in Large-Scale Graphs Using MapReduce in Hadoop. In High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion (pp. 1457-1458). IEEE.
[16] Saeed Reza Aghabozorgi and Teh Ying Wah. "Shape-based Clustering of Time Series Data", Journal of Intelligent Data Analysis 18(5) (ISI/SCOPUS Cited Publication / Accepted).
[17] Saeed Reza Aghabozorgi and Teh Ying Wah. "Incremental Clustering of Time Series Data by Fuzzy Clustering", Journal of Information Science and Engineering 28(4), 671-688 (ISI/SCOPUS Cited Publication / Published).
[18] Saeed Reza Aghabozorgi and Teh Ying Wah. "Stock Market Co-movement Assessment using a Three-Phase Clustering Method", Expert Systems With Applications, DOI:10.1016/j.eswa.2013.08.028 (ISI/SCOPUS Cited Publication).
[19] Saeed Reza Aghabozorgi, Teh Ying Wah, Amineh Amini, and Mahmoud Reza Saybani. "A New Approach to Present Prototypes in Clustering of Time Series", in Proceedings of The 7th International Conference of Data Mining, Las Vegas, USA, July 2011, pp. 214-220.

Say Er Lim was born in Muar, Johor, Malaysia, on 27 July 1990. She gained her Bachelor of Information Technology (IT), majoring in management, at the University of Malaya, Malaysia (2010-2014).
Saeed Aghabozorgi received his B.Sc. in Computer Engineering and Software Discipline from University of Isfahan, Iran, in 2002. He received his M.Sc. from Islamic Azad University, Iran, in 2005, and his Ph.D. from University of Malaya in 2013. Currently, he is a lecturer at the Department of Information System, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia. His current research area is data mining.

Law Hui Kim was born in the city of Malacca, Malaysia, on July 26, 1990. She gained her Bachelor of Information Technology (IT), majoring in the management field, at the University of Malaya (UM), Kuala Lumpur, Malaysia (2010-2014).

Ying-Wah Teh received his B.Sc. and M.Sc. from Oklahoma City University and his Ph.D. from University of Malaya. He is currently an Associate Professor at the Information Science Department, Faculty of Computer Science and Information Technology, University of Malaya. His research interests include data mining, text mining, document mining, cloud computing and big data.

Tutut Herawan received his PhD degree in computer science in 2010 from Universiti Tun Hussein Onn Malaysia. He is currently a senior lecturer at the Department of Information System, University of Malaya. His research areas include rough and soft set theory, DMKDD, and decision support in information systems. He is an editorial board member and acts as a reviewer for various journals. He has also served as a program committee member and co-organizer for numerous international conferences/workshops.