VOLATILITY FORECASTING ON GLOBAL STOCK MARKET INDICES - EVALUATION AND COMPARISON OF GARCH-FAMILY MODELS FORECASTING PERFORMANCE - DIVA PORTAL

Volatility forecasting on global stock market indices
Evaluation and comparison of GARCH-family models' forecasting performance

Simon Molin

Master Thesis
Economics, 15 hp
Spring term 2021
Acknowledgment

I would like to express my gratitude towards Giovanni Forchini for his guidance and support
throughout this thesis.

Sincerely,
Simon Molin
2021-05-30
Abstract

Volatility is arguably one of the most important measures in financial economics since it is
often used as a rough measure of the total risk of financial assets. Many volatility models have
been developed to model the process, where the GARCH-family models capture several
characteristics that are observed in financial data. An accurate volatility forecast is of great
value for monetary policymakers, risk managers, investors and for assessing the price and value
of financial derivatives. The purpose of this thesis is to evaluate whether asymmetric or
symmetric GARCH models generate better volatility forecasts and, specifically, which model
is superior. Two symmetric models, the GARCH and IGARCH, and two asymmetric models,
the EGARCH and TGARCH, are used. Daily volatility forecasts of the returns from 10 stock
market indices, covering the forecast period from 2020-03-31 to 2021-03-31, are compared to
the realized volatility using four different evaluation measures. The evidence suggests that the
symmetric GARCH models, and specifically the IGARCH model, on average produce the best
volatility forecasts.
Table of Contents
1 Introduction
2 Literature
3 Theoretical framework
  3.1 ARCH model
  3.2 Conditional mean
  3.3 Symmetric models
    3.3.1 GARCH model
    3.3.2 IGARCH model
  3.4 Asymmetric models
    3.4.1 EGARCH model
    3.4.2 TGARCH model
4 Data and Methodology
  4.1 Data
    4.1.1 Distribution
  4.2 Method
    4.2.1 Diagnostics test
    4.2.2 Out-of-sample forecast method
    4.2.3 Realized volatility
    4.2.4 Forecast evaluation
5 Results
6 Discussion
7 Conclusion
8 References
9 Appendix
1 Introduction

The financial literature has put considerable attention on modeling and forecasting stock
market volatility over the past decades. There are numerous motivations for this focus.
Volatility is often used as a rough measure of the total risk of financial assets and is arguably
one of the most important measures in financial economics. Since volatility is not directly
observable, an accurately estimated forecast of stock market volatility is important for several
financial applications, particularly monetary policymaking, risk management, portfolio
selection, and the valuation of assets and financial derivatives (Brooks 2014).

Financial market volatility exhibits several well-documented characteristics. Volatility
clustering, leverage effects, mean reversion, and co-movement of volatilities between financial
markets are some of these features (Poon and Granger 2003). Volatility clustering implies that
volatility tends to alternate between periods of high and low levels. Leverage effects refer to
the observation that large price drops have a greater impact on volatility than equally large
price increases. Furthermore, volatility evolves over time in a continuous manner and varies
within some fixed range; that is, volatility is usually stationary (Tsay 2013). The GARCH-family
models have proven to capture many of these
characteristics in financial markets. The symmetric GARCH models capture volatility
clustering and the fat tail distribution in a series. The GARCH models with the asymmetric
component allow the response of volatility to positive and negative shocks to be asymmetric,
with a greater impact on negative shocks (Tsay 2013).

The purpose of this paper is to examine the performance of GARCH-family models volatility
forecasts by using ten stock market indices. Two symmetric GARCH models will be included,
these are the standard Generalised Autoregressive Conditional Heteroskedasticity (GARCH)
model and the Integrated Autoregressive Conditional Heteroskedasticity (IGARCH) model.
Two asymmetric GARCH models will be considered to account for the leverage effects, these
are the Threshold Generalised Autoregressive Conditional Heteroskedasticity (TGARCH)
model and the Exponential Generalised Autoregressive Conditional Heteroskedasticity
(EGARCH) model. Specifically, the study examines which of the two model types, and which
specific GARCH model, generates the best volatility forecasts, using daily returns from a large
number of stock market indices.

Since volatility is not directly observable, a proxy of the true volatility will be used and
compared to the estimated forecasts from the GARCH-family models. The realized volatility,
defined as the sum of intraday squared returns at high frequencies (Andersen and Bollerslev
1998), will serve as this proxy. To evaluate the performance of each GARCH model, four
different evaluation measures will be used: the Mean Square Error (MSE), the Mean Absolute
Percentage Error (MAPE), the Mean Absolute Error (MAE), and the asymmetric QLIKE loss
function. These measures evaluate the difference between the realized volatility and the
volatility forecast. The forecasting method is a one-day-ahead forecast with a recursively
expanding estimation window that is refit at each step.

Several studies in the volatility forecasting literature use a small number of stock market
indices, and at times just one evaluation measure, to examine the forecasting performance of
GARCH models. This research instead examines several GARCH models using a large number
of stock market indices, several forecast evaluation measures, and a volatility proxy shown to
be superior, in order to provide as accurate an evaluation of GARCH models' forecasting
performance as possible.

The paper is organized as follows: Section 2 provides extensive coverage of the existing
literature on volatility forecasting, specifically for GARCH models. In section 3 the theoretical
framework is presented which describes the models and their specifications. Section 4 presents
the financial data used to forecast the volatility and the methods that are implemented. In
sections 5 to 7, we present and discuss the result and finally we summarize and conclude.

2 Literature

Poon and Granger (2003) provide comprehensive coverage of the volatility forecasting
literature. In their paper, they examine 93 papers and their collective findings. They emphasize
that volatility forecasting is important for asset valuation, risk management, investments, and
monetary policymaking. Time series volatility forecasting models are categorized into four
groups: the GARCH-family models, stochastic volatility (SV) models, option-based volatility
models such as the Black-Scholes model, and historical volatility models such as the
exponentially weighted moving average (EWMA) model. Their overall findings suggest that
option-based volatility models are superior in forecasting, followed by GARCH and historical
volatility models. Within the GARCH family, asymmetric models such as the EGARCH model,
which accounts for the strong negative relationship between volatility and shocks to asset
returns, generally outperform the symmetric GARCH models. However, a drawback when
comparing a variety of studies is that they use different data sets, assets, time periods, and
evaluation techniques.

It has been established by the literature that the GARCH-family models can be divided into
two groups, the symmetric models and the asymmetric models (Tsay 2013). In this thesis, the
symmetric models are the GARCH and the IGARCH model and the asymmetric models are
the EGARCH and the TGARCH model. The difference between the symmetric and the
asymmetric GARCH models is that the latter can capture the leverage effects, where a negative
shock to prices has a larger effect on volatility than an equally large positive price shock
(Alexander 2001).

In financial time series and financial market volatility there are several features that have been
observed and that are well documented. The fat tail distribution of asset returns, volatility
clustering, mean reversion and similar movements of volatilities between financial markets are
some of these features (Poon and Granger 2003). Moreover, recent studies also provide
evidence that volatility is stronger during bear markets and financial crises, where large price
drops have a greater impact on volatility than equally large price increases. This characteristic
is known as the leverage effect (Tsay 2013). Historically, various
GARCH models have been implemented to capture these kinds of features in the financial
markets. For example, the GARCH model developed by Bollerslev (1986) captures the

properties of fat tail distribution and volatility clustering. The GARCH-family models have
been extended to include more of the features that are observed in the financial markets.

When evaluating the forecasting performance of GARCH models, the literature usually
employs various statistical evaluation measures, such as the Mean Square Error (MSE), Mean
Absolute Percentage Error (MAPE), Mean Absolute Error (MAE) and Root Mean Square Error
(RMSE) (Poon and Granger 2003). These measures are specified differently; however, they all
measure the error of the forecast, that is, the estimated loss between the forecasted volatility
and the volatility proxy (Reschenhofer, Mangat and Stark 2020). The literature on the
evaluation of volatility forecasts has been criticized, and it is not as extensive as the literature
on the construction of volatility models and forecasts (Poon and Granger 2003). However,
Patton (2011) derives two forecast evaluation measures that are robust to noise in the volatility
proxy: the asymmetric QLIKE loss function and the Mean Square Error (MSE). These measures
offer a consistent model ranking even in the presence of a noisy volatility proxy, implying that
ranking models under a robust measure with the proxy gives the same ranking as using the true
volatility process.
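The two robust losses can be sketched in Python; this is an illustrative sketch, not code from the thesis, and the normalized QLIKE form shown here is one common parameterization (zero when forecast and proxy coincide):

```python
import numpy as np

def mse_loss(proxy, forecast):
    """Mean Square Error between a volatility proxy and a variance forecast."""
    proxy, forecast = np.asarray(proxy), np.asarray(forecast)
    return np.mean((proxy - forecast) ** 2)

def qlike_loss(proxy, forecast):
    """QLIKE loss in a normalized form, sigma^2/h - ln(sigma^2/h) - 1:
    non-negative, and zero only when the forecast equals the proxy."""
    r = np.asarray(proxy) / np.asarray(forecast)
    return np.mean(r - np.log(r) - 1.0)

# Illustrative values: a forecast close to the proxy scores better than a
# systematically biased one under both losses.
proxy = np.array([1.0, 2.0, 1.5, 0.8])
good = proxy * 1.05
bad = proxy * 2.0
assert qlike_loss(proxy, good) < qlike_loss(proxy, bad)
assert mse_loss(proxy, good) < mse_loss(proxy, bad)
```

QLIKE penalizes under-prediction of variance more heavily than over-prediction, which is often considered desirable in risk management applications.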

The volatility proxy attempts to capture the unobserved underlying process that defines
volatility. Andersen and Bollerslev (1998) suggested that the sum of squared intraday returns
sampled at high frequency is a sufficient proxy for the true volatility; this proxy is known as
the realized volatility. Before the introduction of realized volatility, when high-frequency data
were not as easily accessible, the daily squared return was the proxy most widely applied to
reflect the true volatility.
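As a concrete illustration of the proxy, a minimal Python sketch; the five-minute returns below are simulated, purely hypothetical values:

```python
import numpy as np

def realized_volatility(intraday_returns):
    """Realized variance for one trading day: the sum of squared intraday
    returns (Andersen and Bollerslev 1998)."""
    r = np.asarray(intraday_returns, dtype=float)
    return float(np.sum(r ** 2))

# Hypothetical trading day of 78 five-minute log returns.
rng = np.random.default_rng(0)
five_min_returns = rng.normal(0.0, 0.001, size=78)
rv = realized_volatility(five_min_returns)

# The once-common daily squared return collapses the same information into
# a single observation and is a much noisier proxy of the same quantity.
daily_squared_return = float(np.sum(five_min_returns)) ** 2
```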

Awartani and Corradi (2005) examine the out-of-sample predictive ability and accuracy of
various GARCH models with a focus on the asymmetric component in the models. The models
studied are the same as in this paper, that is the GARCH, IGARCH, TGARCH and EGARCH,
and other variations of GARCH models. Their data consist of daily observations of the S&P
500 index covering the period from 1990 to 2001, and they use various out-of-sample forecast
horizons. Their main finding suggests that the asymmetric GARCH models outperform the
GARCH model, however, the superior predictive ability in the asymmetric models diminishes
as the forecasting horizon increases. Moreover, the GARCH model is superior when compared
to other symmetric versions.

Evans and McMillan (2007) studied the performance of nine different models of daily volatility,
including the GARCH model and several asymmetric GARCH models. The data included 33
stock market indices globally, with the out-of-sample forecast covering the period from the
beginning of 2004 to April 2005. They evaluate their results based on the RMSE measure.
Firstly, they conclude that the GARCH-family models are superior to historical volatility
models such as moving average models. Secondly, GARCH models with the asymmetric
component are superior to the standard GARCH model, with the EGARCH model appearing
to outperform the other models.

Gabriel (2012) evaluated the forecasting performance of several GARCH models, including
the TGARCH, IGARCH, EGARCH and Power-GARCH. The data cover the period from
2001 to 2012 with daily observations of the Bucharest Stock Exchange Trading Index, and the
forecasts are evaluated with the RMSE, MAE, MAPE and Theil inequality coefficient (TIC)
measures. They conclude that the model with the best forecasting ability was the TGARCH
model.

Another study that focuses on the European stock markets and evaluates which GARCH model
has the best volatility forecasting ability is Harrison and Moore (2011). They include 10
stock exchanges in the European region covering the period from 1991 to 2008. Six widely
used loss functions are implemented, and they conclude that GARCH models which allow for
the asymmetric component consistently outperform the symmetric GARCH models; the
EGARCH model in particular was among the preferred models.

Lim and Sek (2013) studied the performance of symmetric and asymmetric GARCH models
during the financial crisis in Malaysia in 1997. The data were categorized into three different
groups: pre-crisis, during-crisis, and post-crisis. They used three evaluation measures, the
MSE, MAPE and RMSE, and three different models, the GARCH, EGARCH and TGARCH
model. They found that in general the symmetric GARCH models perform better during
periods with high fluctuations (during-crisis) and the asymmetric models perform better during
normal periods (pre- and post-crisis).

To summarize the literature review, GARCH models that allow for asymmetries seem to, in
general, outperform the symmetric GARCH models. Furthermore, during more volatile periods
the evidence suggests that the symmetric GARCH models perform better. It is important to
apply a proxy for the true volatility that is as accurate as possible, and the literature suggests
that a proxy based on intraday high-frequency data, such as the realized volatility, is superior.
Finally, regarding the different forecast evaluation measures, which calculate the error of the
forecast relative to the volatility proxy, the literature has applied several widely used statistical
measures. Patton (2011) shows that the QLIKE and MSE are two measures that are robust to
noise in the volatility proxy.

3 Theoretical framework

The ARCH and GARCH models are among the most frequently used models for forecasting
volatility. The GARCH-family models model the conditional variance of returns and are
estimated by maximum likelihood. These models capture various important features of
financial markets. Firstly, they can capture the fat tail distribution of asset returns. Secondly,
they account for the tendency of volatility in financial markets to appear in clusters: when
assets exhibit large returns, the returns in the following period are also expected to be large,
and vice versa. Finally, the leverage effects describe the tendency for volatility to increase
more after a price drop than after a price rise of the same magnitude. The intuition is that a
price drop increases financial leverage, which makes the asset riskier and therefore increases
its volatility (Brooks 2014).

3.1 ARCH model

To describe the GARCH-family framework, we start with the simplest of these models, the
autoregressive conditionally heteroscedastic (ARCH) model developed by Engle (1982). The
ARCH model describes the conditional variance as a simple quadratic function of the lagged
squared errors. Furthermore, the shocks of an asset return are serially uncorrelated but
dependent. The ARCH(q) model for the conditional variance can be written as (Tsay 2013):

$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \alpha_2 a_{t-2}^2 + \cdots + \alpha_q a_{t-q}^2 \qquad (1)$$

$$a_t = \sigma_t \varepsilon_t$$

where $\varepsilon_t$ is a sequence of iid random variables with zero mean and unit variance,
$\sigma_t^2$ is the conditional variance, $a_{t-i}^2$ are the lagged squared errors, and
$\alpha_0$ relates to the long-term average value of the variance. The distribution of
$\varepsilon_t$ (in all following GARCH-family models) is assumed to follow either a standard
normal, a standardized Student t, or a generalized error distribution (GED). Furthermore,
non-negativity constraints are imposed on the coefficients in order to ensure a positive
conditional variance:

$$\alpha_i \geq 0, \qquad i = 0, 1, 2, \ldots, q$$

The ARCH model provides a sufficient fundamental framework for time series modeling of
volatility. However, the model has some weaknesses. The lagged squared errors are required
to capture all of the dependence in the conditional variance, and the number of lags needed
could be very large. Furthermore, the ARCH model reacts slowly to large shocks in the return
of an asset and is thus likely to overpredict the volatility (Tsay 2013). The natural extensions
that overcome the weaknesses of the ARCH model are the GARCH models discussed below.
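The ARCH(q) recursion above can be sketched in Python; this is an illustrative simulation with hypothetical parameter values, not an estimation routine from the thesis:

```python
import numpy as np

def simulate_arch(alpha0, alphas, n, seed=0):
    """Simulate an ARCH(q) process:
    sigma_t^2 = alpha0 + sum_i alpha_i * a_{t-i}^2,  a_t = sigma_t * eps_t,
    with standard-normal eps_t and q zero pre-sample shocks."""
    rng = np.random.default_rng(seed)
    q = len(alphas)
    a = np.zeros(n + q)                 # shocks, including pre-sample values
    sig2 = np.full(n + q, alpha0)       # conditional variances
    for t in range(q, n + q):
        sig2[t] = alpha0 + sum(alphas[i] * a[t - 1 - i] ** 2 for i in range(q))
        a[t] = np.sqrt(sig2[t]) * rng.standard_normal()
    return a[q:], sig2[q:]

# Illustrative ARCH(1): the non-negativity constraints keep the conditional
# variance strictly positive throughout the sample.
a, sig2 = simulate_arch(alpha0=0.1, alphas=[0.3], n=1000)
assert np.all(sig2 > 0)
```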

3.2 Conditional mean

The GARCH-family models consist of two equations: a conditional variance equation and a
conditional mean equation. Thus, before describing the GARCH models in terms of the
conditional variance, we need to consider the conditional mean, since the conditional variance
is measured around the mean (Alexander 2008). The mean model specification includes
autoregressive terms AR($p$) and moving average terms MA($q$). For all following GARCH
models, the mean model can be expressed by the general ARMA($p$, $q$):

$$r_t = \phi_0 + \sum_{i=1}^{p} \phi_i r_{t-i} + \sum_{j=1}^{q} \theta_j a_{t-j} + a_t \qquad (2)$$

where $\phi_i$ are the autoregressive parameters, $\theta_j$ are the moving average parameters,
$\phi_0$ is a constant, and $a_t = \sigma_t \varepsilon_t$ is the error term, with
$\varepsilon_t$ assumed to be i.i.d. with zero mean and unit variance.

 7
In order to find the best possible order for the ARMA($p$, $q$) model, the "auto.arima"
function in R was used. The function searches over different combinations of AR and MA
orders to find the best model according to the lowest value of the Bayesian Information
Criterion (BIC). For all of the stock indices studied except those in the United States, an
ARMA(0,0) with non-zero mean was the preferred model; the mean model for these indices
therefore follows an ARMA(0,0) process. For the indices in the United States, the preferred
mean model follows an ARMA(0,1) process with a non-zero mean. The results of the
ARMA($p$, $q$) order selections are presented in Appendix A1.

3.3 Symmetric models

The symmetric GARCH models impose non-negativity constraints on the parameters to avoid
a negative volatility, since a negative volatility would be meaningless and could not be
interpreted. The symmetric GARCH models capture volatility clustering and the fat tail
distribution in a series; however, they do not capture the leverage effects (Tsay 2013). Two
symmetric GARCH models are applied: the standard GARCH model and the IGARCH model.

3.3.1 GARCH model

The generalized ARCH (GARCH) model, which is widely used in the volatility forecasting
literature, was developed by Bollerslev (1986) and is an extension of the ARCH model. The
GARCH model has the benefit of capturing the heavy tail distribution of asset returns and
volatility clustering. However, one of its weaknesses is that it does not account for the leverage
effects in asset returns. The difference between the ARCH and GARCH models is that the
GARCH allows the conditional variance to depend on its own previous lags in addition to the
lagged squared errors.

The conditional variance in the GARCH model is parameterized to depend on lags of the
squared errors and lags of the conditional variance, so the question of which orders to choose
arises. In the finance and volatility forecasting literature, models of higher order than
GARCH(1,1) are rarely estimated, since GARCH(1,1) is sufficient to capture the volatility
clustering in financial data. Thus, all GARCH models applied in this thesis follow the (1,1)
formulation.

We can express the GARCH(1,1) model as:

$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 \sigma_{t-1}^2 \qquad (3)$$

$$a_t = \sigma_t \varepsilon_t$$

$$\alpha_1 + \beta_1 < 1$$

where $\sigma_t^2$ is the conditional variance and $\alpha_0$ is a function of the long-term
average variance rate. The parameter $\alpha_1$, attached to the lagged squared error
$a_{t-1}^2$, measures the effect of a shock on the conditional variance, whereas the parameter
$\beta_1$, attached to the lagged conditional variance $\sigma_{t-1}^2$, measures the
persistence in volatility. Following the ARCH model, we impose non-negativity constraints on
the coefficients, and $\varepsilon_t$ is defined as before. The last constraint,
$\alpha_1 + \beta_1 < 1$, implies that the conditional variance $\sigma_t^2$ evolves over time
and that the unconditional variance of $a_t$ is finite. As a consequence, the volatility forecast
converges to the long-term average variance rate as the forecast horizon increases (Tsay 2013).
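This convergence can be illustrated numerically; a Python sketch with hypothetical parameter values satisfying $\alpha_1 + \beta_1 < 1$:

```python
import numpy as np

# Hypothetical GARCH(1,1) parameters with alpha1 + beta1 = 0.98 < 1.
alpha0, alpha1, beta1 = 0.05, 0.08, 0.90
long_run = alpha0 / (1.0 - alpha1 - beta1)  # unconditional variance, here 2.5

def garch_forecast(sig2_next, horizon):
    """h-step-ahead conditional variance forecasts via the recursion
    E[sigma^2_{t+h}] = alpha0 + (alpha1 + beta1) * E[sigma^2_{t+h-1}]."""
    path = [sig2_next]
    for _ in range(horizon - 1):
        path.append(alpha0 + (alpha1 + beta1) * path[-1])
    return np.array(path)

# Starting above the long-run level, the forecast decays geometrically
# toward alpha0 / (1 - alpha1 - beta1) as the horizon increases.
f = garch_forecast(sig2_next=5.0, horizon=500)
assert abs(f[-1] - long_run) < 1e-3
```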

3.3.2 IGARCH model

The integrated GARCH (IGARCH) model has one major difference from the GARCH: it has
a non-stationary variance process, such that the impact of shocks on volatility is persistent.
This means that volatility is not mean reverting (Alexander 2001). To see this, consider the last
constraint in the GARCH model, $\alpha_1 + \beta_1 < 1$. For the GARCH(1,1), the
unconditional variance of $a_t$ is constant and expressed as:

$$\operatorname{Var}(a_t) = \frac{\alpha_0}{1 - (\alpha_1 + \beta_1)} \qquad (4)$$

However, if $\alpha_1 + \beta_1 \geq 1$, the unconditional variance of $a_t$ is not defined,
which implies non-stationarity in variance. If $\alpha_1 + \beta_1 = 1$, there is a unit root in
variance, which is known as the IGARCH model, where the conditional variance forecast does
not converge. Building on the previous ARCH and GARCH models, the following equations
describe an IGARCH(1,1):

$$\sigma_t^2 = \alpha_0 + (1 - \beta_1) a_{t-1}^2 + \beta_1 \sigma_{t-1}^2 \qquad (5)$$

$$a_t = \sigma_t \varepsilon_t$$

where we use similar notation to the GARCH model, with non-negativity constraints on the
parameters and $\varepsilon_t$ defined as before. The strength of the IGARCH model is
arguably its ability to capture occasional level shifts in volatility (Tsay 2013). However, it
shares a weakness with the GARCH, since the model assumes that positive and negative shocks
have the same effect on the volatility.
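The contrast with the converging GARCH forecast can be sketched numerically; a Python illustration with hypothetical parameters where $\alpha_1 + \beta_1 = 1$:

```python
import numpy as np

# Hypothetical IGARCH(1,1): alpha1 + beta1 = 1, a unit root in variance.
alpha0, beta1 = 0.05, 0.90
alpha1 = 1.0 - beta1

def igarch_forecast(sig2_next, horizon):
    """With alpha1 + beta1 = 1 the forecast recursion collapses to
    E[sigma^2_{t+h}] = alpha0 + E[sigma^2_{t+h-1}]: no convergence occurs."""
    path = [sig2_next]
    for _ in range(horizon - 1):
        path.append(alpha0 + path[-1])
    return np.array(path)

# Each extra step adds alpha0, so the forecast drifts upward linearly
# instead of mean reverting; shocks to volatility are fully persistent.
f = igarch_forecast(sig2_next=2.0, horizon=100)
assert np.allclose(np.diff(f), alpha0)
```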

3.4 Asymmetric models

The financial literature has developed models that address the weaknesses of the symmetric
GARCH models by capturing the leverage effects, where the response of volatility to positive
and negative shocks is asymmetric, with a larger impact from negative shocks. The following
two models, the EGARCH and the TGARCH, are commonly used to capture the leverage
effects.

3.4.1 EGARCH model

The exponential GARCH (EGARCH) model was developed by Nelson (1991). It has several
advantages over the symmetric GARCH models. Firstly, the conditional variance is modeled
in logarithmic form, which removes the need to impose non-negativity constraints on the
parameters. Secondly, it handles the leverage effects by incorporating an asymmetric response
function. Thirdly, it permits random oscillatory behavior in the variance process (Nelson 1991).
The EGARCH(1,1) specification can be defined by the equation:

$$\ln(\sigma_t^2) = \alpha_0 + g(\varepsilon_{t-1}) + \beta \ln(\sigma_{t-1}^2) \qquad (6)$$

where

$$g(\varepsilon_{t-1}) = \theta \varepsilon_{t-1} + \gamma \left( |\varepsilon_{t-1}| - \sqrt{2/\pi} \right) \qquad (7)$$
The EGARCH model expresses the logged conditional variance in three parts: $\alpha_0$, a
long-run average value of the variance; $g(\varepsilon_{t-1})$, the asymmetric response
function; and $\ln(\sigma_{t-1}^2)$, the logged conditional variance in the previous period. In
the asymmetric response function, which handles the leverage effects, the term
$\theta \varepsilon_{t-1}$ determines the sign of the effect and
$\gamma(|\varepsilon_{t-1}| - \sqrt{2/\pi})$ determines its size (Alexander 2001). The
EGARCH model is widely used in the volatility forecasting literature, arguably for its simple
specification that manages to capture the leverage effects. Furthermore, many studies suggest
that the logarithmic specification of the EGARCH is appropriate when modeling asset returns
in financial data. However, the asymmetric models tend to produce smaller volatility forecasts
than the symmetric models, which could be an issue (Poon and Granger 2003).

3.4.2 TGARCH model

To further evaluate the asymmetric GARCH models we include the threshold GARCH
(TGARCH) model, also known as the GJR model, which was developed by Glosten,
Jagannathan and Runkle (1993). The TGARCH model is a simple extension of the GARCH
model in which a multiplicative dummy variable is included to account for the leverage effects.
The dummy specification tests for statistically significant differences between the effects of
positive and negative shocks on the conditional variance. The conditional variance in the
TGARCH(1,1) model can be defined by the following specification:

$$\sigma_t^2 = \alpha_0 + (\alpha_1 + \gamma_1 N_{t-1}) a_{t-1}^2 + \beta_1 \sigma_{t-1}^2 \qquad (8)$$

where $N_{t-1}$ is an indicator for a negative shock:

$$N_{t-1} = \begin{cases} 1 & \text{if } a_{t-1} < 0 \\ 0 & \text{if } a_{t-1} \geq 0 \end{cases}$$

A negative shock thus increases the conditional variance by $(\alpha_1 + \gamma_1) a_{t-1}^2$,
whereas an equally large positive shock increases it only by $\alpha_1 a_{t-1}^2$, so the impact
of a shock changes when the rate of return switches sign (Poon and Granger, 2003). However,
as mentioned earlier, the asymmetric models tend to underpredict the volatility forecasts.
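The role of the dummy variable can be sketched in Python; the parameter values are hypothetical, chosen only to demonstrate the mechanism in equation (8):

```python
import numpy as np

# Hypothetical TGARCH(1,1) parameters; gamma1 > 0 captures the leverage effect.
alpha0, alpha1, gamma1, beta1 = 0.02, 0.05, 0.10, 0.85

def tgarch_variance(a_prev, sig2_prev):
    """sigma_t^2 = alpha0 + (alpha1 + gamma1 * N_{t-1}) * a_{t-1}^2
                 + beta1 * sigma_{t-1}^2,
    where the dummy N_{t-1} is 1 for a negative shock and 0 otherwise."""
    n_prev = 1.0 if a_prev < 0 else 0.0
    return alpha0 + (alpha1 + gamma1 * n_prev) * a_prev ** 2 + beta1 * sig2_prev

# An equally sized negative shock raises next-period variance by an extra
# gamma1 * a^2 relative to the positive shock.
up = tgarch_variance(a_prev=1.5, sig2_prev=1.0)
down = tgarch_variance(a_prev=-1.5, sig2_prev=1.0)
assert np.isclose(down - up, gamma1 * 1.5 ** 2)
```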

4 Data and Methodology

4.1 Data

To investigate the forecasting ability of the GARCH-family models, we use the daily closing
prices of 10 stock market indices. The data cover the period from the first trading day in
January 2006 to the last trading day in March 2021 for all indices. This period was selected
because it was the most recent data available when work on this thesis began. Following the
financial literature, we treat trading days as a continuous time series, ignoring weekends and
holidays. The data were collected from Thomson Reuters Datastream. The stock market
indices are not dividend-adjusted, meaning that when a company pays dividends, the stock
value decreases by the amount of the total payout, which in turn decreases the price of the
index.

The stock market indices are divided into an in-sample and an out-of-sample period. The
in-sample period is used to estimate the parameters in order to produce a forecast of future
volatility. The out-of-sample and in-sample periods are based on the same time period for all
indices but contain different numbers of trading days due to differences in holidays between
countries. The in-sample period is roughly 15 years, covering the first trading day in January
2006 to the last trading day in March 2020. This implies an out-of-sample period covering one
year of trading days, stretching from the last trading day in March 2020 to the last trading day
in March 2021.

Table 1. Overview of stock market indices
 Index Name Region # of trading days
 DJI Dow Jones Industrial Average United States 3826
 IXIC Nasdaq 100 United States 3830
 N225 Nikkei 225 Japan 3724
 NSEI NIFTY 50 India 3766
 OMXSPI OMX Stockholm All Share Index Sweden 3822
 OSAEX Oslo Exchange All-share Index Norway 3802
 RUT Russell 2000 United States 3827
 SPX S&P 500 Index United States 3827
 SSEC Shanghai Composite Index China 3702
 STOXX50E EURO STOXX 50 Eurozone 3891

The GARCH-family models are estimated using daily returns. The closing price for each index
is used to calculate the daily log return, given by:

$$r_t = \ln(P_t) - \ln(P_{t-1}) \qquad (9)$$

where $P_t$ denotes the closing price at time $t$ and $P_{t-1}$ the closing price at time $t-1$.

The daily returns are occasionally subject to rare extreme values, which often occur during
unusual market conditions such as the financial crisis of 2008. These market conditions could
influence the ranking of volatility models and tend to be very difficult or impossible to predict
(Lyocsa, Molnar and Vyrost 2020). To mitigate this potential problem and make our models
less dependent on these conditions, the daily returns were subjected to a rolling-window
filtering procedure: values of the daily returns above the 99.5th percentile were replaced with
the 99.5th percentile value, with the rolling window set to 1000 trading days. Descriptive
statistics of the daily returns are displayed in the table below.

Table 2. Descriptive statistics of daily returns
Index Mean SD Skew. Kurt. Min. Max.
DJI 0.028 1.119 -0.539 5.581 -6.415 5.271
IXIC 0.049 1.288 -0.307 3.208 -5.703 5.993
N225 0.015 1.416 -0.428 2.587 -7.028 5.645
NSEI 0.048 1.362 -0.204 3.520 -6.380 6.114
OMXSPI 0.028 1.263 -0.322 3.069 -5.789 5.820
OSAEX 0.032 1.398 -0.600 4.929 -7.980 6.730
RUT 0.036 1.547 -0.252 4.023 -7.806 6.892
SPX 0.032 1.157 -0.391 5.664 -6.305 6.157
SSEC 0.030 1.575 -0.616 3.691 -7.627 7.093
STOXX50E 0.001 1.310 -0.337 2.848 -6.295 5.640
 Daily returns are multiplied by 100, so that Mean, SD, Min. and Max. are interpreted in percentages. SD denotes standard
 deviation, Skew. the skewness, and Kurt. the kurtosis.

There is a high level of kurtosis for all of the 10 indices, indicating that the daily returns exhibit the fat-tailed, leptokurtic distribution typical of asset returns. Furthermore, the daily returns are negatively skewed, indicating that large negative deviations from the average return occur more often than equally large positive ones. The mean for all indices is positive, and there is a fairly large gap between the minimum and maximum daily return.
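The skewness and kurtosis figures discussed above can be reproduced with simple moment estimators; whether Table 2 reports excess kurtosis, and which exact estimator was used, is not stated, so the sketch below is an illustrative assumption:

```python
import numpy as np

def skew_kurt(x):
    """Sample skewness and excess kurtosis via moment estimators."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    s = x.std()  # population standard deviation
    skew = np.mean(d**3) / s**3
    kurt = np.mean(d**4) / s**4 - 3.0  # > 0 indicates fat (leptokurtic) tails
    return skew, kurt
```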

 4.1.1 Distribution

 The distribution of returns may follow different shapes, depending on the underlying
 stochastic process and on whether its parameters are time varying (Poon and Granger, 2003).
 Therefore, we need to test which distribution the error term (\epsilon_t) follows for each of the
 selected indices. We consider three distributions commonly used when modeling returns: the
 Normal, the Student t and the Generalized Error Distribution (GED). In the figures below, the
 returns for STOXX50E are plotted against these three distributions. As can be seen, the Student
 t distribution fits slightly better than the Normal and the GED and is thus the best fit. The same
 procedure was repeated for all 10 indices and reached the same conclusion as for the
 STOXX50E. Thus, a Student t distribution is applied in all following GARCH models.
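The graphical comparison can be complemented by comparing log-likelihoods of candidate densities. A minimal sketch with unit-scale Normal and Student t densities (the thesis fits fully parameterized distributions, including the GED, which is omitted here):

```python
import math

def norm_loglik(x):
    """Log-likelihood of the sample under a standard Normal density."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * v**2 for v in x)

def t_loglik(x, df):
    """Log-likelihood under a Student t density with df degrees of freedom."""
    c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
         - 0.5 * math.log(df * math.pi))
    return sum(c - (df + 1) / 2 * math.log(1 + v**2 / df) for v in x)
```

On fat-tailed samples the Student t log-likelihood exceeds the Normal one, mirroring the conclusion drawn from the plots.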

Figure 1. Distribution plots

 4.2 Method

 4.2.1 Diagnostics test

To evaluate the validity and stability of the GARCH models, diagnostic tests are performed.
The GARCH framework assumes that the shocks of an asset return are serially uncorrelated but
dependent, and that volatility appears in clusters. The Ljung-Box test evaluates the first
assumption and the ARCH-LM test the second property.

The Ljung-Box test by Ljung and Box (1978) was conducted. The test is applied to the lagged
residuals from the ARMA(p, q) fit. The null hypothesis states that there is no serial correlation
in the residuals, against the alternative that they are serially correlated; the preferred result is
thus a failure to reject the null. The null hypothesis cannot be rejected for any index except
NSEI, for which it is rejected at the 10% significance level. The Ljung-Box test statistics are
presented in Table 3.
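The Ljung-Box Q statistic itself is straightforward to compute; a sketch on a generic residual series:

```python
import numpy as np

def ljung_box_q(resid, lags=10):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1}^{m} rho_k^2 / (n - k),
    where rho_k is the lag-k sample autocorrelation of the residuals."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    n = len(e)
    denom = np.sum(e**2)
    q = 0.0
    for k in range(1, lags + 1):
        rho = np.sum(e[k:] * e[:-k]) / denom
        q += rho**2 / (n - k)
    return n * (n + 2) * q
```

Under the null of no serial correlation, Q is approximately chi-squared distributed with `lags` degrees of freedom (reduced by the number of fitted ARMA parameters).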

Engle's (1982) LM test for autoregressive conditional heteroskedasticity (ARCH) effects was
performed on the residuals from the ARMA(p, q) fit, specified with 12 lags. The test reveals
whether there are any ARCH effects, in other words whether volatility clustering is present,
which is a precondition for applying the GARCH models. The null hypothesis states that there
are no ARCH effects, against the alternative that ARCH(q) disturbances exist. The test shows
significant ARCH effects from lag 1 to lag 12, at the 1% significance level. The statistics for
the 12th lag of the ARCH-LM test are displayed in the table below.

Table 3. Ljung-Box test and ARCH-LM test

 Index Q ARCHLM (12)
 DJI 0,0105 313***
 IXIC 0,0003 307***
 N225 0,0016 343***
 NSEI 3,4151* 405***
 OMXSPI 0,1408 351***
 OSAEX 0,6262 348***
 RUT 0,0031 316***
 SPX 0,0000 347***
 SSEC 1,3078 696***
 STOXX50E 0,0138 430***
Note: *, **, *** indicate that the null hypothesis can be rejected at the 10%, 5%, and 1% levels. Q denotes the Ljung-Box test
statistic and ARCHLM (12) the LM test statistic for the 12th lag.

 4.2.2 Out of sample forecast method

The out-of-sample forecast method involves a one-day-ahead rolling density forecast from the
GARCH models, with a recursive refit every trading day, using the “ugarchroll” function in R.
It is specified with a forecast length (the out-of-sample period) of one year of trading days and
forecasts one day ahead with a refit. The refit is done recursively, that is, by expanding the
window to include all previous data for each one-day-ahead forecast. In other words, the first
forecast is based on the parameter estimates from the in-sample observations; by expanding the
window, the second forecast is based on parameter estimates using all of the historical data up
to that point, that is, the in-sample observations plus the first out-of-sample observation. More
formally, the one-day-ahead forecast of the GARCH(1,1) model is specified as:

\hat{\sigma}^2_{t+1} = \hat{\alpha}_0 + \hat{\alpha}_1 \epsilon^2_t + \hat{\beta}_1 \hat{\sigma}^2_t                    (10)

where \hat{\sigma}^2_{t+1} is the predicted volatility at time t + 1 and the remaining terms are specified as
before. The recursive refit implies that the forecast of \hat{\sigma}^2_{t+h} is calculated based on all of the
historical data leading up to that point (Alexander, 2008). The forecasts of the remaining
GARCH models use the same forecasting method, but with their respective specifications.
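Equation (10) translates directly into code. The sketch below filters the conditional variance through the sample and returns the one-day-ahead forecast; the parameter values are illustrative placeholders, whereas the thesis re-estimates all parameters each day on an expanding window via “ugarchroll”:

```python
import numpy as np

def garch11_one_step(returns, alpha0, alpha1, beta1):
    """One-day-ahead GARCH(1,1) variance forecast, equation (10):
    sigma2_{t+1} = alpha0 + alpha1 * eps_t^2 + beta1 * sigma2_t.
    Assumes the parameters are already estimated."""
    eps = np.asarray(returns, dtype=float)
    eps = eps - eps.mean()                      # demeaned shocks
    v0 = np.var(eps)
    sigma2 = v0 if v0 > 0 else alpha0           # initial conditional variance
    for e in eps:
        sigma2 = alpha0 + alpha1 * e**2 + beta1 * sigma2
    return sigma2
```

With zero shocks the recursion converges to the fixed point alpha0 / (1 - beta1), the model's long-run variance when alpha1's contribution vanishes.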

4.2.3 Realized volatility

The volatility proxy tries to accurately follow the underlying process that defines volatility. In
the presence of a noisy volatility proxy, the ranking of volatility forecasts could be inaccurate
and not be a reflection of the true conditional variance. Therefore, it is vital to consider a
volatility proxy that can accurately reflect the true underlying process of the volatility (Patton
2011).

The term realized volatility refers to the sum of intraday squared returns at high frequencies
such as five or fifteen minutes. Realized volatility has been demonstrated to provide an accurate
estimate of the underlying process that defines volatility (Poon and Granger 2003).
Furthermore, Patton (2011) argues that realized volatility is one of the less noisy proxies and
leads to less distortion. Thus, the volatility proxy that will be compared to the forecasted
volatility of the GARCH models is, in this thesis, the realized volatility. The data on realized
volatility are collected from the Oxford-Man Institute's Realized Library (Heber et al., 2009)
and constructed from intraday high-frequency data on a five-minute sub-sample for each index.
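Given intraday prices on a five-minute grid, the realized measure defined above can be sketched as:

```python
import numpy as np

def realized_variance(intraday_prices):
    """Daily realized variance: the sum of squared intraday log returns
    on whatever grid the prices are sampled (e.g. five minutes)."""
    logp = np.log(np.asarray(intraday_prices, dtype=float))
    r = np.diff(logp)
    return float(np.sum(r**2))
```

The realized volatility is then the square root of this quantity.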

 4.2.4 Forecast evaluation

In contrast to the vast volatility forecasting literature, the literature on forecast evaluation is
limited. Furthermore, there is no clear evidence for which forecast evaluation measure should
be used or preferred (Reschenhofer, Mangat and Stark 2020).

In this paper, four different forecast evaluation measures are implemented to identify the best
forecast model. These measures quantify the forecast error, that is, the difference between the
realized volatility and the volatility forecast from the GARCH models. Three of them are
outlined in Poon and Granger (2003) and are commonly used in the literature: the Mean Square
Error (MSE), the Mean Absolute Percentage Error (MAPE) and the Mean Absolute Error
(MAE). A fourth widely used evaluation measure, the asymmetric QLIKE loss function, is
also used in this paper.

The asymmetric QLIKE (Quasi-Likelihood) loss function allows for an asymmetric loss when
evaluating volatility forecasting performance. The QLIKE assigns more weight to volatility
underestimation and thus penalizes forecasts below the realized volatility more severely
(Patton, 2011). This aspect is especially important for risk managers and sellers of financial
instruments, since underprediction of asset volatility can be costly. Furthermore, according to
Patton (2011), the QLIKE loss function offers a consistent model ranking even in the presence
of a noisy volatility proxy; that is, the ranking under the QLIKE measure is the same as if the
true volatility process were used.

The MSE is a quadratic loss function that measures the average of the squared differences
between the proxy and the volatility forecast. The MSE is a symmetric loss function, meaning
that it assigns equal weight to over- and under-estimations of the volatility forecast and thus
penalizes them equally. The MSE is therefore more sensitive to over-predictions than the
asymmetric QLIKE loss function, which penalizes over-predictions very little. As noted by
Hansen and Lunde (2005), the MSE measure has proven appropriate when there are large
differences between the volatility forecast and the proxy. Like the QLIKE, the MSE measure
also provides a consistent model ranking in the presence of a noisy volatility proxy (Patton
2011).

The MAE measures the average of the absolute differences between the realized volatility and
the estimated forecast. The measure is more robust to outliers than the other methods
(Hyndman and Koehler 2006). Like the MSE, the MAE is a symmetric loss function and
therefore penalizes over- and under-predictions equally.

Building on the MAE, the MAPE measures the average percentage error of the forecast and is
often used when comparing forecast performance across indices and assets, since the measure
is unit-free. However, the MAPE can take extreme values when the realized volatility is close
to zero, since it is scale sensitive (Hyndman and Athanasopoulos 2021). Furthermore, in
contrast to the asymmetric QLIKE measure, the MAPE penalizes positive errors more heavily
than negative errors. This is because the percentage error cannot exceed 100% for under-
predictions of the forecast, while it is not restricted at 100% for over-predictions (Hyndman
and Koehler 2006).
The specifications of the forecast measurements are defined as:

 
MSE = N^{-1} \sum_{t=1}^{N} (\sigma_t - \hat{\sigma}_t)^2                                        (11)

MAPE = 100 \, N^{-1} \sum_{t=1}^{N} \left| \frac{\sigma^2_t - \hat{\sigma}^2_t}{\sigma^2_t} \right|          (12)

MAE = N^{-1} \sum_{t=1}^{N} |\sigma_t - \hat{\sigma}_t|                                          (13)

QLIKE = \frac{\sigma^2_t}{\hat{\sigma}^2_t} - \ln\!\left(\frac{\sigma^2_t}{\hat{\sigma}^2_t}\right) - 1      (14)

where \sigma_t is the realized volatility, \hat{\sigma}_t the estimated forecasted volatility at time t, and N the
number of forecasts. These four measures evaluate the forecasting performance of the models,
and the lowest value of a measurement indicates the best model.
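Equations (11)-(14) translate directly into code. One assumption in the sketch: the QLIKE is averaged over the forecast sample, whereas equation (14) is stated per observation:

```python
import numpy as np

def mse(rv, f):
    return float(np.mean((rv - f) ** 2))

def mae(rv, f):
    return float(np.mean(np.abs(rv - f)))

def mape(rv, f):
    # Per equation (12), the MAPE here is computed on variances.
    return float(100 * np.mean(np.abs((rv**2 - f**2) / rv**2)))

def qlike(rv, f):
    ratio = rv**2 / f**2
    return float(np.mean(ratio - np.log(ratio) - 1))
```

All four losses are minimized by a perfect forecast; the QLIKE penalizes forecasts below the realized volatility more heavily than forecasts above it.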

5 Results

The out-of-sample forecasts for all GARCH models and each index are presented in Appendix
B1, plotted against the realized volatility for each index. The out-of-sample forecast for
OMXSPI is also presented in Figure 2. Notable for all models is that the volatility forecasts
consistently underpredict the realized volatility. Furthermore, the GARCH models are unable
to capture sudden spikes in the realized volatility and generally have issues predicting the proxy
during periods with high fluctuations. Also notable is that the forecasts of all models appear to
react more slowly during such periods. Overall, our volatility predictions follow the trend of
the proxy quite well and capture the underlying volatility process.

Since the out-of-sample forecasts of all models follow the trend of the realized volatility rather
well, it is difficult to draw any clear graphical conclusions about which model might be
preferred. The forecasts of the models tend to overlap, and the differences between them are
fairly small. However, studying the graphs in Appendix B1, the asymmetric models appear, on
average, to predict lower volatility than the symmetric models. The asymmetric models also
seem to fluctuate slightly more than the symmetric models.

Figure 2. Out-of-sample forecast for OMXSPI

[Figure: realized volatility (RV) and the GARCH, IGARCH, EGARCH and TGARCH forecasts for OMXSPI over the out-of-sample period; y-axis: volatility (0.00-0.05), x-axis: 3/31/20 to 1/31/21.]

Note: The gray line represents the realized volatility (RV). The y-axis is the volatility and the x-axis is the time period.

To further evaluate the performance of the GARCH models, the estimated forecasts are
compared to the realized volatility using the four forecast evaluation measures. The measures
for all GARCH models and indices are presented in Tables 5-8. The lowest value of an
evaluation measure indicates that the forecasted volatility is, on average, closest to the proxy
and thus the preferred model. A summary of the total number of times each GARCH model
was preferred according to the evaluation measures is presented in Table 4.

The results from the evaluation measures demonstrate that the IGARCH model was, on
average, closest to the realized volatility 27 out of 40 times across all indices, followed by the
TGARCH model, which had the smallest loss value 8 out of 40 times. Table 4 also shows that
the symmetric models had the smallest errors 29 out of 40 times, indicating that they, on
average, generate the best volatility predictions and follow the realized volatility better over
the period than the asymmetric models.

Studying each evaluation measure individually (see Tables 5-8), we observe that for the MAPE
the TGARCH model was preferred 8 out of 10 times, which also corresponds to the total
number of times the TGARCH was preferred across all evaluation measures. Also noticeable
is that the MAPE consistently ranks the EGARCH model second best when the TGARCH is
preferred, indicating that under the MAPE the asymmetric models tend to outperform the
symmetric ones. Moreover, under the QLIKE, MSE and MAE it is clear that the symmetric
models are preferred, especially the IGARCH model, which was on average closest to the
realized volatility 27 out of 30 times, followed closely by the GARCH model. The MSE
measure ranks the IGARCH model first for every index studied. Furthermore, there is no
evidence that a specific GARCH model is favorable for a certain index.

The forecast evaluation measures also confirm what we observed graphically. Firstly, the
differences between the GARCH models' forecasts are generally very small. Secondly, the
evaluated loss values are fairly small for all models, indicating that the forecasted volatility
follows the trend of the realized volatility quite well.

Table 4. Summary of evaluation measures
 Models Total times preferred
 GARCH 2
 IGARCH 27
 EGARCH 3
 TGARCH 8
Note: Total times preferred indicates the number of times the model had the lowest value of the evaluation measures and thus,
on average, closest to the realized volatility.

Table 5. Forecast evaluation measure Asymmetric QLIKE loss function
Model DJI IXIC N225 NSEI OMXSPI OSAEX RUT SPX SSEC STOXX50E Total
GARCH 0,5217 0,6063 0,2763 0,3807 0,1600 0,7269 0,5237 0,4777 0,3610 0,7383 0
IGARCH 0,5194 0,5805 0,2705 0,3646 0,1562 0,7007 0,4949 0,4731 0,3587 0,7138 9
EGARCH 0,6182 0,7774 0,3096 0,4357 0,1693 0,7737 0,6403 0,5649 0,3511 0,8857 1
TGARCH 0,6503 0,8506 0,3110 0,4483 0,1759 0,7813 0,6480 0,6431 0,3544 0,8633 0
 Gray indicates the smallest error of the forecast compared to the realized volatility. The top row denotes each index and total
 denotes the number of times each model had the lowest evaluated loss estimate.

 Table 6. Forecast evaluation measure Mean Square Error (MSE)
Model DJI IXIC N225 NSEI OMXSPI OSAEX RUT SPX SSEC STOXX50E Total
GARCH 2,16E-05 2,73E-05 4,83E-06 2,41E-05 4,86E-07 7,14E-05 5,32E-05 6,28E-06 2,98E-06 6,88E-05 0
IGARCH 2,16E-05 2,72E-05 4,79E-06 2,40E-05 4,76E-07 7,13E-05 5,29E-05 6,27E-06 2,98E-06 6,87E-05 10
EGARCH 2,21E-05 2,75E-05 4,87E-06 2,42E-05 5,06E-07 7,16E-05 5,38E-05 6,29E-06 2,98E-06 6,92E-05 0
TGARCH 2,17E-05 2,75E-05 4,82E-06 2,41E-05 4,97E-07 7,15E-05 5,37E-05 6,37E-06 2,98E-06 6,90E-05 0
 Gray indicates the smallest error of the forecast compared to the realized volatility. The top row denotes each index and total
 denotes the number of times each model had the lowest evaluated loss estimate.

 Table 7. Forecast evaluation measure Mean Absolute Percentage Error (MAPE)
Model DJI IXIC N225 NSEI OMXSPI OSAEX RUT SPX SSEC STOXX50E Total
GARCH 80 83 231 94 87 78 85 117 193 123 1
IGARCH 80 85 238 96 91 79 90 118 195 124 0
EGARCH 75 76 223 83 66 70 68 104 209 111 1
TGARCH 74 75 221 80 65 69 67 98 207 112 8
 Gray indicates the smallest error of the forecast compared to the realized volatility. The top row denotes each index and total
 denotes the number of times each model had the lowest evaluated loss estimate.

 Table 8. Forecast evaluation measure Mean Absolute Error (MAE)
Model DJI IXIC N225 NSEI OMXSPI OSAEX RUT SPX SSEC STOXX50E Total
GARCH 0,0164 0,0191 0,0090 0,0132 0,0061 0,0200 0,0201 0,0140 0,0099 0,0232 1
IGARCH 0,0164 0,0188 0,0090 0,0130 0,0061 0,0198 0,0199 0,0140 0,0099 0,0230 8
EGARCH 0,0172 0,0201 0,0091 0,0137 0,0061 0,0202 0,0209 0,0141 0,0099 0,0242 1
TGARCH 0,0171 0,0204 0,0091 0,0137 0,0062 0,0202 0,0209 0,0145 0,0099 0,0241 0
 Gray indicates the smallest error of the forecast compared to the realized volatility. The top row denotes each index and total
 denotes the number of times each model had the lowest evaluated loss estimate.

 To briefly summarize the results, all of the GARCH models tend to follow the trend of the
 realized volatility quite well, with the exception of sudden spikes in the proxy and during times
 with high fluctuations. The symmetric models are preferred and especially the IGARCH model
 according to MSE, QLIKE and MAE. With regards to the MAPE the asymmetric models have
 the smallest evaluated loss, where the TGARCH model performs best.

6 Discussion
The calculated forecast errors generally displayed a small evaluated loss when compared to the
realized volatility. Firstly, this implies that the GARCH models applied in this paper, on
average, predict the volatility rather well. Secondly, since we have used a method involving
one-day-ahead forecasts with a recursive expanding refit, it is not surprising that the forecasting
errors are small and that the differences in estimated loss values between the models are small
as well.

Our results for three of the evaluation measures (MSE, MAE and QLIKE) are quite consistent
and clear: the symmetric models (especially the IGARCH model) outperform the asymmetric
models. In the literature, when the negative relationship between volatility and shocks to asset
returns is accounted for through the asymmetric component of the GARCH, asymmetric
models generally perform better than symmetric ones (see e.g., Harrison and Moore 2011;
Gabriel 2012; Evans and McMillan 2007). This somewhat opposes our main results. However,
when analyzing the performance of different GARCH models, results can differ depending on
various factors. Poon and Granger (2003) highlight that the time period and the evaluation
measures have an impact on which model performs best. The time interval studied in this thesis
captures part of the corona crisis, which had a significant impact on financial markets, leading
to a period of substantially increased volatility and uncertainty (Zhang, Hu and Ji 2020).
Incorporating this aspect into the analysis may explain why, in this specific time period, the
symmetric models appear to outperform the asymmetric GARCH models.

If we can establish that the out-of-sample period in this paper is a period of high fluctuations,
as described in Zhang, Hu and Ji (2020), the findings in the literature regarding the dominance
of the asymmetric GARCH models change. The literature on GARCH model performance
during periods of increased volatility suggests that, in general, the symmetric GARCH models
perform better. Viewed in this perspective, our MSE, MAE and QLIKE results, which favor
the symmetric GARCH models, follow the literature, whereas the MAPE evaluation measure,
which generally favors the asymmetric models, opposes it.

A closer look at each of the forecast evaluation measures provides a clearer understanding of
the results. The two symmetric loss functions, which penalize over- and under-estimations of
the forecast equally, suggest that the best model is the IGARCH model and consistently favor
the two symmetric models. This indicates that, when forecasting errors are weighted equally,
the symmetric GARCH models are superior and, on average, follow the realized volatility
better. Moreover, when more weight is placed on volatility underestimation, as with the
QLIKE loss function, the symmetric models are still superior, specifically the IGARCH.
Looking at the graphs in Appendix B1, the symmetric models lie, on average, above the
asymmetric models, and the realized volatility lies above all model types. This indicates that
the QLIKE loss function, on average, penalizes the underestimations of the asymmetric
GARCH models more heavily. Thus, even when the estimated losses are weighted
asymmetrically, the superior models are the IGARCH and GARCH.

The only evaluation measure that systematically favors the asymmetric GARCH models is the
MAPE. Arguably, a potential cause is the scale sensitivity of the MAPE measure, which
implies that the forecast errors can take on extreme values when the realized volatility is close
to zero. Studying the graphs in Appendix B1, it is clear that during several periods the realized
volatility is very low, with values below 0.02, which leads to a very high estimated loss in the
forecasted volatility. This may explain why the MAPE measure penalizes the symmetric
models very severely during periods of low realized volatility and thus favors the asymmetric
GARCH models. Furthermore, Hyndman and Koehler (2006) suggest that the MAPE measure
is not fully appropriate for data close to zero. Thus, given the scale sensitivity of the MAPE
measure and the suggestion in Hyndman and Koehler (2006), the MAPE results should be
interpreted with caution.

Evaluating the features of the IGARCH model gives a better understanding of why it generally
generates the best forecasts. The IGARCH model has a non-stationary variance process, such
that the impact of shocks on volatility is persistent. This implies that the volatility is not mean
reverting and that the conditional variance forecast does not converge. These properties are
arguably favorable during more volatile periods, since the model produces larger volatility
forecasts, on average, than the asymmetric models. Furthermore, Poon and Granger (2003)
suggest that models whose forecasts do not converge to the unconditional variance tend to
provide larger forecasts. Thus, the evidence suggests that the leverage effects in the asymmetric
GARCH models are not as significant during more volatile periods and that the symmetric
models produce better volatility forecasts on average.

 7 Conclusion

This thesis has studied the volatility forecasting performance of GARCH-family models,
including two symmetric and two asymmetric GARCH models, to examine which specific
GARCH model produces the best volatility forecasts. Predictions were based on the daily
returns of ten different stock indices, using the newest financial data available. The method
involves one-day-ahead forecasts with a recursive expanding refit. The estimated forecast for
each index and GARCH model was compared to the realized volatility, which acts as a measure
of the true underlying volatility process. The estimated loss between the realized volatility and
the predicted forecast was evaluated by four different measures: the Mean Square Error (MSE),
the Mean Absolute Percentage Error (MAPE), the Mean Absolute Error (MAE) and the
asymmetric QLIKE loss function.

The evaluated forecasting performance suggests that the symmetric GARCH models, and
specifically the IGARCH model, on average generated the most accurate forecasts according
to the MSE, MAE and QLIKE measures. This is consistent with previous research on periods
of increased volatility, but opposes the evidence that asymmetric GARCH models generally
perform better during regular periods. Moreover, according to the MAPE measure, the
forecasts of the asymmetric GARCH models had the smallest evaluated loss.

This study has aimed to provide an accurate evaluation of the forecasting performance of
GARCH-family models. By examining several GARCH-models, using a large number of stock
market indices with the newest financial data available, several forecast evaluation measures
and a volatility proxy proven to be superior, we believe that our results can contribute to the
volatility forecasting literature.

Given the time restrictions of this thesis, only one specific time period was used to investigate
the performance of the model forecasts. It would be interesting to evaluate a large number of
stock market indices over several time periods to give more depth to the analysis. For future
studies, it would also be interesting to focus on periods of increased volatility and investigate
whether the symmetric GARCH models remain best at predicting volatility. It would also be
of value to research how to optimally evaluate forecasting performance. One of the most
important aspects of the forecasting exercise is arguably how to compare forecasting
performance between volatility models. As of now, the literature focuses on the construction
of volatility models and forecasts, with little attention on their evaluation. There are several
statistical measures for evaluating the forecast error relative to the realized volatility, and they
can give different results depending on their specifications.

8 References

Alexander, C. 2001. Market models. A guide to financial data analysis. New Jersey: John
Wiley & Sons.

Alexander, C. 2008. Market risk analysis. Practical financial econometrics. New Jersey: John
Wiley & Sons.

Andersen, T., and Bollerslev, T. 1998. Answering the skeptics: Yes, standard volatility models
do provide accurate forecasts. International Economic Review 39(4): 885-905.

Awartani, B., and Corradi, V. 2005. Predicting the volatility of the S&P-500 stock index via
GARCH models: the role of asymmetries. International Journal of Forecasting 21(1): 167-
183.

Bollerslev, T. 1986. Generalized Autoregressive Conditional Heteroscedasticity. Journal of
Econometrics 31(3): 307-327.

Brooks, C. 2014. Introductory econometrics for finance. 3rd edition. Cambridge: Cambridge
University Press.

Engle, R. 1982. Autoregressive Conditional Heteroscedasticity with Estimates of the Variance
of United Kingdom Inflation. Econometrica 50(4): 987-1007.

Evans, T., and McMillan, D. 2007. Volatility forecasts: The role of asymmetric and long-
memory dynamics and regional evidence. Applied Financial Economics 17(17): 1421-1430.

Gabriel, A. S. 2012. Evaluating the Forecasting Performance of GARCH Models: Evidence
from Romania. Procedia - Social and Behavioral Sciences 62(1): 1006-1010.

Glosten, L., Jagannathan, R., and Runkle, D. 1993. On the Relation between the Expected
Value and the Volatility of the Nominal Excess Return on Stocks. The Journal of Finance
48(5): 1779–1801.
