Volatility forecasting on global stock market indices: Evaluation and comparison of GARCH-family models' forecasting performance (DiVA Portal)
Volatility forecasting on global stock market indices: Evaluation and comparison of GARCH-family models' forecasting performance

Simon Molin
Master Thesis, Economics, 15 hp, Spring term 2021
Acknowledgment I would like to express my gratitude towards Giovanni Forchini for his guidance and support throughout this thesis. Sincerely, Simon Molin 2021-05-30
Abstract

Volatility is arguably one of the most important measures in financial economics, since it is often used as a rough measure of the total risk of financial assets. Many volatility models have been developed to model the process, and the GARCH-family models capture several characteristics that are observed in financial data. An accurate volatility forecast is of great value for monetary policymakers, risk managers, investors and for assessing the price and value of financial derivatives. The purpose of this thesis is to evaluate whether asymmetric or symmetric GARCH models generate better volatility forecasts and, specifically, which model is superior. Two symmetric models, the GARCH and IGARCH, and two asymmetric models, the EGARCH and TGARCH, are used. Daily volatility forecasts on the returns of 10 stock market indices, covering the period from 2020-03-31 to 2021-03-31, are compared to the realized volatility using four different evaluation measures. The evidence suggests that the symmetric GARCH models, on average, produce the best volatility forecasts, and specifically the IGARCH model.
Table of Contents

1 Introduction
2 Literature
3 Theoretical framework
  3.1 ARCH model
  3.2 Conditional mean
  3.3 Symmetric models
    3.3.1 GARCH model
    3.3.2 IGARCH model
  3.4 Asymmetric models
    3.4.1 EGARCH model
    3.4.2 TGARCH model
4 Data and Methodology
  4.1 Data
    4.1.1 Distribution
  4.2 Method
    4.2.1 Diagnostics test
    4.2.2 Out-of-sample forecast method
    4.2.3 Realized volatility
    4.2.4 Forecast evaluation
5 Results
6 Discussion
7 Conclusion
8 References
9 Appendix
1 Introduction

The financial literature has paid considerable attention to modeling and forecasting stock market volatility over the past decades. There are numerous motivations for this focus. Volatility is often used as a rough measure of the total risk of financial assets and is arguably one of the most important measures in financial economics. Since volatility is not directly observable, an accurately estimated forecast of stock market volatility is important for several financial applications, particularly monetary policymaking, risk management, portfolio selection, and the valuation of assets and financial derivatives (Brooks 2014). Financial market volatility has several well-documented characteristics: volatility clustering, leverage effects, mean reversion and co-movement of volatilities between financial markets (Poon and Granger 2003). Volatility clustering implies that volatility usually fluctuates between certain high and low periods. Leverage effects refer to the observation that large price drops seem to have a greater impact on volatility than an equally large increase in prices. Furthermore, volatility evolves over time in a continuous manner and varies within some fixed range; that is, volatility is usually stationary (Tsay 2013). The GARCH-family models have proven to capture many of these characteristics of financial markets. The symmetric GARCH models capture volatility clustering and the fat-tailed distribution of a series. The GARCH models with an asymmetric component allow the response of volatility to positive and negative shocks to be asymmetric, with a greater impact from negative shocks (Tsay 2013). The purpose of this paper is to examine the performance of GARCH-family models' volatility forecasts using ten stock market indices.
Two symmetric GARCH models will be included: the standard Generalised Autoregressive Conditional Heteroskedasticity (GARCH) model and the Integrated GARCH (IGARCH) model. Two asymmetric GARCH models will be considered to account for the leverage effects: the Threshold GARCH (TGARCH) model and the Exponential GARCH (EGARCH) model. Specifically, the study examines which of the two model types and which
specific GARCH model generates the best volatility forecasts, using daily returns from a large number of stock market indices. Since volatility is not directly observable, a proxy for the true volatility will be used and compared to the estimated forecasts from the GARCH-family models. The realized volatility, the sum of intraday squared returns at high frequencies (Andersen and Bollerslev 1998), will serve as this proxy. To evaluate the performance of each GARCH model, four evaluation measures will be used: the Mean Square Error (MSE), the Mean Absolute Percentage Error (MAPE), the Mean Absolute Error (MAE) and the asymmetric QLIKE loss function. These measures evaluate the difference between the realized volatility and the volatility forecast. The method includes one-day-ahead forecasts with a recursive expanding refit. Several studies in the volatility forecasting literature use a small number of stock market indices and at times just one evaluation measure to examine GARCH models' forecasting performance. This research aims to examine several GARCH models using a large number of stock market indices, several forecast evaluation measures and a volatility proxy proven to be superior, in order to provide an evaluation of GARCH models' forecasting performance that is as accurate as possible. The paper is organized as follows: Section 2 provides coverage of the existing literature on volatility forecasting, specifically for GARCH models. Section 3 presents the theoretical framework, which describes the models and their specifications. Section 4 presents the financial data used to forecast the volatility and the methods that are implemented. In Sections 5 to 7, we present and discuss the results and finally summarize and conclude.
2 Literature

Poon and Granger (2003) provide comprehensive coverage of the volatility forecasting literature. In their paper, they examine 93 papers and their collective findings. They emphasize that volatility forecasting is important for asset valuation, risk management, investments and monetary policymaking. Time series volatility forecasting models are categorized into four groups: the GARCH-family models, stochastic volatility (SV) models, option-based volatility models such as the Black-Scholes model, and historical volatility models such as the exponentially weighted moving average (EWMA) model. Their overall findings suggest that option-based volatility models are superior in forecasting, followed by GARCH and historical volatility models. Among the GARCH-family models, the asymmetric specifications, such as the EGARCH model, which accounts for the strong negative relationship between volatility and a shock in asset return, generally outperform the symmetric GARCH models. However, a drawback when comparing a variety of studies is that they use different data sets, assets, time periods and evaluation techniques. The literature has established that the GARCH-family models can be divided into two groups: the symmetric models and the asymmetric models (Tsay 2013). In this thesis, the symmetric models are the GARCH and the IGARCH model, and the asymmetric models are the EGARCH and the TGARCH model. The difference between the symmetric and the asymmetric GARCH models is that the latter can capture the leverage effects, where a negative shock to prices has a larger effect on volatility than an equally large positive price shock (Alexander 2001). In financial time series and financial market volatility, several features have been observed and are well documented.
The fat-tailed distribution of asset returns, volatility clustering, mean reversion and co-movement of volatilities between financial markets are some of these features (Poon and Granger 2003). Moreover, recent studies also provide evidence that volatility is stronger during bear markets and financial crises, where large price drops seem to have a greater impact on volatility than an equally large increase in prices. The leverage effect describes this characteristic (Tsay 2013). Historically, various GARCH models have been implemented to capture these kinds of features in the financial markets. For example, the GARCH model developed by Bollerslev (1986) captures the
properties of fat-tailed distributions and volatility clustering. The GARCH-family models have been extended to include more of the features that are observed in the financial markets. When evaluating GARCH models' forecasting performance, the literature usually employs various statistical evaluation measures, such as the Mean Square Error (MSE), Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) (Poon and Granger 2003). These measures are specified differently; however, they all measure the error of the forecast, which is the estimated loss between the forecasted volatility and the volatility proxy (Reschenhofer, Mangat and Stark 2020). Moreover, the literature on the evaluation of volatility forecasts has been fairly criticized and is not as extensive as the literature on the construction of volatility models and forecasts (Poon and Granger 2003). However, Patton (2011) derives two forecast evaluation measures that are robust to noise in the volatility proxy: the asymmetric QLIKE loss function and the Mean Square Error (MSE). These measures offer a consistent model ranking even in the presence of a noisy volatility proxy. This implies that, under the robust measures, the volatility proxy yields the same ranking as if the true volatility process were used. The volatility proxy tries to capture the unobserved underlying process that defines volatility. Andersen and Bollerslev (1998) suggested that the sum of squared intra-day high-frequency returns is a sufficient proxy for the true volatility; this proxy is defined as the realized volatility. Before the introduction of realized volatility, when high-frequency data was not as easily accessible, the daily squared return was a widely applied proxy for the true volatility.
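The two robust loss functions are straightforward to compute given a series of variance forecasts and a realized-variance proxy. A minimal numpy sketch with toy numbers (not thesis data); the QLIKE form below, ln(h) + σ²/h, is one of several equivalent parameterizations in Patton (2011):

```python
import numpy as np

def mse(proxy, forecast):
    """Mean squared error between the volatility proxy (e.g. realized
    variance) and the forecast variance."""
    return np.mean((proxy - forecast) ** 2)

def qlike(proxy, forecast):
    """QLIKE loss in one common parameterization from Patton (2011):
    mean of ln(h) + sigma^2 / h, where h is the forecast variance."""
    return np.mean(np.log(forecast) + proxy / forecast)

# Toy numbers, not thesis data: a proxy series and a forecast series.
rv = np.array([1.0, 2.0, 0.5])
h = np.array([1.1, 1.8, 0.6])
print(mse(rv, h))
print(qlike(rv, h))
```

Both functions return a single average loss, so two competing models can be ranked by comparing their values on the same proxy series.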
Awartani and Corradi (2005) examine the out-of-sample predictive ability and accuracy of various GARCH models with a focus on the asymmetric component in the models. The models studied are the same as in this paper, that is, the GARCH, IGARCH, TGARCH and EGARCH, along with other variations of GARCH models. Their data consist of daily observations of the S&P 500 index covering the period from 1990 to 2001, and they use various out-of-sample forecast horizons. Their main finding suggests that the asymmetric GARCH models outperform the GARCH model; however, the superior predictive ability of the asymmetric models diminishes as the forecasting horizon increases. Moreover, the GARCH model is superior when compared to other symmetric versions.
Evans and McMillan (2007) studied the performance of nine different models of daily volatility, including the GARCH model and several asymmetric GARCH models. The data included 33 stock market indices globally, with the out-of-sample forecast covering the period from the beginning of 2004 to April 2005. They evaluate their results based on the RMSE measure. Firstly, they conclude that the GARCH-family models are superior to historical volatility models such as moving average models. Secondly, GARCH models with an asymmetric component are superior to the standard GARCH model, where the EGARCH model appears to outperform the other models. Gabriel (2012) evaluated the forecasting performance of several GARCH models, including the TGARCH, IGARCH, EGARCH and Power-GARCH. The data cover the period from 2001 to 2012 with daily observations of the Bucharest Stock Exchange Trading Index, and the forecasts are evaluated with the RMSE, MAE, MAPE and Theil inequality coefficient (TIC) measures. The study concludes that the model with the best forecasting ability was the TGARCH model. Another study that focuses on the European stock markets and evaluates which GARCH model has the best volatility forecasting ability is Harrison and Moore (2011). They include 10 stock exchanges in the European region covering the period from 1991 to 2008. Six widely used loss functions are implemented, and they conclude that GARCH models which allow for the asymmetric component consistently outperform the symmetric GARCH models; specifically, the EGARCH model was one of the preferred models. Lim and Sek (2013) studied the performance of symmetric and asymmetric GARCH models during the 1997 financial crisis in Malaysia. The data were categorized into three different groups: pre-crisis, post-crisis and during-crisis. They used three evaluation measures, the MSE, MAPE and RMSE, and three different models, the GARCH, EGARCH and TGARCH.
They found that, in general, the symmetric GARCH models perform better during periods with high fluctuations (during-crisis) and the asymmetric models perform better during normal periods (pre- and post-crisis). To summarize the literature review, GARCH models that allow for asymmetries seem, in general, to outperform the symmetric GARCH models. Furthermore, during more volatile periods the evidence suggests that the symmetric GARCH models perform better. It is important to
apply a proxy for the true volatility that is as accurate as possible, and the literature suggests that a proxy based on intra-day high-frequency data, such as the realized volatility, is superior. Finally, regarding the different forecast evaluation measures, which calculate the error of the forecast compared to the volatility proxy, the literature has applied several widely used statistical measures. Patton (2011) suggests that the QLIKE and MSE are two measures that are robust to noise in the volatility proxy.

3 Theoretical framework

The ARCH and GARCH models are among the most frequently used models for forecasting volatility. The GARCH-family models use the conditional variance of returns and are estimated through maximum likelihood. These models capture various important features of the financial market. Firstly, they can capture the fat-tailed distribution of asset returns. Secondly, they account for the tendency of volatility in financial markets to appear in clusters. This means that when assets exhibit large returns, the returns in the following period are also expected to be large, and vice versa. Finally, the leverage effects describe the tendency for volatility to increase more after a price drop than after a price rise of the same magnitude. The intuition here is that a price drop increases the financial leverage, which makes the asset riskier and therefore increases its volatility (Brooks 2014).

3.1 ARCH model

To describe the GARCH-family framework, we start with the simplest of these models, the autoregressive conditionally heteroscedastic (ARCH) model developed by Engle (1982). The ARCH model describes the conditional variance as a simple quadratic function of its lagged squared errors. Furthermore, the shock of an asset return is serially uncorrelated but dependent. The ARCH(q) model for the conditional variance can be written as (Tsay 2013):

σ_t² = α₀ + α₁a²_{t−1} + α₂a²_{t−2} + ⋯ + α_q a²_{t−q},   a_t = σ_t ε_t   (1)
Where ε_t is a sequence of i.i.d. random variables with zero mean and unit variance, σ_t² is the conditional variance, a²_{t−i} are the lagged squared errors and α₀ is the long-term average value of the variance. The distribution of ε_t (in all following GARCH-family models) is assumed to follow either a standard normal, a standardized Student t or a generalized error distribution (GED). Furthermore, non-negativity constraints are applied to the coefficients in order to ensure a positive conditional variance:

α_i ≥ 0,   i = 0, 1, 2, …, q

The ARCH model provides a sufficient fundamental framework for time series modeling of volatility. However, the model has some weaknesses. The lagged squared errors are required to capture all of the dependence in the conditional variance, so the number of lags could be very large. Furthermore, the ARCH model reacts slowly to large shocks in the return of an asset and is thus likely to overpredict the volatility (Tsay 2013). The natural extension to overcome the weaknesses of the ARCH model is the GARCH models discussed below.

3.2 Conditional mean

The GARCH-family models consist of two equations: a conditional variance equation and a conditional mean equation. Thus, before describing the GARCH models in terms of the conditional variance, we need to consider the conditional mean, since the conditional variance is measured around the mean (Alexander 2008). The mean model specification includes autoregressive terms AR(p) and moving average terms MA(q). For all following GARCH models, the mean model specification can be expressed by the general ARMA(p, q):

r_t = φ₀ + Σ_{i=1}^{p} φ_i r_{t−i} + Σ_{j=1}^{q} θ_j a_{t−j} + a_t   (2)

Where φ_i are the autoregressive parameters, θ_j are the moving average parameters, φ₀ is a constant and a_t is the error term, with a_t = σ_t ε_t and ε_t i.i.d. with zero mean and unit variance.
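The two building blocks above, the ARCH(q) variance recursion (1) and the mean equation (2), are easy to simulate, which also illustrates why the model generates volatility clustering. A numpy-only sketch with a hypothetical ARCH(1) parameterization and a zero mean (a simplification of the ARMA(0,0)-with-drift mean used later):

```python
import numpy as np

# ARCH(1) with a zero conditional mean. Parameters are hypothetical,
# chosen so that a0 > 0 and 0 <= a1 < 1 (the stationarity constraint).
a0, a1 = 0.2, 0.5
n = 5000
rng = np.random.default_rng(0)

e = np.zeros(n)       # a_t, the return shocks
sig2 = np.zeros(n)    # sigma_t^2, the conditional variance
sig2[0] = a0 / (1 - a1)                      # start at unconditional variance
e[0] = np.sqrt(sig2[0]) * rng.standard_normal()
for t in range(1, n):
    sig2[t] = a0 + a1 * e[t - 1] ** 2        # Eq. (1) with q = 1
    e[t] = np.sqrt(sig2[t]) * rng.standard_normal()

# The sample variance should be close to a0 / (1 - a1) = 0.4, and large
# shocks are followed by large conditional variances (clustering).
print(e.var())
```

A large |e[t−1]| mechanically raises sig2[t], so quiet and turbulent spells alternate even though the shocks ε_t are i.i.d.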
In order to find the best possible order for the ARMA(p, q) model, the "auto.arima" function in R was used. The function searches over different combinations of AR and MA orders to find the best possible model according to the lowest value of the Bayesian Information Criterion (BIC). For all of the stock indices studied except those in the United States, an ARMA(0,0) with non-zero mean was the preferred model; the mean model for these indices will therefore follow an ARMA(0,0) process. For the indices in the United States, the preferred mean model specification follows an ARMA(0,1) process with a non-zero mean. The results of the ARMA(p, q) order selection are presented in Appendix A1.

3.3 Symmetric models

The symmetric GARCH models apply non-negativity constraints on the parameters to avoid a negative volatility, since a negative volatility would be meaningless and cannot be interpreted. The symmetric GARCH models capture volatility clustering and the fat-tailed distribution of a series; however, they do not capture the leverage effects (Tsay 2013). Two symmetric GARCH models are applied: the standard GARCH model and the IGARCH model.

3.3.1 GARCH model

The generalized ARCH model (GARCH), which is widely used in the volatility forecasting literature, was developed by Bollerslev (1986) and is an extension of the ARCH model. The GARCH model has the benefit of capturing the heavy-tailed distribution of asset returns and volatility clustering. However, one of its weaknesses is that it does not account for the leverage effects in asset returns. The difference between the ARCH and GARCH models is that the GARCH allows the conditional variance to depend on its own previous lags in addition to the lagged squared errors. Since the conditional variance in the GARCH is parameterized to depend on lags of the squared errors and lags of the conditional variance, the question of which orders to choose arises.
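Order selection, whether for the ARMA mean equation via auto.arima or in principle for the GARCH orders, amounts to comparing an information criterion across candidate fits and keeping the minimizer. The numpy-only sketch below illustrates the principle on AR(p) orders fitted by OLS; this is a simplification, since auto.arima searches the full ARMA(p, q) space by maximum likelihood:

```python
import numpy as np

def ar_bic(r, p):
    """OLS fit of AR(p) with intercept; return the Gaussian BIC,
    n*ln(RSS/n) + k*ln(n), where k counts the estimated coefficients."""
    y = r[p:]
    n = len(y)
    cols = [np.ones(n)] + [r[p - i:len(r) - i] for i in range(1, p + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + (p + 1) * np.log(n)

# Simulate an AR(1) series; BIC should then favor p = 1 over p = 0.
rng = np.random.default_rng(42)
r = np.zeros(2000)
for t in range(1, len(r)):
    r[t] = 0.6 * r[t - 1] + rng.standard_normal()

best_p = min(range(4), key=lambda p: ar_bic(r, p))
print(best_p)
```

The BIC's k·ln(n) penalty grows with the sample size, which is why it tends to select the parsimonious ARMA(0,0) specification reported above unless a lag clearly improves the fit.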
In the finance literature and in the volatility forecasting literature, models of higher order than GARCH(1,1) are rarely estimated, since GARCH(1,1) is sufficient to capture the volatility clustering in financial data. Thus, the GARCH models applied in this thesis will all follow the (1,1) formulation.
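A useful property of the stationary (1,1) specification, formalized in the next subsection, is that multi-step variance forecasts revert geometrically toward the long-run variance. A plain-Python sketch with hypothetical parameter values (not estimates from the thesis):

```python
# Stationary GARCH(1,1): k-step-ahead variance forecasts revert to the
# long-run variance a0 / (1 - a1 - b1). Parameter values are
# hypothetical, not estimates from the thesis.
a0, a1, b1 = 0.1, 0.08, 0.90
long_run = a0 / (1 - a1 - b1)

def k_step_forecast(h_next, k):
    """Iterate E[sigma^2_{t+j}] = a0 + (a1 + b1) * E[sigma^2_{t+j-1}],
    starting from the one-step-ahead forecast h_next."""
    h = h_next
    for _ in range(k - 1):
        h = a0 + (a1 + b1) * h
    return h

print(long_run)                    # 5.0 (up to float rounding)
print(k_step_forecast(2.0, 1))     # one step ahead: unchanged
print(k_step_forecast(2.0, 250))   # long horizon: close to 5.0
```

The speed of reversion is governed by a1 + b1; as that sum approaches 1 the decay slows, and at exactly 1 (the IGARCH case below) the forecast never converges.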
We can express the GARCH(1,1) model as:

σ_t² = α₀ + α₁a²_{t−1} + β₁σ²_{t−1},   a_t = σ_t ε_t,   α₁ + β₁ < 1   (3)

Where σ_t² is the conditional variance and α₀ is a function of the long-term average variance rate. The parameter α₁, attached to the lagged squared error (a²_{t−1}), measures the effect of a shock on the conditional variance, whereas the parameter β₁, attached to the lagged conditional variance (σ²_{t−1}), measures the persistence in volatility. Following the ARCH model, we also place non-negativity constraints on the coefficients, and ε_t is defined as before. The last constraint, α₁ + β₁ < 1, implies that the conditional variance σ_t² evolves over time and that the unconditional variance is finite. As a consequence, the volatility forecast converges to the long-term average variance rate as the forecast horizon increases (Tsay 2013).

3.3.2 IGARCH model

The integrated GARCH model (IGARCH) has one major difference from the GARCH: it has a non-stationary variance process, such that the impact of shocks on volatility is persistent. This means that volatility is not mean reverting (Alexander 2001). To see this, consider the last constraint in the GARCH model, α₁ + β₁ < 1. For the GARCH(1,1), the unconditional variance of a_t is constant and expressed as:

Var(a_t) = α₀ / (1 − (α₁ + β₁))   (4)

However, if α₁ + β₁ ≥ 1, the unconditional variance of a_t is not defined, which implies non-stationarity in variance. If α₁ + β₁ = 1, there is a unit root in the variance, which is known as the IGARCH model, where the conditional variance forecast does not converge. Building on the previous ARCH and GARCH models, the following equation describes an IGARCH(1,1):
σ_t² = α₀ + (1 − β₁)a²_{t−1} + β₁σ²_{t−1},   a_t = σ_t ε_t   (5)

Where we use notation similar to the GARCH model, with non-negativity constraints on the parameters, and ε_t is defined as before. The strength of the IGARCH model is arguably its ability to capture occasional level shifts in volatility (Tsay 2013). However, it shares a weakness with the GARCH, since the model assumes that positive and negative shocks have the same effect on volatility.

3.4 Asymmetric models

The financial literature has developed models that account for the weaknesses of the symmetric GARCH models by capturing the leverage effects, where the responses of volatility to positive and negative shocks are asymmetric, with a larger impact from negative shocks. The following two models, the EGARCH and the TGARCH, are commonly used to handle the leverage effects.

3.4.1 EGARCH model

The exponential GARCH model (EGARCH) was developed by Nelson (1991). There are several advantages of the EGARCH model over the symmetric GARCH models. Firstly, the conditional variance is modeled in logarithmic form, which removes the need to impose non-negativity constraints on the parameters. Secondly, it handles the leverage effects by incorporating an asymmetric response function. Thirdly, it permits a random oscillatory behavior in the variance process (Nelson 1991). The EGARCH(1,1) specification can be defined by the equations:

ln(σ_t²) = α₀ + g(ε_{t−1}) + β₁ ln(σ²_{t−1})   (6)

where

g(ε_{t−1}) = θε_{t−1} + γ(|ε_{t−1}| − √(2/π))   (7)
The EGARCH model expresses the logged conditional variance in three parts: α₀, a long-run average value of the variance; g(ε_{t−1}), the asymmetric response function; and ln(σ²_{t−1}), the logged conditional variance in the previous period. The asymmetric response function, which handles the leverage effects, includes θε_{t−1}, which determines the sign of the effect, and γ(|ε_{t−1}| − √(2/π)), which determines the size of the effect (Alexander 2001). The EGARCH model is widely used in the volatility forecasting literature, arguably for its simple specification that manages to capture the leverage effects. Furthermore, many studies suggest that the logarithmic specification in the EGARCH seems to be appropriate when modeling asset returns in financial data. However, the asymmetric models tend to produce smaller volatility forecasts than the symmetric models, which could be an issue (Poon and Granger 2003).

3.4.2 TGARCH model

To further evaluate the asymmetric GARCH models, we include the threshold GARCH model (TGARCH), also known as the GJR model, which was developed by Glosten, Jagannathan and Runkle (1993). The TGARCH model is a simple extension of the GARCH model in which a multiplicative dummy variable is included to account for the leverage effects. The dummy specification controls for statistically significant differences between the effects of positive and negative shocks on the conditional variance. The conditional variance in the TGARCH(1,1) model can be defined by the following specification:

σ_t² = α₀ + (α₁ + γ₁N_{t−1})a²_{t−1} + β₁σ²_{t−1}   (8)

where N_{t−1} is an indicator for a negative shock:

N_{t−1} = 1 if a_{t−1} < 0, and N_{t−1} = 0 otherwise

The threshold specification thus allows volatility to respond differently
when the rate of return switches sign (Poon and Granger 2003). However, as mentioned earlier, the asymmetric models tend to underpredict the volatility forecasts.

4 Data and Methodology

4.1 Data

To investigate the forecasting ability of the GARCH-family models, we use the daily closing prices of 10 stock market indices. The data cover the period from the first trading day in January 2006 to the last trading day in March 2021 for all indices. This period was selected because it was the most recent data available when this thesis was started. Following the financial literature, we treat trading days as a continuous time series, ignoring weekends and holidays. The data were collected from Thomson Reuters Datastream. The stock market indices are not dividend-adjusted, meaning that when a company pays dividends, the stock value decreases by the amount of the total payout, which in turn decreases the price of the index. The stock market indices are divided into an in-sample and an out-of-sample period. The in-sample period is used to estimate the parameters in order to produce a forecast of future volatility. The out-of-sample and in-sample periods are based on the same calendar period for all indices but contain a different number of trading days, due to differences in holidays between countries. The in-sample period is roughly 15 years, covering the first trading day in January 2006 to the last trading day in March 2020. This implies an out-of-sample period covering one year of trading days, stretching from the last trading day in March 2020 to the last trading day in March 2021.
Table 1. Overview of stock market indices

Index      Name                             Region         # of trading days
DJI        Dow Jones Industrial Average     United States  3826
IXIC       Nasdaq 100                       United States  3830
N225       Nikkei 225                       Japan          3724
NSEI       NIFTY 50                         India          3766
OMXSPI     OMX Stockholm All Share Index    Sweden         3822
OSAEX      Oslo Exchange All-share Index    Norway         3802
RUT        Russell 2000                     United States  3827
SPX        S&P 500 Index                    United States  3827
SSEC       Shanghai Composite Index         China          3702
STOXX50E   EURO STOXX 50                    Eurozone       3891

The GARCH-family models are estimated using daily returns. The closing price of each index is used to calculate the daily log return, given by:

r_t = ln(P_t) − ln(P_{t−1})   (9)

where P_t denotes the closing price at time t and P_{t−1} the closing price at time t − 1. Daily returns are occasionally subject to rare extreme values, which often occur during unusual market conditions such as the financial crisis of 2008. These market conditions could influence the ranking of volatility models and tend to be very difficult or impossible to predict (Lyocsa, Molnar and Vyrost 2020). To cope with this potential problem and make our models less dependent on these conditions, the daily returns were subjected to a rolling-window filtering procedure: values of the daily returns above the 99.5th percentile were substituted with the 99.5th percentile value, with the rolling window set to 1000 trading days. Descriptive statistics of the daily returns are displayed in the table below.
Table 2. Descriptive statistics of daily returns

Index      Mean   SD     Skew.   Kurt.  Min.    Max.
DJI        0.028  1.119  −0.539  5.581  −6.415  5.271
IXIC       0.049  1.288  −0.307  3.208  −5.703  5.993
N225       0.015  1.416  −0.428  2.587  −7.028  5.645
NSEI       0.048  1.362  −0.204  3.520  −6.380  6.114
OMXSPI     0.028  1.263  −0.322  3.069  −5.789  5.820
OSAEX      0.032  1.398  −0.600  4.929  −7.980  6.730
RUT        0.036  1.547  −0.252  4.023  −7.806  6.892
SPX        0.032  1.157  −0.391  5.664  −6.305  6.157
SSEC       0.030  1.575  −0.616  3.691  −7.627  7.093
STOXX50E   0.001  1.310  −0.337  2.848  −6.295  5.640

Daily returns are multiplied by 100, so that Mean, SD, Min and Max are interpreted as percentages. SD denotes the standard deviation, Skew. the skewness and Kurt. the kurtosis.

There is a high level of kurtosis for all 10 indices, indicating that the fat-tailed distribution of asset returns is present and that the distribution of daily returns is leptokurtic. Furthermore, the daily returns have negative skewness, indicating that the return distributions have longer left tails, with large negative returns occurring more often than equally large positive ones. The mean is positive for all indices, and there is a fairly large discrepancy between the minimum and maximum values of the daily returns.

4.1.1 Distribution

The distribution of returns may follow different shapes, depending on the underlying stochastic process and on whether the parameters are time varying (Poon and Granger 2003). Therefore, we need to test which distribution the error term (ε_t) follows for each of the selected indices. We test three distributions commonly used when modeling returns: the Normal, the Student t and the Generalized Error Distribution (GED). In the figure below, the returns for STOXX50E are plotted against these three distributions. As can be seen, the Student t distribution fits slightly better than the Normal and the GED and is thus the best fit. The same procedure was followed for all 10 indices, with the same conclusion as for the STOXX50E.
Thus, a Student t distribution is applied in all following GARCH models.
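One way to make such a distributional comparison concrete is to fit each candidate by maximum likelihood and compare log-likelihoods. The sketch below uses synthetic heavy-tailed data, and scipy's `gennorm` stands in for the GED; this is an illustration of the idea, not the thesis's actual procedure, which is not specified:

```python
import numpy as np
from scipy import stats

# Synthetic heavy-tailed "returns"; real data would be the index returns.
x = stats.t.rvs(df=5, scale=1.2, size=3000, random_state=np.random.default_rng(1))

# Fit each candidate distribution by maximum likelihood and compare
# log-likelihoods; scipy's gennorm plays the role of the GED.
candidates = {"normal": stats.norm, "student-t": stats.t, "ged": stats.gennorm}
loglik = {name: np.sum(dist.logpdf(x, *dist.fit(x)))
          for name, dist in candidates.items()}
best = max(loglik, key=loglik.get)
```

On heavy-tailed data like daily equity returns, the Student t fit typically attains a higher log-likelihood than the Normal, mirroring the visual comparison in Figure 1.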
Figure 1. Distribution plots

4.2 Method

4.2.1 Diagnostics test

To evaluate the validity and stability of the GARCH models, diagnostics tests are performed. The GARCH framework assumes that the shocks of an asset return are serially uncorrelated but dependent, and that volatility appears in clusters. The Ljung-Box test evaluates the first assumption and the ARCH-LM test the second property. The Ljung-Box test (Ljung and Box, 1978) is applied to the lagged residuals from the ARMA(p, q) fit. The null hypothesis states that there is no serial correlation in the residuals, against the alternative that they are serially correlated; the preferred outcome is thus to not reject the null hypothesis. The null hypothesis is not rejected for any index except NSEI, for which it is rejected at the 10% significance level. The Ljung-Box test statistics are presented in Table 3. Engle's (1982) LM test for autoregressive conditional heteroskedasticity (ARCH) effects is performed on the residuals from the ARMA(p, q) fit, specified with 12 lags. The test reveals whether there are any ARCH effects, in other words whether volatility clustering is present, which is an appropriate condition for applying the GARCH models. The null hypothesis states that there are no ARCH effects, against the alternative that ARCH(q) disturbances exist. The test shows that from lag 1 to lag 12 there are significant ARCH effects at the 1% significance level. The 12-lag ARCH-LM statistics are displayed in Table 3.
Table 3. Ljung-Box test and ARCH-LM test

Index      Q        ARCHLM (12)
DJI        0,0105   313***
IXIC       0,0003   307***
N225       0,0016   343***
NSEI       3,4151*  405***
OMXSPI     0,1408   351***
OSAEX      0,6262   348***
RUT        0,0031   316***
SPX        0,0000   347***
SSEC       1,3078   696***
STOXX50E   0,0138   430***

Note: *, **, *** indicate that the null hypothesis can be rejected at the 10%, 5% and 1% level. Q denotes the Ljung-Box test statistic and ARCHLM (12) the LM test statistic for the 12th lag.

4.2.2 Out-of-sample forecast method

The out-of-sample forecast method involves a one-day-ahead rolling density forecast from the GARCH models, with a recursive refit every trading day. The "ugarchroll" function in R is used. It is specified with a forecast length (the out-of-sample period) of one year of trading days and forecasts one day ahead with refit. The refit is done recursively, that is, by expanding the window and including all previous data for each one-day-ahead forecast. In other words, the first forecast is based on the parameter estimates from the in-sample observations. By expanding the window, the second forecast is based on parameter estimates that use all of the historical data up to that point, that is, the in-sample observations plus the first observation of the out-of-sample period. More formally, the one-day-ahead forecast of the GARCH(1,1) model is specified as:

σ̂²_{t+1} = α̂₀ + α̂₁ε²_t + β̂₁σ̂²_t    (10)

where σ̂²_{t+1} is the predicted conditional variance at time t + 1 and the remaining terms are specified as before. The recursive refit implies that the forecast of σ̂²_{t+h} is calculated based on all of the historical data up to that point (Alexander, 2008). The forecasts of the remaining GARCH models use the same forecasting method, but with their respective specifications.
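The thesis performs this with "ugarchroll" in R. As a language-neutral illustration of the expanding-window scheme, the sketch below refits a Gaussian quasi-maximum-likelihood GARCH(1,1) on all data up to each out-of-sample day and applies equation (10). It is a simplified stand-in on synthetic data, not the rugarch implementation; function names and starting values are our own:

```python
import numpy as np
from scipy.optimize import minimize

def garch11_sigma2(params, r):
    """Conditional variance recursion: s2_t = w + a*r_{t-1}^2 + b*s2_{t-1}."""
    w, a, b = params
    s2 = np.empty_like(r)
    s2[0] = r.var()                       # initialize at the sample variance
    for t in range(1, len(r)):
        s2[t] = w + a * r[t - 1] ** 2 + b * s2[t - 1]
    return s2

def fit_garch11(r):
    """Gaussian quasi-ML fit of GARCH(1,1) via bounded L-BFGS-B."""
    def nll(params):
        s2 = garch11_sigma2(params, r)
        if not np.all(np.isfinite(s2)) or np.any(s2 <= 0):
            return 1e10                   # penalize invalid parameter regions
        return 0.5 * np.sum(np.log(s2) + r ** 2 / s2)
    res = minimize(nll, x0=[0.05, 0.05, 0.90],
                   bounds=[(1e-6, 1.0), (0.0, 1.0), (0.0, 1.0)],
                   method="L-BFGS-B")
    return res.x

def expanding_one_step_forecasts(r, n_out):
    """One-day-ahead forecasts with a recursive (expanding-window) refit."""
    fcst, n_in = [], len(r) - n_out
    for i in range(n_out):
        sample = r[: n_in + i]            # all data up to the forecast origin
        w, a, b = fit_garch11(sample)
        s2 = garch11_sigma2((w, a, b), sample)
        fcst.append(w + a * sample[-1] ** 2 + b * s2[-1])  # equation (10)
    return np.array(fcst)

# Synthetic GARCH(1,1) returns for illustration.
rng = np.random.default_rng(3)
r, s2 = np.zeros(800), 0.1
for t in range(1, len(r)):
    s2 = 0.05 + 0.10 * r[t - 1] ** 2 + 0.85 * s2
    r[t] = rng.standard_normal() * np.sqrt(s2)

forecasts = expanding_one_step_forecasts(r, n_out=3)
```

The key design point is that each forecast origin triggers a full refit on the expanded sample, exactly as described for the recursive scheme above.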
4.2.3 Realized volatility The volatility proxy tries to accurately follow the underlying process that defines volatility. In the presence of a noisy volatility proxy, the ranking of volatility forecasts could be inaccurate and fail to reflect the true conditional variance. Therefore, it is vital to use a volatility proxy that accurately reflects the true underlying volatility process (Patton 2011). Realized volatility is defined as the sum of intraday squared returns sampled at high frequencies, such as five or fifteen minutes. Realized volatility has been demonstrated to provide an accurate estimate of the underlying process that defines volatility (Poon and Granger 2003). Furthermore, Patton (2011) argues that realized volatility is one of the less noisy proxies and leads to less distortion. Thus, the proxy for volatility that will be compared to the forecasted volatility of the GARCH models in this thesis is the realized volatility. The data on realized volatility are collected from the Oxford-Man Institute's Realized Library (Heber et al., 2009) and constructed from intraday high-frequency data on a five-minute sub-sample for each index. 4.2.4 Forecast evaluation In contrast to the vast volatility forecasting literature, the literature on forecast evaluation is limited. Furthermore, there is no clear evidence for which forecast evaluation measure should be used or preferred (Reschenhofer, Mangat and Stark 2020). In this paper, four different forecast evaluation measures are implemented to identify the best forecasting model. These methods measure the forecast error, that is, the difference between the realized volatility and the volatility forecast from the GARCH models. Three of these evaluation methods are outlined in Poon and Granger (2003) and are commonly used in the literature.
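The realized-volatility proxy described in Section 4.2.3 reduces to a short computation once the intraday prices are in hand; a minimal sketch (synthetic intraday prices, since the Oxford-Man data are not reproduced here):

```python
import numpy as np

def realized_variance(intraday_close):
    """Realized variance: sum of squared intraday log returns
    (e.g. 5-minute sub-sampled closing prices)."""
    r = np.diff(np.log(intraday_close))
    return np.sum(r ** 2)

def realized_volatility(intraday_close):
    return np.sqrt(realized_variance(intraday_close))

# 78 five-minute intervals roughly cover a 6.5-hour trading day.
rng = np.random.default_rng(4)
day = 100 * np.exp(np.cumsum(rng.normal(0, 0.001, size=79)))
rv = realized_volatility(day)
```

Summing squared returns over finer intervals makes the proxy converge toward the integrated variance of the day, which is why the five-minute sub-sample is less noisy than the daily squared return.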
These measures are the Mean Square Error (MSE), the Mean Absolute Percentage Error (MAPE) and the Mean Absolute Error (MAE). Another widely used evaluation measure is the asymmetric QLIKE loss function, which will also be used in this paper.
The asymmetric QLIKE (quasi-likelihood) loss function allows for an asymmetric loss when evaluating volatility forecasting performance. This implies that the QLIKE assigns more weight to volatility underestimation and thus penalizes forecasts that lie below the realized volatility more severely (Patton, 2011). This aspect is especially important for risk managers and companies selling financial instruments, since under-prediction of asset volatility can be costly. Furthermore, according to Patton (2011), the QLIKE loss function offers a consistent model ranking even in the presence of a noisy volatility proxy. This implies that the volatility proxy under the QLIKE measure yields the same ranking as if the true volatility process were used. The MSE is a quadratic loss function that measures the average of the squared differences between the proxy and the volatility forecast. The MSE is a symmetric loss function, which means that it assigns equal weight to over- and under-estimations of the volatility forecast and thus penalizes them equally. The MSE measure is therefore more sensitive to over-predictions than the asymmetric QLIKE loss function, which penalizes over-predictions very little. As noted by Hansen and Lunde (2005), the MSE measure has been shown to be appropriate when there are large differences between the volatility forecast and the proxy. Like the QLIKE, the MSE measure also provides a consistent model ranking in the presence of a noisy volatility proxy (Patton 2011). The MAE measures the average of the absolute differences between the realized volatility and the estimated forecast. The measure is more robust to outliers than the other methods (Hyndman and Koehler 2006). The MAE is, like the MSE, a symmetric loss function and therefore penalizes over- and under-predictions equally.
Building on the MAE, the MAPE measures the average percentage error of the forecast and is often used when comparing forecast performance across indices and assets, since the measure is unit-free. However, the MAPE can take extreme values when the realized volatility is close to zero, since it is scale sensitive (Hyndman and Athanasopoulos 2021). Furthermore, in contrast to the asymmetric QLIKE measure, the MAPE penalizes positive errors more heavily than negative errors. This is because the percentage error cannot exceed 100% for under-predictions of the forecast, while it is unbounded for over-predictions (Hyndman and Koehler 2006). The forecast evaluation measures are defined as:
MSE = n⁻¹ Σ_{t=1}^{n} (σ_t − σ̂_t)²    (11)

MAPE = 100 · n⁻¹ Σ_{t=1}^{n} |(σ²_t − σ̂²_t) / σ²_t|    (12)

MAE = n⁻¹ Σ_{t=1}^{n} |σ_t − σ̂_t|    (13)

QLIKE = n⁻¹ Σ_{t=1}^{n} [σ²_t / σ̂²_t − ln(σ²_t / σ̂²_t) − 1]    (14)

where σ_t is the realized volatility, σ̂_t the estimated forecasted volatility at time t, and n the number of forecasts. These four measures evaluate the forecasting performance of the models, and the lowest value of a measure indicates the best model.
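Equations (11)-(14) translate directly into code; a sketch, including a small check of the asymmetry of QLIKE and MAPE discussed above (the numbers are illustrative, not thesis data):

```python
import numpy as np

def mse(rv, f):
    return np.mean((rv - f) ** 2)                                # eq. (11)

def mape(rv, f):
    return 100 * np.mean(np.abs((rv ** 2 - f ** 2) / rv ** 2))  # eq. (12)

def mae(rv, f):
    return np.mean(np.abs(rv - f))                               # eq. (13)

def qlike(rv, f):
    ratio = rv ** 2 / f ** 2
    return np.mean(ratio - np.log(ratio) - 1)                    # eq. (14)

# Under-forecasting by a factor of two vs over-forecasting by the same factor:
# QLIKE penalizes the under-forecast more, MAPE the over-forecast.
rv = np.full(5, 0.02)
under, over = np.full(5, 0.01), np.full(5, 0.04)
```

For these values, qlike(rv, under) ≈ 1.614 exceeds qlike(rv, over) ≈ 0.636, while mape(rv, over) = 300 exceeds mape(rv, under) = 75, which is the asymmetry pattern described in the text.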
5 Results The out-of-sample forecasts for all GARCH models and each index are presented in Appendix B1, plotted against the realized volatility for each index. The out-of-sample forecast for OMXSPI is also presented in Figure 2. Notable for all models is that the volatility forecasts consistently underpredict the realized volatility. Furthermore, the GARCH models are unable to capture the sudden spikes in the realized volatility and generally seem to have difficulty predicting the proxy during periods with high fluctuations. Also notable is that the forecasts of all models appear to react more slowly during such periods. Overall, our volatility predictions follow the trend of the proxy quite well and capture the underlying volatility process. Since all of the models' out-of-sample forecasts follow the trend of the realized volatility rather well, it is difficult to draw any clear graphical conclusions about which model might be preferred. The forecasts of all models tend to overlap, and the differences between them are fairly small. However, by studying the graphs (in Appendix B1), it appears that the asymmetric models, on average, predict the volatility lower than the symmetric models. The asymmetric models also seem to fluctuate slightly more than the symmetric models.
Figure 2. Out-of-sample forecast for OMXSPI

Note: The gray line represents the realized volatility (RV); the remaining lines are the GARCH, IGARCH, EGARCH and TGARCH forecasts. The y-axis is the volatility, and the x-axis is the time period.

To further evaluate the performance of the GARCH models, the estimated forecasts are compared to the realized volatility using four different forecast evaluation measures. The evaluation measures for all GARCH models and indices are presented in Tables 5-8. The lowest value of an evaluation measure indicates that the forecasted volatility is, on average, closest to the proxy and thus the preferred model. A summary of the total number of times each GARCH model was preferred according to the evaluation measures is presented in Table 4. The results from the evaluation methods demonstrate that the IGARCH model was, on average, closest to the realized volatility 27 out of 40 times across all indices, followed by the TGARCH model, which had the smallest loss value compared to the realized volatility 8 out of 40 times. The results in Table 4 also show that the symmetric models had the smallest errors 29 out of 40 times, indicating that they, on average, generate the best volatility predictions and follow the realized volatility better over the period than the asymmetric models.
By studying each evaluation measure individually (see Tables 5-8), we can observe that for the MAPE the TGARCH model was preferred 8 out of 10 times. This also corresponds to the total number of times the TGARCH was preferred when all evaluation measures are considered. Also noticeable for the MAPE measure is that it consistently ranks the EGARCH model second best when the TGARCH was preferred, indicating that, with regard to the MAPE, the asymmetric models tend to perform better than the symmetric ones. Moreover, for the QLIKE, MSE and MAE it is clear that the symmetric models are preferred, especially the IGARCH model, which was on average closest to the realized volatility 27 out of 30 times, followed closely by the GARCH model. The MSE measure suggests that the IGARCH model was preferred for every index studied. Furthermore, there is no evidence that a specific GARCH model is favorable for a certain index. The forecast evaluation measures also confirm what was demonstrated graphically. Firstly, the differences between the GARCH models' forecasts are generally very small. Secondly, the evaluated loss values are fairly small for all models, indicating that the forecasted volatility follows the trend of the realized volatility quite well.

Table 4. Summary of evaluation measures

Model     Total times preferred
GARCH      2
IGARCH    27
EGARCH     3
TGARCH     8

Note: Total times preferred indicates the number of times the model had the lowest value of the evaluation measures and was thus, on average, closest to the realized volatility.
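The "times preferred" tallies are simple column-wise argmin counts. Using the QLIKE losses reported in Table 5, the tally can be reproduced as follows:

```python
import numpy as np

models = ["GARCH", "IGARCH", "EGARCH", "TGARCH"]
# QLIKE losses from Table 5 (rows: models; columns: the 10 indices in the
# order DJI, IXIC, N225, NSEI, OMXSPI, OSAEX, RUT, SPX, SSEC, STOXX50E).
qlike_losses = np.array([
    [0.5217, 0.6063, 0.2763, 0.3807, 0.1600, 0.7269, 0.5237, 0.4777, 0.3610, 0.7383],
    [0.5194, 0.5805, 0.2705, 0.3646, 0.1562, 0.7007, 0.4949, 0.4731, 0.3587, 0.7138],
    [0.6182, 0.7774, 0.3096, 0.4357, 0.1693, 0.7737, 0.6403, 0.5649, 0.3511, 0.8857],
    [0.6503, 0.8506, 0.3110, 0.4483, 0.1759, 0.7813, 0.6480, 0.6431, 0.3544, 0.8633],
])
# The model with the smallest loss per index is "preferred";
# the per-model win count matches the Total column of Table 5.
wins = np.bincount(qlike_losses.argmin(axis=0), minlength=len(models))
```

This reproduces the QLIKE totals (IGARCH preferred for 9 of 10 indices, EGARCH for 1, namely SSEC); repeating the tally over all four measures gives the 40 comparisons summarized in Table 4.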
Table 5. Forecast evaluation measure: asymmetric QLIKE loss function

Model    DJI     IXIC    N225    NSEI    OMXSPI  OSAEX   RUT     SPX     SSEC    STOXX50E  Total
GARCH    0,5217  0,6063  0,2763  0,3807  0,1600  0,7269  0,5237  0,4777  0,3610  0,7383    0
IGARCH   0,5194  0,5805  0,2705  0,3646  0,1562  0,7007  0,4949  0,4731  0,3587  0,7138    9
EGARCH   0,6182  0,7774  0,3096  0,4357  0,1693  0,7737  0,6403  0,5649  0,3511  0,8857    1
TGARCH   0,6503  0,8506  0,3110  0,4483  0,1759  0,7813  0,6480  0,6431  0,3544  0,8633    0

Note: Gray shading indicates the smallest error of the forecast compared to the realized volatility. The top row denotes each index and Total the number of times each model had the lowest evaluated loss estimate.

Table 6. Forecast evaluation measure: Mean Square Error (MSE)

Model    DJI       IXIC      N225      NSEI      OMXSPI    OSAEX     RUT       SPX       SSEC      STOXX50E  Total
GARCH    2,16E-05  2,73E-05  4,83E-06  2,41E-05  4,86E-07  7,14E-05  5,32E-05  6,28E-06  2,98E-06  6,88E-05  0
IGARCH   2,16E-05  2,72E-05  4,79E-06  2,40E-05  4,76E-07  7,13E-05  5,29E-05  6,27E-06  2,98E-06  6,87E-05  10
EGARCH   2,21E-05  2,75E-05  4,87E-06  2,42E-05  5,06E-07  7,16E-05  5,38E-05  6,29E-06  2,98E-06  6,92E-05  0
TGARCH   2,17E-05  2,75E-05  4,82E-06  2,41E-05  4,97E-07  7,15E-05  5,37E-05  6,37E-06  2,98E-06  6,90E-05  0

Note: Gray shading indicates the smallest error of the forecast compared to the realized volatility. The top row denotes each index and Total the number of times each model had the lowest evaluated loss estimate.

Table 7. Forecast evaluation measure: Mean Absolute Percentage Error (MAPE)

Model    DJI  IXIC  N225  NSEI  OMXSPI  OSAEX  RUT  SPX  SSEC  STOXX50E  Total
GARCH    80   83    231   94    87      78     85   117  193   123       1
IGARCH   80   85    238   96    91      79     90   118  195   124       0
EGARCH   75   76    223   83    66      70     68   104  209   111       1
TGARCH   74   75    221   80    65      69     67   98   207   112       8

Note: Gray shading indicates the smallest error of the forecast compared to the realized volatility. The top row denotes each index and Total the number of times each model had the lowest evaluated loss estimate.

Table 8.
Forecast evaluation measure: Mean Absolute Error (MAE)

Model    DJI     IXIC    N225    NSEI    OMXSPI  OSAEX   RUT     SPX     SSEC    STOXX50E  Total
GARCH    0,0164  0,0191  0,0090  0,0132  0,0061  0,0200  0,0201  0,0140  0,0099  0,0232    1
IGARCH   0,0164  0,0188  0,0090  0,0130  0,0061  0,0198  0,0199  0,0140  0,0099  0,0230    8
EGARCH   0,0172  0,0201  0,0091  0,0137  0,0061  0,0202  0,0209  0,0141  0,0099  0,0242    1
TGARCH   0,0171  0,0204  0,0091  0,0137  0,0062  0,0202  0,0209  0,0145  0,0099  0,0241    0

Note: Gray shading indicates the smallest error of the forecast compared to the realized volatility. The top row denotes each index and Total the number of times each model had the lowest evaluated loss estimate.

To briefly summarize the results: all of the GARCH models tend to follow the trend of the realized volatility quite well, with the exception of sudden spikes in the proxy and periods with high fluctuations. The symmetric models are preferred, especially the IGARCH model, according to the MSE, QLIKE and MAE. With regard to the MAPE, the asymmetric models have the smallest evaluated loss, with the TGARCH model performing best.
6 Discussion The calculated forecast errors generally displayed a small evaluated loss when compared to the realized volatility. Firstly, this implies that the GARCH models applied in this paper, on average, predict the volatility rather well. Secondly, since we have used a method involving one-day-ahead forecasts with a recursive expanding refit, it is not surprising that the forecasting errors are small and that the differences in estimated loss values between the models are quite small. Our results for three of the evaluation measures (MSE, MAE and QLIKE) are quite consistent and clear: the symmetric models (especially the IGARCH model) outperform the asymmetric models. In the literature, when the negative relationship between volatility and a shock in asset return is accounted for by the asymmetric component of the GARCH, these models generally perform better than the symmetric models (see e.g., Harrison and Moore 2011; Gabriel 2012; Evans and McMillan 2007). This somewhat opposes our main results. However, when analyzing the performance of different GARCH models, results can differ depending on various factors. Poon and Granger (2003) highlight that different time periods and evaluation measures have an impact on which model performs best. The time interval studied in this thesis captures part of the COVID-19 crisis, which had a significant impact on financial markets, leading to a period with a substantial increase in volatility and uncertainty (Zhang, Hu and Ji 2020). This might explain why, in this specific time period, the symmetric models appear to outperform the asymmetric GARCH models. If we establish that the out-of-sample period in this paper is a period with high fluctuations, as described in Zhang, Hu and Ji (2020), the picture in the literature regarding the dominance of the asymmetric GARCH models changes.
The literature on GARCH models' performance during periods of increased volatility suggests that, in general, the symmetric GARCH models perform better. Viewed in this light, our MSE, MAE and QLIKE results follow the literature in favoring the symmetric GARCH models, whereas the MAPE evaluation measure, which generally favors the asymmetric models, opposes it.
A closer look at each of the forecast evaluation measures in this paper provides a clearer understanding of the results. The two symmetric loss functions, which penalize over- and under-estimations of the forecast equally, suggest that the best model is the IGARCH and consistently favor the two symmetric models. This indicates that, when forecasting errors are not weighted differently, the symmetric GARCH models are superior and, on average, follow the realized volatility better. Moreover, when more weight is applied to volatility underestimation, as with the QLIKE loss function, the symmetric models are still superior, and specifically the IGARCH. Looking at the graphs in Appendix B1, it appears that the symmetric models lie, on average, above the asymmetric models, while the realized volatility lies above all model types. This indicates that the QLIKE loss function, on average, penalizes the underestimations of the asymmetric GARCH models more heavily. Thus, when accounting for asymmetric estimated losses, the superior set of models are the IGARCH and GARCH. The only evaluation measure that systematically favors the asymmetric GARCH models is the MAPE. Arguably, a potential cause is the scale sensitivity of the MAPE measure, which implies that the forecast errors can take on extreme values when the realized volatility is close to zero. The graphs in Appendix B1 show that during several periods the realized volatility is very low, with values below 0,02, which leads to a very high estimated loss in the forecasted volatility. This is a possible reason why the MAPE measure penalizes the symmetric models very severely during periods with low realized volatility, thus favoring the asymmetric GARCH models. Furthermore, Hyndman and Koehler (2006) suggest that the MAPE measure is not fully appropriate for data that are close to zero.
Thus, given the scale sensitivity of the MAPE measure and the caution suggested by Hyndman and Koehler (2006), the MAPE results should be interpreted with care. By evaluating the features of the IGARCH model, we can better understand why it generally generates the best forecasts. The IGARCH model has a non-stationary variance process, such that the impact of shocks on volatility is persistent. This implies that the volatility is not mean reverting and that the conditional variance forecast will not converge. These properties are arguably favorable during more volatile periods, since the model produces, on average, larger volatility forecasts than the asymmetric models. Furthermore, Poon and Granger (2003) also suggest that models without convergence of the conditional variance tend to provide larger forecasts. Thus, this evidence suggests that the leverage effects
in the asymmetric GARCH models tend not to be as significant during more volatile periods, and that the symmetric models, on average, produce better volatility forecasts. 7 Conclusion This thesis has studied the volatility forecasting performance of GARCH-family models, including two symmetric and two asymmetric GARCH models, to examine which specific GARCH model produces the best volatility forecasts. Predictions were based on the daily returns of ten different stock indices with the newest financial data available. The method used is a one-day-ahead forecast with a recursive expanding refit. The estimated forecast for each index and GARCH model was compared to the realized volatility, which acts as a measure of the true underlying volatility process. The estimated loss between the realized volatility and the predicted forecast was evaluated with four different evaluation measures: the Mean Square Error (MSE), the Mean Absolute Percentage Error (MAPE), the Mean Absolute Error (MAE) and the asymmetric QLIKE loss function. The evaluated forecasting performance suggests that the symmetric GARCH models, and specifically the IGARCH model, on average generated the most accurate forecasts according to the MSE, MAE and QLIKE measures. This is consistent with previous research on periods of increased volatility, but it opposes the evidence that, during regular periods, asymmetric GARCH models generally perform better. Moreover, according to the MAPE measure, the asymmetric GARCH models' forecasts had the smallest evaluated loss. This study has aimed to provide an accurate evaluation of the forecasting performance of GARCH-family models. By examining several GARCH models, using a large number of stock market indices with the newest financial data available, several forecast evaluation measures, and a volatility proxy shown to be accurate, we believe that our results can contribute to the volatility forecasting literature.
Given the time restrictions of this thesis, only one specific time period was used to investigate the models' forecasting performance. It would be interesting to evaluate a large number of stock market indices over several time periods to give more depth to the analysis. Moreover, future studies could focus on periods with increased volatility and investigate whether the symmetric GARCH models remain best at forecasting volatility. It would also be of value to research how to optimally evaluate forecasting performance. One of the most important aspects of the forecasting exercise is arguably how to compare forecasting performance between volatility models. As of now, the construction of volatility models and forecasts is the focus of the literature, with little attention paid to evaluation. There are several statistical measures for evaluating the forecast error against the realized volatility, and they can give different results depending on their specifications.
8 References

Alexander, C. 2001. Market models. A guide to financial data analysis. New Jersey: John Wiley & Sons.

Alexander, C. 2008. Market risk analysis. Practical financial econometrics. New Jersey: John Wiley & Sons.

Andersen, T., and Bollerslev, T. 1998. Answering the skeptics: Yes, standard volatility models do provide accurate forecasts. International Economic Review 39(4): 885-905.

Awartani, B., and Corradi, V. 2005. Predicting the volatility of the S&P-500 stock index via GARCH models: the role of asymmetries. International Journal of Forecasting 21(1): 167-183.

Bollerslev, T. 1986. Generalized autoregressive conditional heteroscedasticity. Journal of Econometrics 31(3): 307-327.

Brooks, C. 2014. Introductory econometrics for finance. 3rd edition. Cambridge: Cambridge University Press.

Engle, R. 1982. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50(4): 987-1007.

Evans, T., and McMillan, D. 2007. Volatility forecasts: The role of asymmetric and long-memory dynamics and regional evidence. Applied Financial Economics 17(17): 1421-1430.

Gabriel, A. S. 2012. Evaluating the forecasting performance of GARCH models. Evidence from Romania. Procedia - Social and Behavioral Sciences 62(1): 1006-1010.

Glosten, L., Jagannathan, R., and Runkle, D. 1993. On the relation between the expected value and the volatility of the nominal excess return on stocks. The Journal of Finance 48(5): 1779-1801.