DEGREE PROJECT IN TECHNOLOGY,
FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2020

@TheRealDonaldTrump’s tweets
correlation with stock market
volatility

ISAK OLOFSSON

KTH
SCHOOL OF ENGINEERING SCIENCES

Degree Projects in Applied Mathematics and Industrial Economics (15 hp)
Degree Programme in Industrial Engineering and Management (300 hp)
KTH Royal Institute of Technology year 2020
Supervisor at KTH: Alessandro Mastrototaro
Examiner at KTH: Sigrid Källblad Nordin
TRITA-SCI-GRU 2020:116
MAT-K 2020:017

Royal Institute of Technology
School of Engineering Sciences
KTH SCI
SE-100 44 Stockholm, Sweden
URL: www.kth.se/sci
Abstract

The purpose of this study is to analyze whether any tweet-specific data from Donald Trump correlates with the volatility of the stock market. If any characteristics of President Trump's tweets show correlation with the volatility, the goal is to find a subset of regressors with as high predictability as possible. The content of the tweets is used as the basis for the regressors.

The method used is a multiple linear regression with tweet and volatility data ranging from 2010 until 2020. As a measure of volatility the Cboe VIX has been used, and the regressors in the model focus on the content of tweets posted by Trump, using TF-IDF to evaluate that content.

The results from the study imply that the chosen regressors display a small but significant correlation, with an adjusted $R^2 = 0.4501$, between Trump's tweets and the market volatility. The findings include 78 words that correlate with stock market volatility when part of President Trump's tweets. The stock market is a large and complex system of many unknowns, which aggravates the process of simplifying and quantifying data from only one source into a regression model with high predictability.

Sammanfattning (Swedish summary)

The purpose of this study is to analyze whether there are any specific characteristics of the tweets published by Donald Trump that correlate with the volatility of the stock market. If characteristics of President Trump's tweets show a relationship with the volatility, the goal is to find a subset of regressors that describes this relationship with as high significance as possible. The content of the tweets has been the focus and is used as the regressors.

The method used is a multiple linear regression with tweet and volatility data ranging from 2010 to 2020. As a measure of volatility the Cboe VIX has been used, and the regressors in the model focus on the content of the tweets, where TF-IDF has been used to transform words into numerical values.

The results of the study show that the chosen regressors exhibit a small but significant correlation, with an adjusted $R^2 = 0.4501$, between Trump's tweets and the market volatility. The findings include 78 words that, when part of President Trump's tweets, show a significant correlation with stock market volatility. The stock market is a large and complex system of many unknowns, which complicates the process of simplifying and quantifying data from only one source into a regression model with high predictability.

Contents

1 Introduction
  1.1 Background
  1.2 Purpose and Problem Statement
  1.3 Earlier research
      1.3.1 Volfefe Index
      1.3.2 Stock Price Expectations and Stock Trading
      1.3.3 Twitter mood predicts the stock market

2 Economic Theory of the Study
  2.1 The financial market
      2.1.1 The efficient market hypothesis
      2.1.2 The stock market
      2.1.3 News' impact on the financial market
      2.1.4 Volatility and Cboe VIX Index
  2.2 Twitter and Sentiment Analysis

3 Mathematical Theory of the Study
  3.1 Multiple Linear Regression
      3.1.1 Assumptions of the linear regression model
      3.1.2 Ordinary Least Squares estimation
      3.1.3 Indicator variables
      3.1.4 Residual Analysis
  3.2 Model assessment and verification
      3.2.1 Leveraged and Influential points
      3.2.2 Multicollinearity
      3.2.3 Methods for dealing with multicollinearity
      3.2.4 Variable Selection
      3.2.5 Mallows Cp
  3.3 Quantitative Selection
      3.3.1 Selection using TF-IDF
      3.3.2 Stemming
  3.4 Transformation
      3.4.1 Box-Cox Transformation

4 Methodology
  4.1 Data Gathering
  4.2 General transformation of data points
      4.2.1 Transformation of Volatility
      4.2.2 Transformation of dates
  4.3 Initial models
      4.3.1 Model 1 - statistics of tweets
      4.3.2 Model 2 - Words from Volfefe Index
  4.4 Regression Model
      4.4.1 Data selection using TF-IDF
      4.4.2 Variable selection using Forward Selection
      4.4.3 Regression

5 Results
  5.1 Findings
      5.1.1 Interpretation
      5.1.2 Top regressors
  5.2 Residual analysis

6 Discussion
  6.1 Analysis of results
  6.2 Limitations
  6.3 Conclusion
  6.4 Further studies
1     Introduction

1.1    Background
On June 16, 2015, Donald Trump, a controversial and far from unknown figure, at least in the United States, announced his intention to run for president of the United States of America as the Republican Party's candidate. The day before, Donald Trump's account @realdonaldtrump had just under three million followers on the social media platform Twitter, reading the twenty-three thousand tweets he had posted, not including retweets. By the start of 2020 his audience had reached sixty-eight million accounts and his tweet legacy amounted to forty-one thousand tweets, again not counting retweets. During this period Donald Trump has transitioned from being famous predominantly in the United States to being a household name worldwide. Soon ending his first term as president and just putting his reelection campaign into gear, with four more years in the White House as his target, his Twitter account continues to deliver daily tweets and replies. This form of direct communication from one of the world's truly elite politicians is unprecedented.

Financial markets have always had a flavour of speculation to them, as the cycle of boom, bust, rinse and repeat has kept iterating ever since the seventeenth-century Tulip mania.[1] The motives behind these speculative moves are hard to formulate explicitly; someone able to do so would surely be able to retire rather quickly. That political decisions play their part to some degree is something that most, if not all, would agree on. It is therefore of interest to investigate the connection between President Trump's tweets about his work and the financial market, which in this thesis will be done using a multiple regression analysis.

The market's response to Trump's tweeting will be measured with volatility. Volatility in the financial market is of high interest as it is used in pricing derivatives and therefore plays a great part in how financial markets move in the short-to-medium time span. For this thesis the daily closing price of the VIX index will be used.

1.2    Purpose and Problem Statement
The main purpose of this thesis is to investigate whether @realdonaldtrump's tweets have an impact on market volatility, and if so, how much and in what way. This connection will be examined using a multiple linear regression model, with the characteristics of a day's tweets as the regressors and the VIX close for that day as the response variable.

However, we must first understand that President Trump tweets multiple times almost every day, and that very few of his tweets can be considered to influence market movements. Trump's tweets vary considerably in content, which is a problem when performing the regression. Tweets concerning trade and monetary policy, which are more likely to affect the market, are prone to drowning in the noise of misspelled and self-praising tweets. Therefore, this thesis will first deal with a selection of tweets in order to determine and sort out which tweets to include in the regression. Further, the connection between the tweets of significance and the market volatility will be quantified using a multiple linear regression.

This paper and its findings might be of interest to a large variety of people, including the general public and international traders investing in markets worldwide. Although this analysis is based on the American market, historical evidence implies that there exists, to a varying degree, a correlation between market returns around the world.[2] Specifically, given the methods used, the findings could be of interest for traders deploying quantitative trading models where volatility could determine the size, timing and risk of potential trades.

Traditional model assessment and verification techniques used with regression, such as analysis of residuals and multicollinearity, will be utilized. Also, a regressor indicating whether Donald Trump was or was not president at the time of publishing a tweet will be included in the model, to investigate whether the impact of Trump's tweets differs before and during his presidency.

An important demarcation to note is that the research is limited to the effect of Donald Trump's tweets on market volatility on the same day as the tweet. The ambition is not for the findings to explain market volatility fully, but rather to examine which parameters of Donald Trump's tweets correlate with, and possibly impact, market volatility.

1.3     Earlier research
1.3.1    Volfefe Index

Interest in President Trump's controversial behavior on social media has been covered from many different angles. In September of 2019, the bank JPMorgan Chase created an index called the Volfefe Index, based on President Trump's tweets.[4]

The Volfefe index was created to predict movements in treasury bonds, and in order to do this JP Morgan had to build an algorithm for assessing tweets. To do this, every tweet's impact was categorised as significant or non-significant. Significant tweets were those that were followed by a move of ±0.25 basis points in 10-year Treasury yields within 5 minutes of trading from the publication of the tweet.

From these tweets, JP Morgan identified the words most indicative of a market-moving tweet. Those words are, in order of decreasing significance:

1. China
2. Billion
3. Products
4. Democrats
5. Great
6. Dollars
7. Tariffs
8. Country
9. Muller
10. Border
11. President
12. Congressman
13. People
14. Korea
15. Party
16. Years
17. Farmers
18. Going
19. Trade
20. Never

This research and its findings show that, on a small time frame, there exists a correlation between Donald Trump's tweets and the financial market. Performing this classification of tweets is outside the scope of this project and will not be attempted. However, the 20 most influential words for market-moving tweets, shared in the article by JP Morgan, will be used.

1.3.2   Stock Price Expectations and Stock Trading

In a study from 2012, researchers from RAND published a paper for the National Bureau of Economic Research investigating stock price expectations in relation to market events. The findings of the paper suggest that, on average, subjective expectations of stock market behavior depend on stock price changes, meaning that past performance will influence the future expectations on a stock. Moreover, stock trading responds to changes in expectations in a delayed manner, i.e. market participants execute trades now even if the change in expectations occurred some time earlier. This implies that news impacts the market also after the time of publication, by building subjective momentum, but that the initial reaction to an event is of importance. Further, the paper also discusses the vast complexity behind market reactions and concludes that we still do not fully understand how expectations of events are translated into action.[5]

1.3.3   Twitter mood predicts the stock market

Behavioral economics tells us that sentiment can profoundly influence decision-making and individual behavior. In a paper from 2011, researchers from Indiana University and the University of Manchester investigated whether this applies on a larger scale: can societies have states of mood that affect collective decision-making? In the paper the researchers use Twitter as a database of sentiment in society. More specifically, the paper investigates whether measurements of collective mood states derived from large-scale Twitter feeds are correlated to the value of the Dow Jones Industrial Average (DJIA). This is done using OpinionFinder and the Google-Profile of Mood States (GPOMS). The results from the study indicate that predictions of the DJIA can be significantly improved by including some specific public mood dimensions. The model presented in the paper is based on a Self-Organizing Fuzzy Neural Network and achieved an accuracy of 86.7% in predicting the daily up and down changes in the closing values of the DJIA, which compared to methods not including the sentiment model reduces the Mean Average Percentage Error by over 6%.[6]

Theoretical framework
2     Economic Theory of the Study
2.1     The financial market
2.1.1    The efficient market hypothesis

The efficient market hypothesis (EMH), made famous by Eugene Fama in the 1970s, is a theoretical concept used to model financial markets. EMH puts certain demands on a market and on the pricing of securities listed on it. A financial market is said to be efficient if the prices of securities on the market fully reflect all available information at the time. By definition, the market is said to be efficient with respect to a set of information if that information, when revealed to all participants at the same time, would leave security prices unchanged. Having an efficient market with respect to some set of information, φ, implies that there exist no opportunities for arbitrage and that it is impossible to make economic profit trading on the already known information φ.[7]

2.1.2    The stock market

A stock market is a platform where buyers and sellers of stocks can meet and trade. Each stock, also known as a share, traded on a stock market represents a piece of ownership in the company associated with the share. It is common to associate the term stock market with big companies listed on big and well-known stock exchanges such as the New York Stock Exchange, NASDAQ or Dow Jones. However, in 2020 there exist heaps of other exchanges that facilitate the same function, but for other markets.

Historically, stock markets were physical places where people met and came to an agreement on the price and number of shares. This way of selling and buying stocks is today outdated, and instead almost all transactions occur via some form of digital platform. Different stocks trade on different stock markets. This division is partly due to practicality, addressing the implications of different time zones and currencies around the world. However, there are also other factors that divide stocks into different markets, e.g. market value. Some stock exchanges only trade publicly listed shares, while other stock exchanges may also include securities that are privately traded. Exchanges are not only for equities but may also list other securities, for instance bonds or derivatives.[8]

The price change or movement of a stock is a function of supply and demand. Depending on the relative share of buyers and sellers the price is prone to fluctuate, and in a scenario where the number of sellers exceeds the number of buyers the price will decrease until it becomes low enough to encourage buyers, thereby increasing the demand. The current price per share is a function of all the current stock owners' views on the company and the company's future potential. The difficulty of a stock market is clearly to predict price movements, since present and historical data usually do not suffice to make accurate future projections. When predicting future prices, one must take every person's sentiment about the company into the calculation. Because of the complexity of this estimation, future share prices are often seen as unattainable.[9]

From theoretical and empirical studies it is evident that the stock market has played a significant role within both the advanced economies and the emerging markets.[10] The stock market sentiment is in a way a reflection of the larger economic sentiment in society. Governments setting policy, professional and recreational investors, companies and the media all play their own roles in the stock market. All of these institutions are ultimately controlled by humans, who are known to not always act rationally. The mood of market participants at a given point in time is referred to as market psychology. Emotions including greed, fear, expectations and circumstances are all factors that can contribute to market psychology at any time. As early as 1936, John Maynard Keynes described how these sentiments in society can trigger periods of "risk-on" and "risk-off".[11] Conventional financial theory, mainly EMH, fails to explain the emotion involved in investing and how it contributes to irrational behavior. In other words, theories of market psychology are in conflict with the belief that markets are rational, when in reality they never fully are. This aspect of market psychology further adds to the complexity of predicting the performance of individual stocks and markets based on fundamental facts.[12]

2.1.3    News' impact on the financial market

The stock market is driven by, and relies on, new information being unveiled. Under EMH, expected news is priced into the price of a stock or an index, while unexpected news is not. Stock price movement depends on the constant change in supply and demand, making this relationship highly sensitive to the news of the moment. That said, the anticipation of an event might already have priced in the expected outcome even before it is published. On the other hand, unexpected news disclosing something new and not priced in must first be interpreted, making chasing the news a tricky strategy for trading.[13]

Financial markets never rest and constantly react to new information, making it even more difficult to isolate which event resulted in which price movement. Generally, indicators of general economic news are found to be better than firm-specific news when predicting price changes on the stock market.[14]

Nevertheless, in a study from early 2017, not long after Trump's presidential inauguration, researchers from Harvard and the University of Zurich published a paper trying to model asset price responses to unexpected news in and around the election. On the morning of election day Donald Trump was a fairly unlikely winner, with betting services pricing the chance of Trump being elected at between 18 and 27%. When Trump, to many people's surprise, won the election, markets reacted quickly. In the paper, the price $P_n$ and return $R_n$ are modeled around the presidential election of 2016, but in theory the model can be applied to any event with expectations on the outcome. Given two outcomes X and Y with probabilities $\pi_X$ and $\pi_Y$ respectively, the current price before the event is given by

$$P_n = \pi_X P_{n,X} + \pi_Y P_{n,Y}$$

where $P_{n,X}$ and $P_{n,Y}$ are the expected prices given outcome X or Y. The expected return given outcome X then becomes

$$R_n = \frac{P_{n,X} - P_n}{P_n}$$
This is clearly a straightforward model including expectations and the outcome of an event. Using this model, the researchers found that the individual stock price reactions to the election reflect the unexpected change in investor expectations on economic growth, taxes, and trade policy. More specifically, the market reacted quickly to the expected consequences of the election for US growth and tax policy, while it took the market longer to incorporate the consequences of shifts in trade policy. By evaluating the impacts of different news over a ten-day period after the election, the researchers found that the one-day response varied between about 30 and 80% of the ten-day response. This implies that the stock market reacts differently to different events and headlines; sometimes the implications of news are straightforward to interpret, while other times the effects of a headline are more cumbersome to assess.[15]
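
As a purely illustrative example with hypothetical numbers (not taken from the paper): let outcome X have probability $\pi_X = 0.25$, with expected prices $P_{n,X} = 90$ and $P_{n,Y} = 100$. The pre-event price is then

$$P_n = 0.25 \cdot 90 + 0.75 \cdot 100 = 97.5$$

and the expected return if X occurs is $R_n = (90 - 97.5)/97.5 \approx -7.7\%$: only the part of the outcome that was not already priced in moves the price.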

2.1.4    Volatility and Cboe VIX Index

Volatility in a stock market measures the frequency and magnitude of price changes, both for movements up and down, and applies to all traded financial instruments during a certain period of time. The more dramatic the price fluctuations in an instrument, the larger the volatility. Volatility is defined and measured either using historical prices, called realized volatility, or as a measurement of implied volatility derived from options prices.[16]

The VIX, which stands for Volatility Index, is an index introduced by the Chicago Board Options Exchange in 1993 and tries to capture the 30-day implied volatility of the underlying equity index. The VIX uses the latter of the two measures above and is therefore a measurement of expected future volatility.[17] Cboe continuously updates the VIX index values during trading hours. The VIX measures the implied volatility of the S&P 500, which is one of the most common equity indices and one that many consider to be among the best representations of the U.S. stock market. The VIX can be seen as a leading indicator of investor attitudes and market volatility relating to the listed options upon which the index is based.[18]

There is a phenomenon called volatility asymmetry, which refers to volatility being higher in down markets than in up markets.[19] This means that volatility generally is low during longer periods of economic growth and high during economic recessions. As a result, trading volatility, either through options or special derivatives, can be used as a hedge against a downturn in the stock market.

2.2    Twitter and Sentiment Analysis
Twitter was founded in 2006 and is one of the first-launched social media platforms still existing in 2020. The platform was launched in San Francisco, California and is now an international microblogging and social networking service. On Twitter all users can post and interact with short statements or messages known as "tweets". The platform is also open to unregistered users, with the restriction that unregistered users are limited to reading. Originally, tweets were restricted to a maximum of 140 characters, but in November 2017 this limit was increased to 280 characters. Twitter is accessible both through its website interface and through its mobile-device application software.[20]

The growth of social media and social networking sites has been exponential in the past decade for platforms such as Twitter and Facebook. This widespread phenomenon of social media raises the possibility of tracking the preferences of citizens in an unprecedented manner. At the end of 2019, Twitter averaged 152 million daily users.[21] The opinions and flow of information spreading instantaneously on Twitter represent a valuable source of data that can be useful for gauging the general sentiment on topics.[22] This source of data comes with complexity: analyzing emotions on social media is difficult due to non-standard linguistics, intensive use of slang, emojis and incorrect grammar, aspects that people have no problem understanding but that nevertheless are troublesome for models to interpret. Another concern is that sentiment analysis on social media such as Twitter assumes that the findings are representative of the entire population,[23] something that might not always be true since not everyone is connected to social media.

3     Mathematical Theory of the Study
This part will walk the reader through the more rigorous mathematical aspects of the study. Unless otherwise stated, the theory found in section 3 is extracted from Montgomery, D.C., Peck, E.A. and Vining, G.G. (2012).[24]

3.1     Multiple Linear Regression
The hypothesis is that the volatility on a daily basis can be explained by a multiple linear regression using the measurements supplied in the data set, as in the following linear model for predicting the VIX index

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k + \varepsilon$$

The interpretation of this formula is that $x_i$ is a measurement of a tweet, and the goal is to find the corresponding coefficient $\beta_i$ to be inserted into the model in order to produce the best estimate of the VIX value, here represented by y. With n observations and k covariates, the model in matrix notation is described as

$$y = X\beta + \varepsilon$$

where

                                                                       
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad \varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

Here $\beta_i$ explains by how much the VIX is expected to change for every unit change in the measurement $x_i$, and $\beta_0$ is the intercept of the model.

3.1.1    Assumptions of the linear regression model

In the study of regression analysis, major assumptions are stated. In order for the regression model to be valid these assumptions must be shown to hold; otherwise model inadequacies are inevitable. The assumptions are:

1. The relationship between the response variable y and the regressors x is approximately linear.

2. The error term $\varepsilon$ has mean $\mu = 0$ and constant variance $\sigma^2$.

3. The errors are uncorrelated, i.e. $\operatorname{Corr}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$.

4. The observation of y is fixed in repeated samples, meaning that resampling with the same independent variable values is possible.

5. The number of observations n is larger than the number of regressors k, and there are no exact linear relationships between the $x_i$'s.

When evaluating these conditions, residual analysis is a very useful method for diagnosing violations of the basic regression assumptions; this will be further explained later on in the study.

3.1.2      Ordinary Least Squares estimation

The values of β will be estimated using the linear model lm() function in R, which utilises the ordinary least squares method and minimizes the sum of squares of the residuals. This means that the estimation of β is given by a solution to the normal equations, where the residual is defined as

$$e = y - X\beta$$

Minimizing the sum of squares of the residuals ($SS_{Res} = e'e$), the estimate is

$$\hat{\beta} = \arg\min_{\beta} S(\beta)$$

where the objective function S can be written as

$$S(\beta) = \sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{p} X_{ij}\beta_j\right)^2 = \left\lVert y - X\beta \right\rVert^2$$

This minimization problem has a unique solution, provided that the columns of the matrix X are linearly independent, given by the normal equations

$$(X'X)\hat{\beta} = X'y$$

Finally, rewriting this, we end up with the OLS estimate

$$\hat{\beta} = (X'X)^{-1}X'y$$

After the estimates of β have been produced, they have to be evaluated further to assure congruence with the assumptions relating to the theory on quality of results.
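
A minimal sketch of this estimation in R, on a hypothetical toy data set (the thesis' actual tweet regressors are not reproduced here), showing both lm() and the normal equations solved directly:

    # Minimal sketch: OLS with lm() and via the normal equations.
    # Data frame and column names are hypothetical placeholders.
    df <- data.frame(vix = c(15.2, 17.8, 14.9, 20.1, 18.4, 16.3),
                     x1  = c(0.3, 0.8, 0.1, 1.2, 0.9, 0.5),
                     x2  = c(2, 5, 1, 7, 4, 3))

    fit <- lm(vix ~ x1 + x2, data = df)   # OLS fit with intercept
    coef(fit)

    # The same estimate from the normal equations (X'X) beta = X'y
    X <- model.matrix(fit)                # design matrix incl. intercept column
    y <- df$vix
    solve(t(X) %*% X, t(X) %*% y)         # beta_hat = (X'X)^{-1} X'y

Later sketches in this section reuse this hypothetical df and fit.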

3.1.3    Indicator variables

Unlike other regressor variables that have a quantitative value, indicator variables, or dummy variables, are qualitative. Since they have no natural numeric value, they are represented in the regression model via the levels 1 or 0 assigned to them. In this study the indicator variable is Donald Trump's occupation, divided into civilian (0) or president (1).
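
A small sketch of how such a dummy can be constructed in R; the data frame and column name are hypothetical, and the cutoff is Trump's inauguration date, January 20, 2017:

    # Sketch: civilian/president indicator from the tweet date.
    tweets <- data.frame(tweet_date = as.Date(c("2015-06-16", "2018-03-01")))
    inauguration <- as.Date("2017-01-20")
    tweets$president <- ifelse(tweets$tweet_date >= inauguration, 1L, 0L)
    tweets   # president = 0 before the inauguration, 1 from that date on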

3.1.4    Residual Analysis

The key assumption, which constitutes the backbone of the whole project, is that there is at least a reasonably linear relationship between the VIX and the regressors. By examining the produced residuals through various standardised tests and measurements, there is a higher chance of detecting model inadequacy.

Normal residuals

Normal residuals are defined as the difference between the observed value yi and the fitted value of
the model ŷi .

$$e_i = y_i - \hat{y}_i$$

The residual is interpreted as the deviation between the model and the actual data, making plotting the residuals an effective method for quickly detecting violations of model assumptions. In the best of worlds, where the model is effective, the sum of all residuals should be zero and their distribution should be Gaussian. Where this is not the case, something about the model is flawed, and examining the residuals will give important clues as to what is wrong.

The residuals have zero mean, E(e) = 0, and their approximate average variance can be estimated using the residual sum of squares, which has n − k degrees of freedom associated with it since k parameters are estimated in the regression model. An estimate of the residual variance is given by the residual mean square $MS_{Res}$

$$MS_{Res} = \frac{SS_{Res}}{n - k} = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n - k}$$

Scaled Residuals

Scaled residuals are obtained by transforming the normal residuals. The purpose of scaling is to make residuals comparable both with each other and with residuals from other models. These new residuals can offer further clues as to whether something is wrong with the model and how it could benefit from modifications. In order to understand the notation in the transformations of residuals in the following part, we now introduce some concepts.

The total sum of squares SST is partitioned into a sum of squares due to regression, SSR , and a
residual sum of squares, SSRes .

$$SS_T = SS_R + SS_{Res}$$

where the terms are defined as follows: $SS_T = \sum_{i=1}^{n}(y_i - \bar{y})^2$, $SS_R = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$ and $SS_{Res} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 = e'e$.

Standardized Residuals

By normalising the residuals, they can be approximated by a Gaussian distribution with mean equal to zero and variance of approximately one. This modified residual is easier to analyse, as it can be compared with other standardised residuals. As a rule of thumb, a value of $d_i > 3$ is an indication of a possible outlier. The standardized residual is given by

$$d_i = \frac{e_i}{\sqrt{MS_{Res}}}$$

where $MS_{Res}$ is an unbiased estimator of $\sigma^2$.

Studentized Residuals

Studentized residuals build on the standardized residuals, which were obtained by normalizing with the variance estimate $MS_{Res}$. To calculate studentized residuals, the scaling is instead based on the exact standard deviation of each observation, i.e. $e_i$ is divided by the exact standard deviation of the given observation i.

Writing the residuals by use of the hat matrix $H = X(X'X)^{-1}X'$ gives

$$e = (I - H)y$$

which through substitution of

$$y = X\beta + \varepsilon$$

gives the following:

$$e = (I - H)\varepsilon$$

This shows that the residuals are the same transformation of y as of $\varepsilon$. The variance of the error $\varepsilon$ is given by $\operatorname{Var}(\varepsilon) = \sigma^2 I$, and since $I - H$ is symmetric and idempotent, the residual covariance matrix is

$$\operatorname{Var}(e) = \operatorname{Var}[(I - H)\varepsilon] = (I - H)\operatorname{Var}(\varepsilon)(I - H)' = \sigma^2 (I - H)$$

with the variance of each residual given by the covariance matrix according to

$$\operatorname{Var}(e_i) = \sigma^2 (1 - h_{ii})$$

where $h_{ii}$ is a diagonal element of the hat matrix H.

Using the found variance of $e_i$, the studentized residual is calculated by

$$r_i = \frac{e_i}{\sqrt{MS_{Res}(1 - h_{ii})}}$$

The takeaway from the above formulas is that, in general, an $x_i$ closer to the center has a larger variation, and thus violations of the model assumptions are more likely to be detected further out towards the edges; if everything about the model is sound, the residuals will have variance 1. It can also be useful to know that as n goes to infinity, studentized residuals usually converge to the standardised residuals. As in most cases with residuals, one lone point far away from the rest may be influential on the whole fit; such points should be further analysed.

R-Studentized Residuals

When constructing R-studentized residuals, the variance is estimated by calculating $S_i^2$, where observation i is removed from the estimation. This is done to examine how single data points influence the results, much as later described in the PRESS residuals section. The formula for calculating $S_i^2$ is
$$S_i^2 = \frac{(n - p)MS_{Res} - e_i^2/(1 - h_{ii})}{n - p - 1}$$
This estimate of $\sigma^2$ is then used to calculate the R-student residual according to

$$t_i = \frac{e_i}{\sqrt{S_i^2 (1 - h_{ii})}}, \quad i = 1, \dots, n$$

An observation i whose R-studentized residual differs greatly from the result obtained when estimating $\sigma^2$ using $MS_{Res}$ indicates that the observation i is an influential point.
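
In R, the scaled residuals discussed above are available directly for a fitted lm() model; a sketch, reusing the hypothetical fit from the OLS example:

    # Sketch: residual diagnostics for the fitted model `fit`.
    e  <- resid(fit)                             # ordinary residuals e_i
    h  <- hatvalues(fit)                         # diagonal elements h_ii of H
    d  <- e / sqrt(sum(e^2) / df.residual(fit))  # standardized residuals d_i
    ri <- rstandard(fit)                         # studentized residuals r_i
    ti <- rstudent(fit)                          # R-studentized residuals t_i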

PRESS Residuals

PRESS, the Prediction Error Sum of Squares, is another method to examine the influence of a specified observation i in the set. It is produced by calculating the error sum of squares for every observation except i. The PRESS residual is defined as

$$e_{(i)} = \frac{e_i}{1 - h_{ii}}, \quad i = 1, \dots, n$$

Where the element $h_{ii}$ of the hat matrix H is large, the PRESS residual will also be large. If this sum greatly differs from the value obtained from the whole set, and from the sums obtained when excluding the other observations one by one, the isolated point i has a disproportionate effect on the regression and may skew the model. This means that a point that stands out in the PRESS diagram is a point where the model fits well, but where a model excluding this point will predict poorly.
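
A sketch of the PRESS residuals and the resulting PRESS statistic for the same hypothetical fit:

    # Sketch: PRESS residuals e_(i) = e_i / (1 - h_ii) and the PRESS statistic.
    press_res <- resid(fit) / (1 - hatvalues(fit))
    sum(press_res^2)   # prediction error sum of squares (PRESS)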

3.2     Model assessment and verification
3.2.1    Leveraged and Influential points

There are many forms of outliers that can be identified, and in this section we will be looking at leverage points and influential points. A point of high leverage is an observation with an unusually high x-value. If this point's y-value is in line with the rest of the regression, it will not affect the fit of the model too much. However, if the point also has a deviating y-value, it becomes an influential point. Influential points have a large effect on the model, since they pull the entire regression towards them. In conclusion, not all leverage points are influential for the fit.

When identifying these points, the hat matrix $H = X(X'X)^{-1}X'$ is crucial. Each diagonal element $h_{ii}$ of the hat matrix tells us the leverage of observation i on the fitted value $\hat{y}_i$. As a general rule, a point is said to be a leverage point if its diagonal element in the hat matrix exceeds double the average, i.e. $h_{ii} > 2p/n$.

Cook’s Distance

In order to find these points of interest, a useful diagnostic tool is Cook's distance. Cook's distance takes into account both the x-value of the observation and the response variable, by taking the least-squares distance from the observation to the fit. Cook's distance for the i:th observation is calculated by deleting that observation and looking at the change in the model that results from doing so. It can be calculated as below, where n is the number of observations.

$$D_i = \frac{r_i^2}{k} \cdot \frac{\operatorname{Var}(\hat{y}_i)}{\operatorname{Var}(e_i)} = \frac{r_i^2}{k} \cdot \frac{h_{ii}}{1 - h_{ii}}, \quad i = 1, 2, \dots, n$$

where $r_i$ is the i:th standardized residual and $h_{ii}$ is a diagonal element of the hat matrix H. As a rule of thumb, points with $D_i > 1$ are considered to be influential points.
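
Cook's distance is available directly in R; a sketch using the same hypothetical fit:

    # Sketch: Cook's distance for every observation of `fit`.
    D <- cooks.distance(fit)
    which(D > 1)   # rule-of-thumb flag for influential points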

3.2.2    Multicollinearity

Multicollinearity occurs if the regressors are almost perfectly linearly dependent. Having multicollinearity in the data may cause different degrees of interference in the model; symptoms range from inaccuracy of the estimation to the model being outright misleading and wrong. Understanding the data set and the source of the multicollinearity is key to treating it.

3.2.3    Methods for dealing with multicollinearity

Variance Inflation Factor

One method for detecting multicollinearity is to look at the variance inflation factor, or VIF for short. The VIF is defined as

$$VIF_i = C_{ii} = (1 - R_i^2)^{-1}$$

where $C = (X'X)^{-1}$ and $R_i^2$ denotes the coefficient of determination obtained when $x_i$ is regressed on the remaining regressors.

If $x_i$ is nearly linearly dependent on some subset of the regressors, $R_i^2$ is close to one and $C_{ii}$ becomes very large. A $VIF_i$ value above 10 indicates multicollinearity, which can result in poor estimates of β.
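
A sketch of both routes to the VIF, using the hypothetical two-regressor fit from earlier; the car package is an assumed choice (any package providing vif() would do):

    # Sketch: VIF via the car package and by hand.
    library(car)                 # assumed installed; provides vif()
    vif(fit)                     # one VIF per regressor

    # By hand for x1: regress x1 on the remaining regressors
    r2_1 <- summary(lm(x1 ~ x2, data = df))$r.squared
    1 / (1 - r2_1)               # VIF_1 = (1 - R_1^2)^{-1}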

Eigenvalue Analysis

One of the most common analyses for detecting multicollinearity is to look at the eigenvalues of X'X in our system. The easiest way to determine whether there is multicollinearity is to look at the condition number of X'X, defined as

$$k = \frac{\lambda_{max}}{\lambda_{min}}$$

The common rule of thumb is that

    • k = 1 implies perfectly orthogonal regressors and no multicollinearity

    • k < 100 implies weak multicollinearity

    • 100 < k < 1000 relates to moderate to strong multicollinearity

    • k > 1000 is a sign of severe multicollinearity.
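
A sketch of the eigenvalue check in R, scaling the regressors first so that the condition number is not driven by units (an assumed but common choice):

    # Sketch: condition number of X'X from its eigenvalues.
    X      <- scale(model.matrix(fit)[, -1])  # regressors, centered and scaled
    lambda <- eigen(crossprod(X))$values      # eigenvalues of X'X
    max(lambda) / min(lambda)                 # condition number k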

3.2.4    Variable Selection

A contradiction occurs in any regression model regarding the number of variables. On one hand, it would be preferable to include as many regressors as possible for the purpose of having the largest scope of information. On the other hand, too many regressors inflate the variance, which has a negative impact on the overall performance of the model.

All Possible Regression

This method fits all the possible regression equations involving one candidate regressor, two candidate regressors, and so on. The optimal regression model is then selected based on some criterion, in our case the Bayesian information criterion, Mallows Cp and adjusted $R^2$. This technique is rather computationally heavy and is not suited for models with many regressors: a model with k candidate regressors results in $2^k$ total equations to be estimated and examined, so depending on the number of regressors in the model this technique may not be feasible. For models with under 30 regressors this method is acceptable using the computers of 2020. For models containing more than 30 regressors, variable selection can instead be achieved using forward, backward or stepwise elimination.

Forward Selection

Forward selection starts with a blank model including only the intercept $\beta_0$. The model then adds one regressor at a time in order to find the optimal subset of regressors. The first variable is chosen based on the largest simple correlation with the response variable. When adding the second regressor, the method again chooses the regressor with the largest correlation with the response variable, after adjusting for the first variable. The regressor having the highest partial correlation will produce the largest value of the F statistic for testing the significance of the regression.
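
A sketch of forward selection with R's step() function, which adds regressors greedily by an information criterion rather than the raw F statistic described above (a closely related, assumed implementation choice):

    # Sketch: forward selection from the intercept-only model.
    null_fit <- lm(vix ~ 1, data = df)         # only the intercept beta_0
    full_fit <- lm(vix ~ x1 + x2, data = df)   # largest model considered
    step(null_fit, scope = formula(full_fit),
         direction = "forward",
         k = log(nrow(df)))   # k = log(n) penalizes by BIC; k = 2 gives AIC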

Backward Elimination

The inverse of forward selection: the model starts with all regressors included. The F statistic is then evaluated for each regressor as if it were the last to enter the model, and the regressor with the smallest F statistic is removed. This is then repeated.

Bayesian Information Criterion

The Bayesian information criterion, or BIC, is a criterion that balances the number of regressors against the number of observations. The BIC is used for variable selection and places a penalty on adding regressors to the model. The BIC can be computed as

$$BIC = n \ln\left(\frac{SS_{Res}}{n}\right) + k \ln(n)$$

The variable k denotes the number of coefficients including the intercept and n is the number of
observations.
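
A sketch of this formula applied to the hypothetical fitted model from earlier; note that R's built-in BIC() is likelihood-based and differs from this value by an additive constant for fixed n, so the two rank models identically:

    # Sketch: BIC from the formula above for the fitted model `fit`.
    n      <- nobs(fit)
    k      <- length(coef(fit))   # number of coefficients incl. the intercept
    ss_res <- sum(resid(fit)^2)
    n * log(ss_res / n) + k * log(n)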

R2 and adjusted R2

Another way to evaluate the adequacy of the fitted model is to look at the R2 for the generated
models.

$$R^2 = \frac{SS_R}{SS_T}$$

where $SS_T$ is the total sum of squares and $SS_R$ is the sum of squares due to regression. However, $R^2$ does not take the number of variables into consideration; it never decreases when a new variable is added to the model. This is why adjusted $R^2$ is used as a criterion instead, since it takes the number of regressors into consideration.

$$R^2_{adj} = 1 - \frac{SS_{Res}/(n - p)}{SS_T/(n - 1)}$$

This is the definition of adjusted $R^2$ for a model with n observations and k regressors, where p = k + 1 is the number of estimated parameters.

3.2.5     Mallows Cp

Mallows' Cp presents a variance-based criterion, defined as

$$C_p = \frac{SS_{Res}}{\hat{\sigma}^2} - n + 2p,$$

where $\hat{\sigma}^2$ is an estimator of the variance, e.g. $MS_{Res}$. It can be shown that if the p-term model is without bias, the expected value of Cp equals p. When using the Cp criterion, it can be helpful to visualize it in a plot of Cp as a function of p for each regression equation, as exemplified in figure 1. Models with little bias will have values of Cp that fall near the line Cp = p, while regression equations with bias, e.g. point B, fall above this line. Generally, small values of Cp are desirable. On the other hand, a small bias may be accepted for the sake of a simpler model; in the case illustrated in figure 1, C can be preferred over A even though it includes bias.

                                  Figure 1: A Cp plot example
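
A sketch of Mallows' Cp for one candidate subset, estimating the variance with the full model's MSRes (reusing the hypothetical fits from earlier sketches):

    # Sketch: Mallows Cp for the candidate model vix ~ x1.
    sigma2_hat <- sum(resid(full_fit)^2) / df.residual(full_fit)  # MS_Res
    cand <- lm(vix ~ x1, data = df)
    p    <- length(coef(cand))   # parameters in the candidate model
    sum(resid(cand)^2) / sigma2_hat - nobs(cand) + 2 * p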

3.3      Quantitative Selection
3.3.1     Selection using TF-IDF

TF-IDF is a quantitative measure reflecting the importance of a single word in a sentence or
collection of words. [25] TF-IDF is defined as the product of two terms, term frequency (TF) and
inverse document frequency (IDF).

The term frequency is used to measure how frequent a word is in the given document. TF treats the problem of documents having different total word counts: to compensate for documents having unequal lengths, the TF takes the number of occurrences of the specific word and divides it by the total word count.

$$TF = \frac{\text{occurrences of word } X \text{ in document } Y}{\text{word count in document } Y}$$

For the second part, the inverse document frequency attempts to distinguish relevant from non-relevant terms by observing whether the term is common or rare across all documents. The IDF assigns lower values to common words and larger values to words that are rare. This is done through the logarithmically scaled inverse fraction of the documents that contain the word.

                                                                           
$$IDF = \log\left(\frac{\text{number of documents}}{\text{number of documents that contain the term } X}\right)$$

As previously stated, the TF-IDF is simply the term frequency multiplied by the inverse document frequency. Calculating the TF-IDF for all terms in a corpus will assign a numeric value of significance to each word in each document. This value represents how important a specific word is to the collection of documents. The higher the TF-IDF value, the greater the importance of the word.
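
A minimal sketch of the computation on a toy corpus; in this thesis a "document" is one day's bag of tweeted words, and the corpus below is hypothetical:

    # Sketch: TF-IDF of a word in one document of a small corpus.
    docs <- list(c("china", "trade", "great"),
                 c("great", "great", "border"),
                 c("election", "trade"))

    tf_idf <- function(word, doc, docs) {
      tf  <- sum(doc == word) / length(doc)                  # term frequency
      idf <- log(length(docs) /
                 sum(sapply(docs, function(d) word %in% d))) # inverse doc. freq.
      tf * idf
    }

    tf_idf("trade", docs[[1]], docs)   # importance of "trade" in document 1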

However, the method of TF-IDF is not without limitations. For instance, it does not retain the semantic context of words in the initial text. Moreover, TF-IDF is unaware of synonyms or even plural forms of words.[26] This can be handled through the process of stemming.

3.3.2    Stemming

Stemming is the process, in morphology and information retrieval, of reducing words to their core form. For instance, the words consultant, consultants, consultancy and consulting are all reduced to their stem form, that is, consult. A word does not need to be an inflection of another word; it is enough that related words map to the same stem,[27] even if the stem in itself is not a valid root word. All of this is accomplished through algorithms. The process is embedded in our everyday life: for instance, many search engines treat words with the same stem as synonyms as a way to expand the query.[28]
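
A sketch using the Porter stemmer from the SnowballC package (an assumed choice; any stemmer would illustrate the point):

    # Sketch: reducing related words to a common stem.
    library(SnowballC)   # assumed installed; provides wordStem()
    words <- c("consultant", "consultants", "consulting")
    wordStem(words, language = "english")   # all map to the stem "consult"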

3.4     Transformation
3.4.1    Box-Cox Transformation

Box-Cox is used in order to investigate whether the data set requires transformation to correct for non-constant variance or non-normality. If the model needs transformation, this can be done by utilising the method presented by, and named after, Box and Cox. The method deploys the fact that $y^{\lambda}$ can be used to adjust for non-normality or non-constant variance. Lambda is a constant estimated by maximizing

$$L(\lambda) = -\frac{n}{2} \ln\left(SS_{Res}(\lambda)\right)$$

Plotting $L(\lambda)$ and drawing the horizontal line at $L(\hat{\lambda}) - \frac{1}{2}\chi^2_{\alpha,1}$, two intersections with the curve are found. Here $\chi^2_{\alpha,1}$ is the upper α percentage point of the chi-square distribution with one degree of freedom, meaning that for α = 0.05 the x-values of the intersections mark the borders of a 95 per cent confidence interval. If 1 is inside this CI, no transformation is needed. In other cases the recommended transform is

$$y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0, \\ \ln y_i & \text{if } \lambda = 0 \end{cases}$$

This is the one-parameter Box–Cox transformation that will be used to transform the data. The exact value of λ lies within the 95 per cent confidence interval but is not exactly known. The transformation process therefore becomes one of trial and error, and the method can be repeated if a transformation was unsatisfactory.
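
A sketch of the profile log-likelihood plot using boxcox() from the MASS package (shipped with R), reusing the hypothetical fit from earlier:

    # Sketch: Box-Cox profile likelihood and its 95 % interval.
    library(MASS)                                # provides boxcox()
    bc <- boxcox(fit, lambda = seq(-3, 3, 0.1))  # plots L(lambda) with the CI
    bc$x[which.max(bc$y)]                        # lambda maximizing L(lambda)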

4     Methodology
In this section the iterative process of finding the final model is described. Before the final model was obtained and evaluated, two initial models were tested and discarded. In the model building, the majority of the time was spent on sorting and selecting different aspects of the tweets. In order to understand the selection and transformation of the data, the two data sets used are described below.

4.1      Data Gathering
Two data sets were used to carry out this analysis. The first contains the regressors: all of Trump's tweets with a date and time stamp, as well as the number of favourites and retweets.

The data on President Trump's tweets was found as an open-source csv file on Kaggle.com.[29] This data set contains all of Donald Trump's tweets, from his very first tweet on May 4th 2009 all the way to January 20th 2020, summing to 41,060 unique tweets. The data does not include any retweets.

                      Figure 2: Tweet attributes and their descriptions

The second data set needed is that of the volatility of the stock market. Here there are plenty of options and a wealth of different data. The Cboe Volatility Index ($VIX) will be used, for two reasons: the index measures implicit volatility, and it has the advantage of consisting of unweighted data. This data is found directly on Cboe's website.[30]

Figure 3: VIX attributes and their descriptions

In the following analysis a transformed value of VIX Close will be utilized.

4.2     General transformation of data points
4.2.1    Transformation of Volatility

When performing the regression analysis, it is key that the response variable is normally distributed. Below is a graph showing the VIX index, which has had a mean value of 18.34 and maximum and minimum values of 82.69 and 9.14 respectively during the last 10 years.

                              Figure 4: VIX index historical prices

As can be seen in the histogram of VIX Close prices in figure 5, the distribution of the VIX price
is clearly not normal.

Figure 5: VIX index histogram
Figure 6: Transformed VIX index histogram

A Box-Cox transformation was used to transform the VIX data. This resulted in the transformation

                \mathrm{VIX}_{\mathrm{transformed}} = \log(\mathrm{VIX})^{-2}

Observing figure 6, the histogram of the transformed data shows that the transformation has
normalised the VIX closing prices. The choice of transformation is further supported by the
Box-Cox intervals in figure 7, where one can observe that $\lambda = 1$ lies within the 95 per cent
confidence interval, indicating that no further transformation of the response variable is necessary.

Figure 7: Box-Cox parameter λ with the 95 % CI shown.
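
As a small sketch of this step (assuming the daily closing prices are stored in a vector vix_close,
a name chosen here for illustration):

    # Transform the VIX closing prices as suggested by the Box-Cox analysis.
    vix_transformed <- log(vix_close)^(-2)

    # Reproduce histograms in the style of figures 5 and 6.
    par(mfrow = c(1, 2))
    hist(vix_close, main = "VIX index histogram", xlab = "VIX Close")
    hist(vix_transformed, main = "Transformed VIX index histogram",
         xlab = "log(VIX)^-2")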

4.2.2    Transformation of dates

The tweet data include both the date and the exact time at which each tweet was published. Since
the VIX data set only has day resolution, this had to be taken into consideration. The closing
price in the VIX data set is set at 16:00 Eastern time, so all tweets sent after 16:00 are regarded
as belonging to the next trading day. This is done with the argument that tweets sent after closing
hours simply cannot impact the volatility of the same day.
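
A minimal sketch of this assignment rule, assuming a data frame tweets with a POSIXct column
created_at (the column names are hypothetical):

    # Assign tweets posted after 16:00 Eastern time to the next day.
    # Weekends are removed later, when merging with the VIX data.
    hour_et <- as.integer(format(tweets$created_at, "%H", tz = "America/New_York"))
    tweets$trading_day <- as.Date(tweets$created_at, tz = "America/New_York") +
      (hour_et >= 16)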

4.3     Initial models
4.3.1    Model 1 - statistics of tweets

The first model tried used only the quantitative data given by the tweets, disregarding their
content. In order to achieve this, the statistics for each day were summed. The regressors for
this model are:

    • Number of retweets for tweets posted that day

    • Number of favorites for tweets posted that day

    • Length of tweet in terms of characters

    • Number of tweets that day

This model presented an adjusted R2 of 0.024. Clearly, the statistics of the tweets alone will not
tell us much about volatility.
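
For reference, a sketch of this first model, assuming a data frame daily with one row per trading
day and hypothetical column names for the summed statistics:

    # Model 1: regress the transformed VIX on the daily tweet statistics.
    model1 <- lm(vix_transformed ~ retweets + favorites + characters + n_tweets,
                 data = daily)
    summary(model1)$adj.r.squared  # approximately 0.024 in this study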

4.3.2    Model 2 - Words from Volfefe Index

The second model constructed took into consideration the words of importance stated in the
Volfefe study. This model simply sorted out all the tweets that did not contain any of the 20 key
words given in the Volfefe study by J.P. Morgan. This model did not perform well either, with an
adjusted R2 of 0.072. The main problem identified with this approach was that, with only 20 key
words, the many days on which none of these words were tweeted gave no input to the model. The
key takeaway from this attempt was that as many days as possible need to be considered, and that
using only 20 words left too many days and tweets out of the model.
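
The filtering step can be sketched as below; the vector of key words is a hypothetical placeholder,
since the full Volfefe list is not reproduced here:

    # Model 2: keep only tweets containing at least one Volfefe key word.
    volfefe_words <- c("china", "trade", "billion")  # placeholder subset of the 20 words
    pattern <- paste(volfefe_words, collapse = "|")
    tweets_volfefe <- tweets[grepl(pattern, tolower(tweets$text)), ]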

4.4     Regression Model
Learning from the first two attempts, the final model focuses on selected words mentioned per day
in the tweets of Donald Trump.

4.4.1    Data selection using TF-IDF

The first step was to gather all words tweeted during a day into a bag of words that acts as a
representation of that day. In this step we also perform the first stage of cleaning the data by
removing punctuation, hyphens and other symbols. This operation is justified by arguing that
symbols by themselves are without meaning unless there is context. By the same logic, all stop
words are removed. Stop words are words that, just like symbols, are meaningless without context;
examples are whom, this, that, these, am and is. In total 175 such words are considered and
stripped using the package quanteda in R.[31]

By creating a matrix with days represented as rows and each word represented as a column, we
obtain a matrix of 3 116 days by 38 516 words, where each entry is an integer indicating how many
times that word appeared in Trump's tweets on that day. With the words set to act as regressors
later, it is easy to understand that the number of words needs to be reduced further.

Secondly, stemming is applied. After removing stop words and applying stemming to all remaining
words, the number of unique words was reduced to 31 608. More importantly, stemming also
gathers information of the same kind. For instance, the stem of China will be represented not
only by occurrences of China but also by Chinese, China's, etc. The thought behind this is that
words with the same stem essentially have the same meaning and refer to the same thing.
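
The cleaning and stemming described above can be sketched with quanteda; daily_texts is assumed
to be a character vector with one concatenated string of tweets per day:

    library(quanteda)

    # Tokenize the daily bags of words and strip symbols and punctuation.
    toks <- tokens(daily_texts, remove_punct = TRUE, remove_symbols = TRUE)
    # Remove English stop words, then stem (e.g. leaving, leave -> leav).
    toks <- tokens_remove(toks, stopwords("english"))
    toks <- tokens_wordstem(toks)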

The next step is to calculate the TF-IDF of the matrix. This does not change the dimensions
of the matrix. After this calculation, each word on each day is represented by a number: its
TF-IDF, which is a decimal number if the word was tweeted during the day and 0 if it was not.
This TF-IDF number can be interpreted as how many times Trump tweeted that word on that
day compared to the other days. Figure 8 below shows a small sample of the TF-IDF data table.

Figure 8: TF-IDF data table

Thirdly, and removing the bulk of the words, the number of appearances of each word is considered.
Words that do not appear more than 25 times are stripped from the data, arguing that it is hard
to know what to make of words that appear so infrequently. Weekends, when markets are closed,
are also removed. This leaves a regressor matrix of 2 502 days by 1 706 words, with each word
being a regressor whose value is the word's TF-IDF for that day.
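
Continuing the sketch above, the day-by-word matrix, the frequency trimming and the TF-IDF
weighting can be written as follows. Note that, for simplicity, the trimming here is done on raw
counts before weighting, whereas the thesis describes computing TF-IDF first:

    # Build the document-feature matrix with days as rows and stems as columns.
    dfmat <- dfm(toks)
    # Keep only words appearing more than 25 times in total.
    dfmat <- dfm_trim(dfmat, min_termfreq = 26)
    # Replace the raw counts with TF-IDF weights.
    tfidf <- dfm_tfidf(dfmat)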

Finally, one last regression variable was added and another was made binary. The added regressor
represents whether Donald Trump was president or not at the date of the tweet, represented as a
factor with two levels: 0 = civilian and 1 = president. Moreover, the regressor 'pic.twitter.com',
which indicates whether a picture was attached to the tweet, was transformed to be binary, either
one or zero.
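
A sketch of how these two regressors can be encoded (the data frame X and its columns are
illustrative):

    # Presidency indicator: Trump was inaugurated on January 20th, 2017.
    X$president <- factor(as.integer(X$day >= as.Date("2017-01-20")),
                          levels = c(0, 1))  # 0 = civilian, 1 = president
    # Binarise the picture regressor: 1 if a picture was attached, else 0.
    X$pic.twitter.com <- as.integer(X$pic.twitter.com > 0)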

4.4.2      Variable selection using Forward Selection

In order to further reduce the number of regressors, forward selection is used. This is done to
avoid overfitting and to remove the features that do not contribute to the performance of the
model. Forward selection was chosen over the all-possible-regressions method, which in the case
of 1 707 regressors is infeasible since the algorithm would have to search over $2^{1707}$ feature
combinations. The forward selection was then evaluated using the Bayesian Information Criterion
(BIC), Mallows Cp and the adjusted R2, where we would like to minimize the BIC as well as the
Cp with regard to the number of regressors, and of course maximize the adjusted R2.
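
One way to carry out this step in R is the regsubsets() function of the leaps package; a sketch
assuming a regressor matrix X and the transformed response y:

    library(leaps)

    # Forward selection over the TF-IDF regressors.
    fwd <- regsubsets(x = X, y = y, method = "forward",
                      nvmax = 1000, really.big = TRUE)
    s <- summary(fwd)

    # Model sizes preferred by the three criteria.
    which.min(s$bic)    # 79 regressors in this study
    which.min(s$cp)     # 392 regressors
    which.max(s$adjr2)  # 981 regressors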

Figure 9: BIC
Figure 10: Mallows Cp
Figure 11: Adjusted R2

Evaluating with regard to BIC, we find the optimal model to consist of 79 regressors. Mallows
Cp suggests that 392 regressors should be used in the model. Finally, evaluating on adjusted R2
recommends using 981 variables.

These evaluation criteria all recommend quite different models, and there is no easy correct answer
as to which to use. Despite its popularity and intuitive appeal, the adjusted R2 is not as well
motivated in statistical theory as BIC and Cp.[32]

To decide which model to choose as the final one, the models recommended by BIC and by Mallows
Cp were both evaluated. This was done by performing the regression using the lm() command in R
and observing the p-values of the individual regressors. For the model of regressors selected by
BIC, all p-values are close to or equal to zero.

For the model of regressors selected by Cp, quite a few coefficients have large p-values, which is
undesirable. Moreover, with many features we lose interpretability, while with fewer words we can
gain more insight into them. The conclusion is that, even though the model selected by the
Bayesian Information Criterion has a lower adjusted R2, this model will be used. The selected
regressors for the final model are stated in section 5.1 Findings.

4.4.3    Regression

At this stage a multiple linear regression was carried out using the transformed value of VIX,
$\log(\mathrm{VIX})^{-2}$, as the response variable and the TF-IDF data table with the 79 words suggested by BIC
as regressors. The regression is carried out in R using the command lm().
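
A sketch of this final step, assuming a data frame final_data holding the transformed response
and the 79 selected TF-IDF regressors:

    # Final model: multiple linear regression on the BIC-selected words.
    final_model <- lm(vix_transformed ~ ., data = final_data)
    summary(final_model)  # adjusted R2 = 0.4501 in this study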

5     Results
5.1   Findings
The output from the final regression model is

                             Figure 12: Output from regression

Figure 12 shows the full model with all regressors. We find the adjusted R2 = 0.4501 and the sum
of the residuals to be −1.553852e−18. Some of the regressors' names will look a bit strange due to
the stemming process; for instance, leaving and leave are both included in the stem leav. The
regressor 'president', marked in grey, is an indicator variable and 'pic.twitter.com' is binary.
Apart from these two exceptions, all regressors are represented by their TF-IDF values.

5.1.1       Interpretation

The β’s in the model are hard to interpret in their current state due to the transformation of
volatility. The reverse transform is given by

                \mathrm{VIX} = \exp\left(\sqrt{\frac{1}{\mathrm{VIX}_{\mathrm{transformed}}}}\right)

This means that the intercept β0, which in our model has a value of 0.1334, transforms to 15.45,
which is a bit lower than the median VIX of 15.60. Another important consequence of the
transformation is that negative coefficients βi contribute to a higher VIX, not a lower one, while
a positive βi, such as that for 'oil', contributes to a lower VIX.
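
As a check, the reverse transform can be expressed as a small helper function (the name is chosen
here for illustration); applied to the intercept it reproduces the value above:

    # Invert VIX_transformed = log(VIX)^-2 back to the VIX scale.
    inverse_transform <- function(vix_transformed) {
      exp(sqrt(1 / vix_transformed))
    }
    inverse_transform(0.1334)  # ~15.45, the intercept on the VIX scale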

To calculate the expected VIX, one would compute the TF-IDF value of each model word for the
day and insert these values into the model. For example, a day counting only one of President
Trump's better-known tweets

Why would Kim Jong-un insult me by calling me "old," when I would NEVER call him "short
and fat?" Oh well, I try so hard to be his friend - and maybe someday that will happen!

would not contribute to the model, since none of the words in the tweet are in the model. The
model would then output β0 = 0.1334, which transforms to a VIX of 15.45. If Trump, however,
were to tweet

Canada will now sell its oil to China because @BarackObama rejected Keystone.                      At least
China knows a good deal when they see it.

then two key words are mentioned: oil and China. Calculating the TF-IDF values for these words
gives 0.092301 and 0.11255 respectively. Inserted into the model this gives

                \mathrm{VIX}_{\mathrm{transformed}} = \beta_0 + \beta_{\mathrm{president}} + \beta_{\mathrm{oil}}\,\mathrm{TFIDF}_{\mathrm{oil}} + \beta_{\mathrm{china}}\,\mathrm{TFIDF}_{\mathrm{china}}

With the values put in, VIXtransformed becomes equal to 0.07940, which corresponds to an estimated
VIX of 34.7685, more than twice the median value of 15.6 for the VIX during the last 10 years.
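
Using the helper sketched above, the arithmetic of this example is simply:

    # The individual beta values are read from figure 12; the text gives their sum.
    vix_t <- 0.07940          # beta_0 + beta_president + beta_oil*TFIDF + beta_china*TFIDF
    inverse_transform(vix_t)  # ~34.77, more than twice the 10-year median of 15.6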
