Swedish finance Twitter accounts short term impact on Swedish small cap companies

DEGREE PROJECT IN TECHNOLOGY,
FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2021

John Janér and Noah Rahimzadagan

KTH ROYAL INSTITUTE OF TECHNOLOGY
ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
Abstract
Over the last five years, the number of retail investors has increased immensely.
Trying to make informed decisions, many of the more active investors look
to social media as a source of information. In early 2021, the eyes of the
world focused on retail investors as Gamestop, a video game retailer,
experienced an immense price surge over the course of a few weeks on the
stock market. This event, among others, led the SEC (Securities and Exchange
Commission) to open a discussion about the impact of social media on the
stock market. It seemed that individual social media accounts were able to
increase the volatility of a number of different stocks. This study investigates
the immediate impact of larger Swedish Twitter accounts on the volatility and
price of Swedish small-cap companies. Sentiment analysis and data modeling in
the Python programming language were used to compare volatility and price
changes before and after tweets of different sentiments were made about the
companies. Our study was unable to find any correlation between tweets and an
immediate change in price or an immediate increase in volatility, suggesting
that Swedish finance Twitter accounts have little to no immediate impact on
Swedish small-cap companies.

Keywords

Human behavior, Financial markets, Sentiment analysis, Twitter

Summary (Sammanfattning)
Over the past five years, the number of retail investors has increased markedly.
When retail investors try to make well-founded investment decisions, they often
use posts on social media as a guide. In early 2021, attention turned to retail
investors when the share price of the video game retailer Gamestop rose by
several hundred percent over the course of just a few weeks. This price increase
led the SEC (the Securities and Exchange Commission in the USA) to open a
discussion about the impact of social media on stock trading. Much indicated
that individual social media accounts were able to increase the volatility of
share prices for certain companies. This research project aims to investigate
the immediate impact of Swedish Twitter accounts on the price and price
volatility of the shares of Swedish small companies. Sentiment analysis and data
modeling were carried out in the Python programming language to compare
volatility and price changes before and after tweets of different sentiments
were made about the companies. The study was unable to show a correlation
between tweets and an immediate change in price or an immediate increase in
volatility, which suggests that Twitter accounts have little or no influence on
Swedish small companies.

Keywords (Nyckelord)

Human behavior, Financial markets, Sentiment analysis, Twitter

Authors
John Janér and Noah Rahimzadagan
Information and Communication Technology
KTH Royal Institute of Technology

Place for Project
Stockholm, Sweden

Examiner
Pawel Herman
Stockholm, Sweden
KTH Royal Institute of Technology

Supervisor
Chris Peters
Stockholm, Sweden
KTH Royal Institute of Technology

Contents

1 Introduction
    1.1 Problem statement
    1.2 Scope of the study
    1.3 Thesis outline

2 Theoretical Background
    2.1 Definitions
    2.2 Market data
    2.3 Twitter data
    2.4 Natural Language Processing
    2.5 Economic theory
    2.6 Previous research

3 Methods
    3.1 Determining eligible Twitter accounts
    3.2 Scraping
    3.3 Natural language processing of the outputted CSV file
    3.4 Evaluation

4 Result
    4.1 Positive sentiment tweets
    4.2 Negative sentiment tweets
    4.3 Overall volatility for both sentiment tweets

5 Discussion
    5.1 RQ1
    5.2 RQ2
    5.3 RQ3
    5.4 Limitations
    5.5 Future Work

6 Conclusion

References

1     Introduction
With the rise of social media, the ability to share feelings, opinions, and gossip
globally is no longer monopolized by traditional institutions such as newspapers
and TV networks. Most individuals are able to create an account on any given
social media platform and share thoughts and statements with the rest of the
world. On the 27th of January 2021, Gamestop, a video game retailer,
experienced a price surge on the stock market. The price of one Gamestop share
reached as much as 350 USD, an immense increase considering that one share had
been worth only a small fraction of that price two weeks earlier [15]. Price
fluctuations in the stock market can rarely be explained with absolute
certainty, and this also applies to the Gamestop price surge of January 2021.
Nevertheless, many speculate that the price increase can to some extent be
attributed to the online feud between institutional investors and retail
investors that brewed on the social media platform Reddit before the price
surge and drove many retail investors to buy shares in Gamestop. The
speculation went as far as the United States Congress calling Keith Gill, one of
many advocates for buying GME shares on Reddit, to a hearing in which he
had to testify and answer questions regarding the events surrounding Gamestop on
the stock market [7]. Elon Musk, the entrepreneur associated with Tesla, SpaceX,
and PayPal, has been reported to tweet about companies prior to price surges
[18]. While the tweets cannot fully account for the price increases, many
believe that they had some significance.

1.1   Problem statement

Since there is clear evidence that social media, to some extent, influences investors
[16], the question arises of whether there is a correlation between information on
social media and stock prices. The aim of this project is therefore to determine
the short-term impact of high-profile Swedish finance Twitter accounts on the stock
price of public Swedish small-cap companies. Specifically:

    • RQ1 In what way are larger Swedish finance Twitter accounts able to move
      the price of a stock?

Looking at small-cap companies, can finance Twitter accounts above
a certain follower threshold move the price of a stock immediately?
Using sentiment analysis and data modeling in Python, can a correlation be found
between certain tweets and market movements?

   • RQ2 Are tweets from these finance accounts contributing to increased risk
      in small-cap companies?

Using the Parkinson volatility formula to evaluate volatility before and after the
tweets, do these tweets have an immediate impact on the volatility of the stock,
contributing to a riskier investment in the short term?

   • RQ3 Does the nature of tweet sentiment affect the outcome of the change?

The outcome of the change refers to its direction: whether the price of a
stock increases or decreases, and likewise whether its volatility increases
or decreases. Studies have
suggested that positive information is perceived as more credible than negative
information [3]. Are positive sentiment tweets more likely to move
the stock price or increase the volatility?

1.2   Scope of the study

This research project encompasses sentiment analysis of tweets
made by certain Swedish finance Twitter accounts in order to investigate a
possible correlation between the content of the tweets and the state of the stock
market. The tweets to be analyzed are collected with a web scraper, a tool that
fetches data from websites. Tweets are included only if they mention a
Swedish small-cap company. The tweets are analyzed with natural language
processing, after which their effect on the stock prices of the companies mentioned
is investigated.

1.3   Thesis outline

The following chapter presents the theoretical background of the project,
including the theory behind sentiment analysis.
The third chapter covers the methodology of the project and describes in detail
how the tweets were scraped and how they were
evaluated. Chapter 4 presents the results, namely the
data that was gathered and the findings. Chapter 5 discusses the findings and
reflects on previous work. Chapter 6 presents the conclusion.

2     Theoretical Background
This chapter begins with definitions of the terminology used throughout the
paper. It then describes the data sets used and the methods applied to collect
them, and ends with an introduction to previous research related to the
subject.

2.1    Definitions

2.1.1 Small Cap

A company with a market capitalization under one billion USD is considered
a "small-cap" company [1]. Institutional investors are generally not allowed to
hold large stakes in small-cap companies, which makes these companies more
accessible to retail investors.

2.1.2 Retail Investor

Retail investors, also known as individual investors, are non-professional
investors [9].

2.1.3 Volatility

Volatility is a measurement for the range of possible returns of a security. Higher
volatility usually indicates a riskier security [4].

2.1.4 Volume of Trade

Volume of trade, or just "volume", refers to the total quantity of shares of a
specific security traded in a given time frame [23].

2.2    Market data

A data set was received from Nordic Growth Market, a Swedish stock exchange
that lists companies with small market capitalization. The data was compiled
in a JSON file containing the ticker, name, price, volume, and date of every trade
made on the exchange over the past five years. Price data is important since
this research project aims to investigate the correlation between tweets
about certain Swedish small-cap companies and changes in the stock prices
of those companies.

2.2.1 Market data format

             Figure 2.1: Example JSON object from market data set.
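
As an illustration of how trade-level data of this kind can be turned into the daily high, low, and closing prices used later in the evaluation, consider the minimal sketch below. The field names, the timestamp format, and the sample records are assumptions made for the example; the actual layout of the NGM file may differ.

```python
import json
from collections import defaultdict

# Hypothetical trade records; the real NGM field names and layout may differ.
RAW_TRADES = """
[
  {"ticker": "ABC", "name": "Example AB", "price": 10.0, "volume": 100, "date": "2020-12-01T09:05:00"},
  {"ticker": "ABC", "name": "Example AB", "price": 10.5, "volume": 50, "date": "2020-12-01T11:30:00"},
  {"ticker": "ABC", "name": "Example AB", "price": 10.2, "volume": 70, "date": "2020-12-01T17:20:00"},
  {"ticker": "ABC", "name": "Example AB", "price": 10.4, "volume": 30, "date": "2020-12-02T09:10:00"}
]
"""


def daily_bars(trades):
    """Group trades per (ticker, day) into high, low, close and total volume."""
    grouped = defaultdict(list)
    # ISO timestamps sort chronologically when sorted as strings.
    for trade in sorted(trades, key=lambda t: t["date"]):
        day = trade["date"][:10]  # keep only YYYY-MM-DD
        grouped[(trade["ticker"], day)].append(trade)

    bars = {}
    for key, day_trades in grouped.items():
        prices = [t["price"] for t in day_trades]
        bars[key] = {
            "high": max(prices),
            "low": min(prices),
            "close": prices[-1],  # price of the last trade of the day
            "volume": sum(t["volume"] for t in day_trades),
        }
    return bars


bars = daily_bars(json.loads(RAW_TRADES))
print(bars[("ABC", "2020-12-01")])
```

Treating the last trade of the day as the closing price is a simplification chosen for the sketch, not a statement about how the exchange defines its close.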

2.3   Twitter data

The sentiment analysis is performed on tweets made by certain Swedish finance
accounts. Web scraping was used to access the data in
the HTML documents that make up the Twitter website. The tweets
were collected using web scraping tools and stored in a CSV file for processing.

2.3.1 Twitter

Twitter is an American social network and microblogging platform that enables
users to post and interact with messages called "tweets". The platform has more
than 180 million daily active users [21] and had an average of 330 million
monthly active users in 2019 [22]. While only registered users can like, comment,
and retweet, any user can view and read tweets, making Twitter a powerful tool
for spreading information or opinions.

Twitter is used extensively by politicians and other public figures because of its
accessibility [6]. Due to its large number of users, Twitter contains extensive
amounts of noise. Filtering through this noise is nearly impossible for a human.
However, individuals with a large enough following on the platform can cut
through the noise because their followers amplify their tweets with likes and
retweets. This allows certain people to spread information, true or false, to
large groups within society.

2.3.2 Document Object Model

Ever since the birth of the web, the most fundamental part of a website has been
the HTML document, which describes how the different components of a page
should be arranged. The HTML document also contains the data that is
shown on the website.

The Document Object Model (DOM) is a common way to define the logical structure
of an HTML document [12]. A DOM structure is easily accessed and manipulated
because of its tree or forest-like structure. Furthermore, the DOM is designed
to work with any programming language. The DOM is important in this research
project since it provides a way to access the tweets that will undergo sentiment
analysis.

2.3.3 Web scraping

In order to obtain tweets for this research project, they had to be extracted from
Twitter.

Many websites, Twitter included, make the majority of their data available to
everyone in the form of a feed that can be accessed with any web browser.
Searching through this feed manually to gather data that stretches over a
specific time frame is cumbersome and time-consuming. Web
scraping is a method used to collect information that is usually on display for
human consumption on a website. A web scraper, sometimes also called a crawler,
commonly navigates the underlying HTML document of a website in order to find
specific, often predetermined, strings of text across many different pages or
profiles. This data is then collected and compiled into any format desired by the
user. One type of web scraping is called DOM scraping. Since websites expose the
DOM through the underlying HTML document, the data of a website, Twitter
included, is arranged in a tree or forest-like structure, which
can easily be accessed from any programming language. The DOM scraping
in this research project was performed with the Python library Twint.

   Figure 2.2: Visualisation of DOM (Document Object Model) tree structure.
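
The idea of DOM scraping can be illustrated with a small sketch that walks an HTML document and collects the text of selected elements. The example uses Python's standard `html.parser` module on an invented HTML snippet; the tag and class names are hypothetical and do not reflect Twitter's actual markup or the internals of Twint.

```python
from html.parser import HTMLParser

# Invented page; Twitter's real markup is different and far more complex.
PAGE = """
<html><body>
  <div class="timeline">
    <div class="tweet"><p class="tweet-text">Strong report from Example AB</p></div>
    <div class="tweet"><p class="tweet-text">Weak quarter for Sample AB</p></div>
  </div>
</body></html>
"""


class TweetTextParser(HTMLParser):
    """Collect the text inside every <p class="tweet-text"> element."""

    def __init__(self):
        super().__init__()
        self.in_tweet = False
        self.tweets = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "p" and ("class", "tweet-text") in attrs:
            self.in_tweet = True
            self.tweets.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_tweet = False

    def handle_data(self, data):
        if self.in_tweet:
            self.tweets[-1] += data


parser = TweetTextParser()
parser.feed(PAGE)
print(parser.tweets)
```

The parser visits the nodes of the document in order and keeps only the text nodes that sit below the elements of interest, which is the essence of navigating a DOM tree to extract data.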

2.4    Natural Language Processing

Natural language processing is the process of interpreting human language with
the help of computers [20]. In this work, the language that is interpreted
is the content of the scraped tweets, and the natural language
processing takes the form of sentiment analysis. Determining a tweet's
sentiment entails classifying its content, which is text, with a label. In this
research project, the two labels are negative and positive. Natural
language processing consists of multiple steps, which are explained in
the following sections.

2.4.1 Tokenization

Computers are unable to interpret human language as is. The text is therefore
broken down into smaller units, called tokens, during the tokenization process
[25]. Breaking down a text string into tokens is a common first step in natural
language processing since it is easier to train a computer to classify separate
words than a group of words. A text string is split into multiple tokens that
are inserted in a list.
The following text string: "Well yes. I would love a cinnamon bun!"

would generate the tokens:

[Well, yes, ., I, would, love, a, cinnamon, bun, !]

In the above example, the sentence is split on white space, with punctuation
marks separated into tokens of their own, and then turned
into a list of tokens. This kind of tokenization is what was used in this work. The
token lists are later evaluated word for word by the classifier.
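
A minimal sketch of this kind of tokenization is given below. It uses a regular expression from Python's standard library rather than the tokenizer used in the project, so it should be read as an illustration of the principle only.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of word characters; [^\w\s] matches one
    # character that is neither a word character nor white space.
    return re.findall(r"\w+|[^\w\s]", text)


tokens = tokenize("Well yes. I would love a cinnamon bun!")
print(tokens)
```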

2.4.2 Normalization of tokens

Normalization, in the context of natural language processing, converts
multiple words that have the same meaning but different forms into the same
form. For instance, the words "sing", "sang" and "sung" share the same base meaning
but come in different forms. It is time-consuming and unnecessary to train
a classifier on the same word in different forms; therefore, normalization is
a common practice in natural language processing. Furthermore, stop words
such as "and", as well as punctuation marks, are removed. Since stop words convey
little meaning and mainly serve to make texts easier to read for humans, it is best
to remove them before performing the sentiment analysis.
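
The normalization step can be sketched as follows. The stop word set and the table mapping word forms to a base form are tiny hand-written stand-ins invented for this example; a real pipeline would rely on much larger resources, such as those shipped with an NLP library.

```python
# Hypothetical, deliberately tiny resources used only for illustration.
STOP_WORDS = {"a", "and", "the", "i", "would"}
BASE_FORMS = {"sang": "sing", "sung": "sing"}


def normalize(tokens):
    """Lowercase tokens, drop punctuation and stop words, map to base forms."""
    result = []
    for token in tokens:
        word = token.lower()
        if not word.isalnum():  # skip punctuation tokens such as "." or "!"
            continue
        if word in STOP_WORDS:
            continue
        result.append(BASE_FORMS.get(word, word))
    return result


normalized = normalize(["She", "sang", "and", "I", "sung", "!"])
print(normalized)
```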

2.4.3 Naïve Bayes Message Coding

A common method for classifying text is the Naïve Bayes method. As the
name suggests, the method is naïve in the sense that it treats every word in a
stream as independent of the other words in the same stream, so word order and
context are ignored. This is easily demonstrated with an example: the text
streams "Dear friend" and "Friend dear" are assigned the same classification
score. Naturally, there are more
sophisticated classification methods, although Naïve Bayes has been proven to
perform exceptionally well [17]. The name Bayes stems from the mathematician
Thomas Bayes and his formula, Bayes' Theorem, which
determines the probability of one event given a certain condition. Bayes' Theorem
is given in equation (1).

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{1}$$

P(A | B) is the conditional probability of event A occurring given that
B occurs.

P(B | A) is the conditional probability of event B occurring given that
A occurs.

P(A) and P(B) are the probabilities of A and B occurring on their own. Naïve
Bayes is derived from the above formula and, in this case, is used
to assign a score to a stream of words. The score is a unitless
number that represents how well the stream of words fits a certain class. The
Naïve Bayes formula can be broken down into the following parts.

The a priori probability, P(A), is the probability of an event occurring without
any other information given. The a posteriori probability, P(A | T), is the
probability of event A occurring given T. As the Latin terms suggest, the a
priori probability is known beforehand, while the a posteriori probability
is known only after the Naïve Bayes formula has been applied. The a posteriori
probability is also known as the score of the tweet for a certain class. The last
part of Naïve Bayes consists of the probabilities of each attribute T_i given the
condition A, where T_i is any word in the stream and A is one of the classes.
These probabilities are also called likelihoods, and they are combined as the
product $\prod_{i=1}^{n} P(T_i \mid A)$.

The equation for Naïve Bayes can be seen below.

$$P(A \mid T) = \frac{P(T_1 \mid A)\,P(T_2 \mid A)\cdots P(T_n \mid A)\,P(A)}{P(T)} \tag{2}$$

Simply put, the above formula is evaluated twice for each tweet, once per class,
and the class that generates the highest a posteriori probability is assigned to
the tweet. Two evaluations are needed in this research project since two classes,
negative and positive, are used.
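
The scoring procedure can be sketched in a few lines of Python. The sketch below is not the classifier used in the project; it is a from-scratch illustration trained on four invented token lists. Two choices are assumptions made for the example: add-one (Laplace) smoothing, so that unseen words do not produce a zero probability, and the use of log probabilities to avoid numerical underflow. Since P(T) is the same for both classes, it is left out of the comparison.

```python
import math
from collections import Counter

# Invented training data: (tokens, label) pairs.
TRAINING = [
    (["strong", "report", "buy"], "positive"),
    (["great", "growth"], "positive"),
    (["weak", "report", "sell"], "negative"),
    (["poor", "growth", "sell"], "negative"),
]


def train(labelled_tweets):
    """Count word occurrences per class and tweets per class."""
    word_counts = {}
    class_counts = Counter()
    for tokens, label in labelled_tweets:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
    return word_counts, class_counts


def classify(tokens, word_counts, class_counts):
    """Return the class with the highest (log) a posteriori score."""
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    total_tweets = sum(class_counts.values())

    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # log P(A): the a priori probability of the class.
        score = math.log(class_counts[label] / total_tweets)
        words_in_class = sum(counts.values())
        for token in tokens:
            # log P(T_i | A) with add-one smoothing (an assumption).
            likelihood = (counts[token] + 1) / (words_in_class + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


word_counts, class_counts = train(TRAINING)
print(classify(["strong", "growth"], word_counts, class_counts))
print(classify(["sell"], word_counts, class_counts))
```

Because the score is a sum over individual words, reordering the tokens of a tweet does not change the result, which mirrors the "Dear friend" example.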

2.5    Economic theory

Twitter, in this study, is viewed as a stream of information in which retail
investors seek new potential trades. This gives it the potential to influence
price movements, especially for the stock of smaller companies. This section
briefly explains the widely accepted Efficient Market Theory, on which this study
is based.

2.5.1 Efficient Market Theory

According to the efficient market hypothesis (EMH), a widely accepted theory a
generation ago, the price of and potential gain in any security or stock depend
on the availability of information to all participants. In a fully efficient market,
a certain set of information, α, would not impact the market price of a stock if
revealed to all participants [13].

Figure 2.3: Graph showing company HSTK B releasing a sales report on December
1st 2020. The price rose almost 57% on the day. Since the company issued a report
this observation was omitted from this study’s results.

This is based on the notion that information travels quickly and is
incorporated into the market price without delay [14]. Figure 2.3 shows an
example of how the market quickly reacts to new information.

However, due to discrepancies between EMH and measured volatility in the
market, many have questioned EMH and how efficient markets
actually are [19].

2.5.2 Algorithmic trading

Algorithmic trades are transactions in the stock market made by computers.
In the United States, algorithmic trading makes up around 50% of market
liquidity. These computer-executed trades can sometimes lead to unexpected
movements in stocks. The algorithms are often proprietary, and identifying
algorithmic trading is almost impossible in most cases [10].

Although this study does not involve algorithmic trading, its undetectable effects
might limit findings in the study.

2.6   Previous research

In this section, we introduce studies that have evaluated the possibility of
predicting market movements using Twitter, as well as the impact of CEOs'
tweeting on their companies' stock performance. Methods used in these studies,
specifically sentiment analysis and evaluation of changes in stock pricing,
have been replicated and applied in this study.

2.6.1 Twitter mood predicts the stock market

In a study conducted in 2010, J. Bollen and H. Mao used sentiment analysis
on Twitter to determine the broader "mood" of the general public at a given point
in time [2]. This was then combined with machine learning algorithms to predict
movements in the value of the Dow Jones Industrial Average (DJIA). They found an
accuracy of 87.6% in predicting the daily up and down changes of the closing
values of the DJIA, as well as a reduction of the Mean Average Percentage Error
by more than 6%.

2.6.2 How Social Media usage by managers affects corporate value: The
       case of Elon Musk

M. Corte investigates high-profile CEOs' social media usage and its impact on their
companies' stock prices. The master's thesis focuses primarily on Elon Musk, the
CEO of Tesla [5]. Using sentiment analysis on Musk's tweets and comparing the
results to the movements of the Tesla stock, Corte evaluated 188 tweets made by
Musk in the first quarter of 2020. When evaluating Tesla-related tweets, his
models resulted in a p-value of 0.08, which is not statistically significant.
However, the same model applied to non-Tesla-related tweets yielded a much higher
p-value, a result Corte was unable to explain. Further evidence of the stock
price moving several percent within minutes of Elon Musk's tweets made Corte
believe that a statistically significant
result could be obtained using more advanced models.

2.6.3 Stock Price Forecasting via Sentiment Analysis on Twitter

The conference paper "Stock Price Forecasting via Sentiment Analysis on
Twitter" by J. Kordonis, S. Symeonidis, and A. Arampatzis investigates
stock market prediction using Twitter sentiment analysis [11]. In their study, they
analyze tweets relating to 16 of the most popular technology stocks on the Nasdaq
stock exchange. They then use machine learning (a Support Vector Machine) to
predict the movement and daily closing prices of the stocks based on the
daily Twitter sentiment for each stock. They achieved an accuracy of
87% in predicting the movement of the stock and an average error margin of 1.669%
in predicting the closing price on 23 June 2016.

3    Methods
The first part of this work was to determine eligible Twitter accounts whose
tweets could be used as the data set for this research project. Once the
accounts had been determined, their tweets mentioning certain Swedish small-cap
companies were scraped with Twint, a scraping
library for the Python programming language. Twint is a web scraper,
a piece of software that fetches data that is available on a website. In this
research project, the website is Twitter and the data consists of tweets. The
tweets were written to a CSV file, a format for storing tabular data.
Natural language processing was then performed on the CSV file in order
to interpret the sentiment of the gathered tweets. Natural language processing is
most easily described as a computer's way of interpreting human language; in this
step, each tweet was assigned a sentiment score. Price
information for the stocks was retrieved from the Nordic Growth Market data set.
The stocks were connected to their respective tweets and evaluated in Python to
calculate price changes as well as volatility using the Parkinson volatility formula.
Lastly, t-tests were conducted to determine whether the results carried any
statistical significance.

Figure 3.1: Visualisation of the method workflow. The method can be divided into
three parts: first, building the Twitter data set using Twint and performing
sentiment analysis on it with the help of NLTK; second, retrieving all needed
market information from the NGM data set using Python; and lastly, connecting
the two data sets and conducting the evaluation.

3.1   Determining eligible Twitter accounts

The Swedish "Finance Twitter" community is a relatively small group with a set
of larger accounts that have significant follower engagement. This engagement
(likes, retweets, etc.) enables wide reach. When selecting accounts, we wanted
an unbiased approach. A threshold of 500 followers was selected, and the
selected accounts needed to focus solely on financial markets and stocks.
A list of one hundred eligible accounts was compiled, and thirty of these accounts
were selected at random. This was done to avoid prior bias regarding the
accounts' possible ability to affect the market.
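
The selection procedure can be sketched as a filter followed by a random draw. The account records, the `finance_only` flag, the inclusive threshold, and the fixed seed are assumptions introduced for the example; the thesis does not describe how the selection was implemented.

```python
import random

# Hypothetical candidate accounts, generated only to make the sketch runnable.
CANDIDATES = [
    {"handle": f"account_{i}", "followers": 400 + 10 * i, "finance_only": i % 7 != 0}
    for i in range(150)
]


def select_accounts(accounts, min_followers=500, sample_size=30, seed=2021):
    """Keep eligible accounts, then draw a random sample without replacement."""
    eligible = [
        account
        for account in accounts
        if account["followers"] >= min_followers and account["finance_only"]
    ]
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return rng.sample(eligible, sample_size)


selected = select_accounts(CANDIDATES)
print(len(selected))
```

Drawing the sample with a seeded generator keeps the selection free of manual choice while still allowing the same thirty accounts to be reproduced later.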

3.2   Scraping

An easy way to retrieve data from Twitter is to use the Twitter API that Twitter
Inc. has designed [24]. Unfortunately, the Twitter API can only fetch tweets that
are up to one week old, which is why a web scraper was used to fetch tweets for
this research project. Scraping, more commonly known as web scraping, is the
process of extracting data from a website. The website scraped in this case was
Twitter.com. The scraping in this work was done through Twint, a Python
library that allows users to access all tweets in a specified time period that
match certain filtering conditions. In this work, three filtering conditions were
used. The first was that the tweet mentioned any of the companies listed in the
data set provided to us by NGM (Nordic Growth Market). The second was to omit
retweets. The third was to omit replies, namely tweets that
are part of a conversation. The motivation behind the filtering conditions was to
fetch only stand-alone tweets that mention Swedish small-cap companies. The
tweets were then written as rows to a CSV file. As mentioned in chapter 2,
DOM scraping was used when performing web scraping in this project.
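
The three filtering conditions and the CSV output can be illustrated with plain Python. The sketch below does not use Twint; it applies the same conditions to already collected tweet records whose field names (`is_retweet`, `is_reply`, and so on) and company names are hypothetical.

```python
import csv
import io

# Hypothetical company names and tweet records, invented for the example.
COMPANIES = ["Example AB", "Sample AB"]
TWEETS = [
    {"date": "2020-12-01", "username": "fin_a", "text": "Strong order for Example AB",
     "is_retweet": False, "is_reply": False},
    {"date": "2020-12-01", "username": "fin_b", "text": "RT: Example AB looks strong",
     "is_retweet": True, "is_reply": False},
    {"date": "2020-12-02", "username": "fin_c", "text": "@fin_a agree on Sample AB",
     "is_retweet": False, "is_reply": True},
    {"date": "2020-12-02", "username": "fin_d", "text": "Markets are calm today",
     "is_retweet": False, "is_reply": False},
]


def keep_tweet(tweet, company_names):
    """Keep stand-alone tweets that mention at least one listed company."""
    if tweet["is_retweet"] or tweet["is_reply"]:
        return False
    text = tweet["text"].lower()
    return any(name.lower() in text for name in company_names)


def write_csv(tweets, handle):
    """Write the kept tweets as rows with a header line."""
    fields = ["date", "username", "text"]
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    for tweet in tweets:
        writer.writerow({field: tweet[field] for field in fields})


kept = [tweet for tweet in TWEETS if keep_tweet(tweet, COMPANIES)]
buffer = io.StringIO()  # stands in for a file opened with open(..., "w", newline="")
write_csv(kept, buffer)
print(buffer.getvalue())
```

Matching on the plain company name is the simplest possible rule; a real filter would likely also need to handle tickers and spelling variants.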

3.3   Natural language processing of the outputted CSV file

The Python library NLTK was used to perform natural language processing
on the gathered tweets. NLTK includes a built-in Naïve Bayes classifier, which
was used for the sentiment classification.
In this work, the classifier was trained with 750 positive tweets
and 250 negative tweets. The stream of words in this case is a single tweet, and the
labels are either positive or negative. A Python script was run in order to assign
every tweet a class, either positive or negative.

3.4    Evaluation

Before the evaluation began, the dates of the tweets were checked to make sure
no other information was made public on the day of the tweet. This included any
information distributed by the companies themselves, such as earnings reports,
order announcements, and general news, as well as any third-party institutional
news involving the companies.

A Python program was built to extract all necessary information from the market
data file received from Nordic Growth Market for the day of each tweet as well as
the nine days prior. This ten-day period is equal to two weeks of trading
days. The information used was the date, the closing price of the stock, and the
highest and lowest trade prices of the day.

The program then calculated the change in price for all days, in percent, using the
simple equation:

$$\text{Change}_i = \left(\frac{\text{Price}_i}{\text{Price}_{i-1}} - 1\right) \cdot 100, \quad i = \text{date} \tag{3}$$
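
As a small worked example of the price change calculation, the sketch below computes the day-over-day percentage change for a list of closing prices. The function name and the sample prices are invented for the illustration.

```python
def percent_changes(closes):
    """Return the percentage change between each pair of consecutive closes."""
    return [
        (current / previous - 1) * 100
        for previous, current in zip(closes, closes[1:])
    ]


# Invented closing prices for three consecutive trading days.
changes = percent_changes([100.0, 110.0, 99.0])
print([round(change, 2) for change in changes])
```

A close of 110 after a close of 100 gives a change of 10 percent, and a close of 99 after 110 gives a change of minus 10 percent.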

To calculate the volatility the Parkinson volatility formula was implemented:

$$\text{Volatility} = \sqrt{\frac{1}{4T \ln 2} \sum_{t=1}^{T} \left(\ln \frac{h_t}{l_t}\right)^{2}}, \quad T = \text{time period} \tag{4}$$

The Parkinson volatility formula was used due to its incorporation of the daily
high price, h_t, and the daily low price, l_t, instead of using only the closing price.
This allows for the detection of price swings within a day of trading.

Volatility was calculated for two different time periods, the nine days prior to the
tweet and the day of the tweet.
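Equation (4) can be sketched in a few lines of Python. The function name and the sample high and low prices are assumptions for illustration; the two calls mirror the two periods described above, with T = 9 for the prior days and T = 1 for the day of the tweet.

```python
import math

def parkinson_volatility(highs, lows):
    """Parkinson volatility over T days from daily high and low prices."""
    if len(highs) != len(lows) or not highs:
        raise ValueError("highs and lows must be non-empty and of equal length")
    T = len(highs)
    squared_log_ranges = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return math.sqrt(squared_log_ranges / (4 * T * math.log(2)))

# Made-up prices: nine prior trading days, then the day of the tweet.
prior_highs = [10.2, 10.4, 10.3, 10.5, 10.6, 10.4, 10.7, 10.8, 10.6]
prior_lows = [9.9, 10.1, 10.0, 10.2, 10.3, 10.1, 10.3, 10.5, 10.2]
prior_volatility = parkinson_volatility(prior_highs, prior_lows)
tweet_day_volatility = parkinson_volatility([10.9], [10.4])
print(tweet_day_volatility > prior_volatility)
```

Because the sum is divided by T, the nine-day figure is an average over the period and can be compared directly with the single-day figure.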

The evaluation involved comparing the sentiment of each tweet to the outcome
of its corresponding trading day, as well as comparing the volatility on that day
to the volatility of the nine prior trading days. For a positive sentiment tweet,
the expected outcome was an increase in price; the reverse was expected for
negative tweets. An increase in volatility was expected in both scenarios. A
comparison was also made between the outcomes of the different sentiments.

T-tests were conducted for all cases to determine whether the tweets had any
statistically significant impact on the market.
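The report does not state which variant of the t-test was used. As one possible reading, the sketch below computes a paired t statistic that compares a per-observation value after the tweet with the corresponding value before it. The function name and the sample numbers are assumptions, and converting the statistic into a p-value would additionally require the t distribution (available, for example, in a statistics package).

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired before/after observations."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for b, a in zip(before, after)]
    spread = statistics.stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (spread / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Made-up paired values for four observations (not data from the study).
t, dof = paired_t_statistic([1, 2, 3, 4], [2, 2, 5, 5])
print(round(t, 4), dof)
```

A large absolute t relative to the degrees of freedom indicates that the average difference is unlikely to be zero; with few observations the threshold for significance is correspondingly higher.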

4     Result
After the set of tweets had been cleaned and processed, 85 observations remained.
The observations were then grouped based on sentiment type and evaluated in
accordance with the process outlined in the prior chapter.

4.1   Positive sentiment tweets

Figure 4.1: The percentage of observations that had a positive or negative
change in price on the day of the tweet. This result only includes positive
sentiment tweets.

Figure 4.2: The percentage of observations that had an increase or
decrease in volatility on the day of the tweet, compared to the volatility of the
stock over the nine prior days. This result only includes positive sentiment tweets.

On 56% of the days when a positive sentiment tweet was posted, the price of
the mentioned stocks increased. However, no statistical significance was found
regarding the impact of the tweets on the stock price on the day of the tweet.

Regarding the impact of positive tweets on volatility, the volatility of the
mentioned stocks increased in only 36% of the observations. With a p-value of
0.08, no statistical significance was found.

4.2   Negative sentiment tweets

Figure 4.3: The percentage of observations that had a positive or negative
change in price on the day of the tweet. This result only includes negative
sentiment tweets.

Figure 4.4: The percentage of observations that had an increase or
decrease in volatility on the day of the tweet, compared to the volatility of the
stock over the nine prior days. This result only includes negative sentiment tweets.

Approximately 38% of the negative tweets were accompanied by a decrease in
the respective stock's price on the day of the tweet. No statistical significance was
found between the tweet sentiment and its impact on the stock market.

The volatility of the respective stocks increased in approximately 62% of the
observations, in comparison to the volatility of the stocks over the nine prior days.

4.3   Overall volatility for both sentiment tweets

Figure 4.5: The percentage of observations that had an increase or
decrease in volatility on the day of the tweet, compared to the volatility of the
stock over the nine prior days. The result includes both negative and positive tweets.

For all observations, the volatility of the observed stocks decreased in around 45%
of the cases. With a p-value of 0.326, no statistical significance was found for the
tweets' impact on the respective stocks' volatility.

5     Discussion

5.1   RQ1

When evaluating the impact of the tweets on price, no statistically significant
correlation was found for either positive or negative sentiment tweets, as shown
in Figures 4.1 and 4.3. These results may depend on many different factors,
such as investors not reacting quickly to new tweets from the accounts they
follow, or investors using these Twitter accounts as inspiration for their own
research rather than as direct investment recommendations.

In his study on the impact of Tesla CEO Elon Musk's Twitter usage, discussed in
section 2.6.2, M. Corte strongly suggests, although without showing statistical
significance, that Musk's tweets do in fact move the price of Tesla stock [5].
This is most likely due to the fact that Musk is the acting CEO of the company
and is, therefore, the most knowledgeable person when it comes to Tesla and its
business. The Twitter accounts used in this study do not, as far as their
profiles say, hold an active role in the companies they discuss, and base their
knowledge on information already available to the general public. Another key
factor is the size of the accounts: Musk's following is in the tens of millions,
while the Swedish Twitter accounts usually have fewer than 20 thousand followers.

Furthermore, even though this study focuses on small companies, which are more
reactive to market or trade volume changes, the trade volumes needed to
significantly move the price seem larger than the volumes a tweet could
plausibly generate. As described in the theoretical background, section 2.4.2,
today's markets are dominated by algorithmic trading, which might counteract
any potential larger change in price for these companies. Since these algorithms
are often proprietary "black boxes", it is difficult for non-insiders to determine
how and where they operate.

When we received our market data set from Nordic Growth Market (NGM), our
contact said he was positive that one would be able to find a correlation between
Twitter activity and its impact on the companies listed on their market. Since
this study was conducted in a general manner, with no preconceived notion of
which accounts or stocks might have a greater chance of generating a favorable
outcome, interesting further research might include investigating specific
companies highlighted by NGM themselves.

5.2   RQ2

The result closest to statistical significance was the change in volatility for
positive sentiment tweets, as seen in Figure 4.2. However, the result was the
opposite of the expected outcome, with lowered volatility on the day of the
tweets. This might be due to some inherent limitations of the generally
accepted way in which volatility is calculated. The volatility formula used in the
study uses the highest and lowest prices traded on any given day. This allows for
detecting large changes within the day of trading, instead of relying solely on the
closing price. However, it follows that, for example, a day in which a stock
steadily increases 5% from its opening price without going negative would
generate lower volatility than a day where the open and close prices are the same
but the price oscillates between negative 3% and positive 3% during the day.
Taking this into account, one reason for the results seen in Figure 4.2 might be
that positive sentiment tweets reduce periods of negative price movement and
therefore reduce the total measured volatility of the stock. Another possible
reason is that a tweet is itself a reaction to the previous days' price changes
in a certain stock. A tweet could, for example, be posted in reaction to a
stock's recent decrease in price, implying that the company is now undervalued.

A common theme in the previous research papers used in this study is the
difficulty of reaching a statistically significant result. The modern financial
markets are enormously complex, with many moving parts and participants.
Determining with absolute certainty how the market will move, and why, is in
most cases impossible. However, certain trends and suggested correlations can be
found, and these are often used as a basis for investment strategies. With this
said, even though no statistical significance was found in this case, the
reduced volatility result in Figure 4.2 should not be discarded entirely.
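The 5% versus 3% example from this section can be checked against equation (4) for a single day (T = 1). The concrete prices are assumptions chosen for illustration: an opening price of 100, with day A trading between a low of 100 and a high of 105, and day B trading between a low of 97 and a high of 103.

\[
\text{Volatility}_A = \frac{\ln(105/100)}{\sqrt{4\ln 2}} \approx \frac{0.0488}{1.6651} \approx 0.0293
\]
\[
\text{Volatility}_B = \frac{\ln(103/97)}{\sqrt{4\ln 2}} \approx \frac{0.0600}{1.6651} \approx 0.0360
\]

Under these assumed prices, day B yields the higher Parkinson volatility even though its closing price equals its opening price, since the formula depends only on the ratio of the daily high to the daily low.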

5.3   RQ3

Comparing the price changes for positive sentiment tweets to those for negative
sentiment tweets, no significant difference was found. However, the difference
in volatility between positive and negative tweets, although not statistically
significant, is noticeable. As studies have suggested a bias in credibility towards
positive information compared to negative information [3], these results might
suggest that investors are more likely to act on a tweet containing a positive
sentiment, thereby reducing negative swings in pricing, as mentioned in
section 5.2.

In the research paper titled "Twitter mood predicts the stock market" [2],
Bollen et al. reported an accuracy of 86.7% when taking public mood from tweets
into account when predicting the value of the DJIA. The DJIA is a stock index
that tracks thirty large companies listed in the United States [8]. Bollen et
al., however, used a data set of approximately 10 million tweets from 2.7
million users, which, from a quantitative point of view, could give more
accurate results than this work's data set.

5.4   Limitations

This research project did not intend to forecast stock prices and fluctuations,
nor does it evaluate the performance over time of stocks mentioned by
individuals on Twitter. It is rather a study that investigates human behavior
and the impact of the ever-growing ubiquity of social media on retail investors'
immediate activity on the stock market.

5.5   Future Work

This study has suggested that larger Swedish Twitter accounts, in general, move
neither the prices nor the volatility of the stocks they mention. However, since
other studies have implied that the Twitter accounts of people such as Elon Musk
can impact the movements of certain securities, interesting future work might
include determining the size of the following needed to impact the stock prices
of companies mentioned in tweets. Furthermore, this study only focuses on the
immediate impact of the tweets. Many of the accounts used in this study tweeted
about the same company multiple times over a longer period of time. Future work
might therefore compare the performance of stocks popular on Twitter with that
of companies receiving little or no exposure on Twitter, evaluating whether long
periods of positive reinforcement on Twitter lead to greater performance in
small-cap corporations.

6    Conclusion
The results suggest that there is no correlation between the movement of
small-cap stocks and tweets from larger finance Twitter accounts mentioning the
companies. Although no statistical significance was found, it could be noted
that positive tweets may have an immediate reducing effect on the volatility of
the mentioned stock. Furthermore, the type of sentiment, whether positive or
negative, seems to have little effect on the impact of the tweets on the price of a
stock. In conclusion, this study was unable to detect any direct impact on the
stocks mentioned by the Swedish finance Twitter accounts used in the study.
However, due to the general nature of this study with regard to the selection
of Twitter accounts and companies, further studies need to be conducted to
determine the ability of individual Twitter accounts to impact the market.

References
[1] Barone, Adam. “Small Cap”. In: Investopedia (2020). URL:
    https://www.investopedia.com/terms/s/small-cap.asp.

[2] Bollen, Johan, Mao, Huina, and Zeng, Xiaojun. “Twitter mood predicts the
    stock market”. In: Journal of computational science 2.1 (2011), pp. 1–8.

[3] Callison, Coy. “Do PR practitioners have a PR problem?: The effect
    of associating a source with public relations and client-negative news
    on audience perception of credibility”. In: Journal of Public Relations
    Research 13.3 (2001), pp. 219–234.

[4] Chen, James. “Volatility”. In: Investopedia (2021). URL:
    https://www.investopedia.com/terms/v/volatility.asp.

[5] Corte, Miguel Alexandre Barbeira. “How social media usage by managers
    affects corporate value: the case of Elon Musk”. PhD thesis. 2020.

[6] Duncombe, Constance. “The politics of Twitter: emotions and the power of
    social media”. In: International Political Sociology 13.4 (2019), pp. 409–429.

[7] Fitzgerald, Maggie. “‘Roaring Kitty’ Keith Gill defends GameStop posts, says
    he is as bullish as ever on the stock”. In: CNBC (2021).

[8] Ganti, Akhilesh. “Dow Jones Industrial Average (DJIA)”. In: Investopedia
    (2021). URL: https://www.investopedia.com/terms/d/djia.asp.

[9] Hayes, Adam. “Retail Investor”. In: Investopedia (2021). URL:
    https://www.investopedia.com/terms/r/retailinvestor.asp.

[10] Hendershott, Terrence, Riordan, Ryan, et al. “Algorithmic trading and
    information”. In: Manuscript, University of California, Berkeley (2009).

[11] Kordonis, John, Symeonidis, Symeon, and Arampatzis, Avi. “Stock price
    forecasting via sentiment analysis on twitter”. In: Proceedings of the 20th
    Pan-Hellenic Conference on Informatics. 2016, pp. 1–6.

[12] Le Hégaret Lauren W, Jonathan R. “What is the Document Object Model?”
    In: W3C (2000).

[13] Malkiel, Burton G. “Efficient market hypothesis”. In: Finance. Springer,
    1989, pp. 127–134.

[14] Malkiel, Burton G. “The efficient market hypothesis and its critics”. In:
    Journal of economic perspectives 17.1 (2003), pp. 59–82.

[15] Ossinger, Joanna. “GameStop’s Volatile Rally Smashes Wall Street Price
    Targets”. In: Bloomberg (2021).

[16] Pineiro-Chousa, Juan, Vizcaino-Gonzalez, Marcos, and Perez-Pico, Ada
    Maria. “Influence of social media over the stock market”. In: Psychology
    & Marketing 34.1 (2017), pp. 101–108.

[17] Rish, Irina et al. “An empirical study of the naive Bayes classifier”. In: IJCAI
    2001 workshop on empirical methods in artificial intelligence. Vol. 3. 22.
    2001, pp. 41–46.

[18] Shead, Sam. “Elon Musk’s tweets are moving markets — and some investors
    are worried”. In: CNBC (2021).

[19] Shiller, Robert J. “From efficient markets theory to behavioral finance”. In:
    Journal of economic perspectives 17.1 (2003), pp. 83–104.

[20] Sun, Shiliang, Luo, Chen, and Chen, Junyu. “A review of natural language
    processing techniques for opinion mining systems”. In: Information fusion
    36 (2017), pp. 10–25.

[21] Tankovska. “Leading countries based on number of Twitter users as of
    January 2021”. In: Statista (2021). URL:
    https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/.

[22] Tankovska. “Number of monthly active Twitter users worldwide from 1st
    quarter 2010 to 1st quarter 2019”. In: Statista (2021). URL:
    https://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/.

[23] Twin, Alexandra. “Volume of Trade”. In: Investopedia (2021). URL:
    https://www.investopedia.com/terms/v/volumeoftrade.asp.

[24] “Twitter API”. In: Twitter (2021). URL:
    https://developer.twitter.com/en/docs/twitter-api.

[25] Webster, Jonathan J and Kit, Chunyu. “Tokenization as the initial phase in
    NLP”. In: COLING 1992 Volume 4: The 15th International Conference on
    Computational Linguistics. 1992.

TRITA-EECS-EX-2021:443

    www.kth.se