Swedish finance Twitter accounts short term impact on Swedish small cap companies
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS STOCKHOLM, SWEDEN 2021 Swedish finance Twitter accounts short term impact on Swedish small cap companies John Janér and Noah Rahimzadagan KTH ROYAL INSTITUTE OF TECHNOLOGY ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
Abstract

Over the last five years, the number of retail investors has increased immensely. Trying to make informed decisions, many of the more active investors look to social media as a source of information. In early 2021, the eyes of the world focused on retail investors as Gamestop, a video game retailer, experienced an immense price surge on the stock market over the course of a few weeks. This event, among others, led the SEC (Securities and Exchange Commission) to open a discussion about the impact of social media on the stock market. It seemed that individual social media accounts were able to increase the volatility of a number of stocks. This study investigates the immediate impact of larger Swedish Twitter accounts on the volatility and price of Swedish small-cap companies. Sentiment analysis and data modeling in the Python programming language were used to compare volatility and price changes before and after tweets of different sentiments were made about the companies. Our study was unable to find any correlation between tweets and an immediate change in price or an immediate increase in volatility, suggesting that Swedish finance Twitter accounts have little to no immediate impact on Swedish small-cap companies.

Keywords
Human behavior, Financial markets, Sentiment analysis, Twitter
Sammanfattning (Summary)

Over the last five years, the number of private investors has increased markedly. When private investors try to make well-founded investment decisions, they often use posts on social media as a guide. In early 2021, attention turned to private investors when the share price of the video game retailer Gamestop rose by several hundred percent over the course of just a few weeks. This price increase led the SEC (the Securities and Exchange Commission in the USA) to open a discussion about the influence of social media on stock trading. Much indicated that individual social media accounts were able to increase the volatility of share prices for certain companies. This research project aims to investigate the immediate impact of Swedish Twitter accounts on the price, and the volatility of the price, of shares in small Swedish companies. Sentiment analysis and data modeling were carried out in the Python programming language to compare volatility and price changes before and after tweets of different sentiments were made about the companies. The study did not succeed in showing a correlation between tweets and an immediate change in price or an immediate increase in volatility, which suggests that Twitter accounts have little or no influence on small Swedish companies.

Keywords
Human behavior, Financial markets, Sentiment analysis, Twitter

Authors
John Janér and Noah Rahimzadagan
Information and Communication Technology
KTH Royal Institute of Technology

Place for Project
Stockholm, Sweden
Examiner
Pawel Herman
Stockholm, Sweden
KTH Royal Institute of Technology

Supervisor
Chris Peters
Stockholm, Sweden
KTH Royal Institute of Technology
Contents

1 Introduction
  1.1 Problem statement
  1.2 Scope of the study
  1.3 Thesis outline
2 Theoretical Background
  2.1 Definitions
  2.2 Market data
  2.3 Twitter data
  2.4 Natural Language Processing
  2.5 Economic theory
  2.6 Previous research
3 Methods
  3.1 Determining eligible Twitter accounts
  3.2 Scraping
  3.3 Natural language processing of the outputted CSV file
  3.4 Evaluation
4 Result
  4.1 Positive sentiment tweets
  4.2 Negative sentiment tweets
  4.3 Overall volatility for both sentiment tweets
5 Discussion
  5.1 RQ1
  5.2 RQ2
  5.3 RQ3
  5.4 Limitations
  5.5 Future Work
6 Conclusion
References
1 Introduction

With the rise of social media, the ability to share feelings, opinions, and gossip globally is no longer monopolized by traditional institutions such as newspapers and TV networks. Most individuals are able to create an account on any given social media platform and share thoughts and statements with the rest of the world.

On the 27th of January 2021, Gamestop, a video game retailer, experienced a price surge on the stock market. The price of one Gamestop share reached as much as 350 USD, an immense increase considering that a share had been worth only a small fraction of that price two weeks earlier [15]. Price fluctuations in the stock market can rarely be explained with absolute certainty, and this also applies to the Gamestop price surge of January 2021. Nevertheless, many speculate that the increase can to some extent be attributed to the online feud between institutional investors and retail investors that brewed on the social media platform Reddit before the price surge, which drove many retail investors to buy shares in Gamestop. The speculation went as far as the American Congress calling Keith Gill, one of many advocates for buying GME stock on Reddit, to a hearing in which he had to testify and answer questions regarding the events surrounding Gamestop on the stock market [7].

Elon Musk, an entrepreneur known for Tesla, SpaceX, and PayPal, has been reported tweeting about companies prior to price surges [18]. While the tweets cannot fully account for the increases in price, many believe that they had some significance.

1.1 Problem statement

Since there is clear evidence that social media, to some extent, influences investors [16], the question arises of whether there is a correlation between information on social media and stock prices.
Therefore, the aim of this project is to determine the short-term impact of high-profile Swedish finance Twitter accounts on the stock price of Swedish public small-cap companies. Specifically:

• RQ1 In what way are larger Swedish finance Twitter accounts able to move the price of a stock?
Looking at small-cap companies, can finance Twitter accounts above a certain follower threshold move the price of a stock in an immediate way? Using sentiment analysis and data modeling in Python, can a correlation be found between certain tweets and market movements?

• RQ2 Are tweets from these finance accounts contributing to increased risk in small-cap companies?
Using the Parkinson volatility formula to evaluate volatility before and after the tweets, do these tweets have an immediate impact on the volatility of the stock, contributing to a riskier investment in the short term?

• RQ3 Does the nature of tweet sentiment affect the outcome of the change?
The outcome of the change refers to the direction of the price change, that is, whether a certain stock increases or decreases in price. An increase and a decrease in volatility are likewise two different outcomes. Studies have suggested that positive information is perceived as more credible than negative information [3]. Are positive sentiment tweets more likely to move the stock price or increase the volatility?

1.2 Scope of the study

This research project encompasses sentiment analysis of the content of tweets made by certain Swedish finance Twitter accounts in order to investigate a correlation between the content of tweets and the state of the stock market. The tweets to be analyzed are scraped with a web scraper, a tool that fetches data from websites. Tweets are chosen only if they mention a certain Swedish small-cap company. The tweets are analyzed with natural language processing, and their effect on the stock prices of the companies mentioned is then investigated.

1.3 Thesis outline

In the following chapter, the theoretical background of the project is presented, including the theory behind sentiment analysis. The third chapter covers the methodology of this project and goes into detail
specifically on how the scraping of tweets was done and how the tweets were evaluated. Chapter 4 presents the project's results, namely the findings and the data that was gathered. Chapter 5 discusses the findings and reflects on previous work. Chapter 6 presents the conclusion.
2 Theoretical Background

This section describes the data sets used and the methods applied to collect them. Definitions of the terminology used throughout the paper follow directly below. An introduction to previous research related to the subject is also given.

2.1 Definitions

2.1.1 Small Cap
A company with a market capitalization under one billion USD is considered a "small-cap" company [1]. Institutional investors are generally not allowed to hold large stakes in small-cap companies, making them more accessible for retail investors.

2.1.2 Retail Investor
Retail investors, also known as individual investors, are nonprofessional investors [9].

2.1.3 Volatility
Volatility is a measure of the range of possible returns of a security. Higher volatility usually indicates a riskier security [4].

2.1.4 Volume of Trade
Volume of trade, or just "volume", refers to the total quantity of shares of a specific security traded in a given time frame [23].

2.2 Market data

A data set was received from Nordic Growth Market, a Swedish stock exchange housing companies with small market capitalization. The data was compiled in a JSON file containing the ticker, name, price, volume, and date for every trade made on the exchange dating back five years. Price data is of importance since
this research project aims to investigate how the stock prices of certain Swedish small-cap companies change in relation to tweets about those companies and the content of those tweets.

2.2.1 Market data format

Figure 2.1: Example JSON object from market data set.

2.3 Twitter data

The sentiment analysis is performed on tweets made by certain Swedish finance accounts. Web scraping was used in this work to access the data in the HTML documents that make up Twitter's pages. In other words, the tweets were collected using web scraping tools and stored in a CSV file for processing.

2.3.1 Twitter
Twitter is an American social network and microblog platform enabling users to post and interact with messages called "tweets". The platform has more than 180 million daily active users [21] and had an average of 330 million monthly active users in 2019 [22]. While only registered users can like, comment, and retweet, any user can view and read tweets, making Twitter a powerful tool for spreading information or opinions. Twitter is used extensively by politicians and other public figures because of its accessibility [6]. Due to its large number of users, Twitter contains extensive
amounts of noise. Filtering through this noise is nearly impossible for a human. However, individuals with a large enough following on the platform can pierce through the noise, using their followers as vessels for likes and retweets. This allows certain people to spread information, true or false, to large groups within our society.

2.3.2 Document Object Model
Ever since the birth of websites, the most fundamental part of a website has been the HTML document, which models how the different components of a website should be arranged. Furthermore, the HTML document contains the data that is shown on the website. The Document Object Model, or DOM, is a common way to define the logical structure of an HTML document [12]. A DOM structure is easily accessed and manipulated because of its forest-like structure. Furthermore, the DOM is designed to work with any programming language. The DOM is of importance in this research project since it provides a way to access the tweets that will undergo sentiment analysis.

2.3.3 Web scraping
In order to obtain tweets for this research project, they had to be extracted from Twitter. Many websites such as Twitter make the majority of their data available to everyone in the form of a feed that can be accessed with any web browser. Searching through this feed manually to gather data that stretches over a specific time frame is both cumbersome and time-consuming. Web scraping is a method used to collect information that is usually on display for human consumption on a website. A web scraper, also known as a crawler, commonly navigates the underlying HTML document of a website in order to find specific, often predetermined, strings of text across many different pages or profiles. This data is then collected and compiled into any format desired by the user. One type of web scraping is called DOM scraping. Since websites implement the DOM model through the underlying HTML document, the data of a website, Twitter
included, will be in the form of a tree or forest-like structure, which can easily be accessed with the help of any programming language. The DOM scraping in this research project was performed with the Python library Twint.

Figure 2.2: Visualisation of DOM (Document Object Model) tree structure.

2.4 Natural Language Processing

Natural Language Processing is the process of interpreting human language with the help of computers [20]. In this work, the language that is interpreted is the content of the scraped tweets. In other words, the natural language processing performed here amounts to sentiment analysis. Determining a tweet's sentiment entails classifying its content, which is text, with a label. In this research project, the two labels are negative and positive. Natural Language Processing consists of multiple steps that are explained in the following sections.

2.4.1 Tokenization
Computers are unable to interpret human language as is, so the text must first be broken down. This is done during the tokenization process [25]. Breaking down a text string into tokens is a common first step in natural language processing, since it is easier to train a computer to classify separate words than a
group of words. A text string is split into multiple tokens that are inserted in a list. The following text string:

"Well yes. I would love a cinnamon bun!"

would generate the tokens:

[Well, yes, ., I, would, love, a, cinnamon, bun, !]

In the above example, the sentence is split on white space and punctuation and then turned into a list of tokens. This kind of tokenization is what was used in this work. The token arrays are later evaluated word for word by the classifier.

2.4.2 Normalization of tokens
Normalization, in the context of natural language processing, converts multiple words that have the same meaning but different forms into the same form. For instance, the words "sing", "sang", and "sung" all have the same meaning but come in different forms. It is time-consuming and unnecessary to train a classifier on the same word in different forms; therefore, normalization is a common practice in natural language processing. Furthermore, stop words, such as "and", and punctuation are removed. Since stop words convey little meaning and mainly serve to make texts easier for human readers, it is best to remove them before performing the sentiment analysis.

2.4.3 Naïve Bayes Message Coding
A common method for classifying text is Naïve Bayes. As the name suggests, the method is naïve in the sense that it scores each word in a text stream independently of the other words in the same stream. This is easily demonstrated with an example: the text streams "Dear friend" and "Friend dear" are assigned the same classification score. Naturally, there are more sophisticated classification methods, although Naïve Bayes has been shown to perform exceptionally well [17]. The name Bayes stems from the mathematician Thomas Bayes and his formula, Bayes' Theorem, which determines the probability of one event given a certain condition. Bayes' Theorem is given in Equation (1).
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{1}$$

$P(A \mid B)$ is the conditional probability of event A occurring given that B occurs. $P(B \mid A)$ is the conditional probability of event B occurring given that A occurs. $P(A)$ and $P(B)$ are the probabilities of A and B occurring, respectively.

Naïve Bayes is derived from the above formula and, in this case, is used to assign a score to a stream of words. The score is a unitless number that represents how well the stream of words fits a certain class. The Naïve Bayes formula can be broken down into the following parts. The a priori probability, $P(A)$, is the probability of an event occurring without any other information given. The a posteriori probability, $P(A \mid T)$, is the probability of event A occurring given T. As the Latin terms suggest, the a priori probability is known initially, while the a posteriori probability is known only after the Naïve Bayes formula has been applied. The a posteriori probability is also known as the score of the tweet for a certain class. The last part of Naïve Bayes consists of the probabilities of each attribute $T_i$ given the condition A, where $T_i$ is any word in the stream and A is one of the classes. These probabilities are also called likelihoods, and together they are represented by the product $\prod_{i=1}^{n} P(T_i \mid A)$. The equation for Naïve Bayes is given below.

$$P(A \mid T) = \frac{P(T_1 \mid A)\,P(T_2 \mid A) \cdots P(T_n \mid A)\,P(A)}{P(T)} \tag{2}$$

Simply put, the above formula is evaluated twice for each tweet, once per class, and the class that yields the highest a posteriori probability is assigned to the tweet. The two classes in this research project are negative and positive.
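The tokenization, stop-word removal, and Naïve Bayes scoring described in Section 2.4 can be illustrated with a small self-contained sketch. It is not the classifier used in this project (that was NLTK's built-in one); the stop-word list, the training sentences, the add-one smoothing, and the use of log probabilities are all assumptions made here for illustration. Since $P(T)$ is the same for both classes, the sketch compares only the numerator of Equation (2).

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (assumption, not the list used in the thesis).
STOP_WORDS = {"a", "and", "the", "i", "is", "this"}


def tokenize(text):
    """Lower-case the text, keep word tokens only, and drop stop words."""
    tokens = re.findall(r"[a-zåäö]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def train(labelled_texts):
    """Count how often each class and each (class, word) pair occurs."""
    class_counts = Counter()
    word_counts = {}
    for text, label in labelled_texts:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocabulary


def log_score(text, label, model):
    """Return log P(A) + sum_i log P(T_i | A), the log numerator of Eq. (2)."""
    class_counts, word_counts, vocabulary = model
    score = math.log(class_counts[label] / sum(class_counts.values()))
    total_words = sum(word_counts[label].values())
    for token in tokenize(text):
        # Add-one smoothing (assumption) avoids log(0) for unseen words.
        likelihood = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
        score += math.log(likelihood)
    return score


def classify(text, model):
    """Evaluate the score once per class and return the best-scoring class."""
    return max(model[0], key=lambda label: log_score(text, label, model))


# Hypothetical training data, far smaller than a real training set.
training = [
    ("strong report and great growth", "positive"),
    ("great order, love this company", "positive"),
    ("weak report and terrible losses", "negative"),
]
model = train(training)
print(tokenize("Well yes. I would love a cinnamon bun!"))
print(classify("great growth", model))
```

Because each word contributes to the score independently, reordering the words of a text leaves its score unchanged (up to floating-point rounding), which is the "Dear friend" / "Friend dear" property mentioned in Section 2.4.3.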
2.5 Economic theory

Twitter, in this study, is viewed as a stream of information in which retail investors seek new potential trades. This gives it the potential to influence price movements, especially for the stock of smaller companies. This section briefly explains the Efficient Market Theory on which this study is based.

2.5.1 Efficient Market Theory
According to the efficient market hypothesis (EMH), a widely accepted theory a generation ago, the price of and potential gain in any security or stock is dependent on the availability of information to all participants. In a fully efficient market, a certain set of information, α, would not impact the market price of a stock if revealed to all participants [13].

Figure 2.3: Graph showing company HSTK B releasing a sales report on December 1st 2020. The price rose almost 57% on the day. Since the company issued a report, this observation was omitted from this study's results.

This is based on the notion that information travels quickly and is subsequently incorporated into the market price without delay [14]. Figure 2.3 shows an
example of how the market quickly reacts to new information. However, due to discrepancies between EMH and measured volatility in the market, many have questioned EMH and how efficient our markets actually are [19].

2.5.2 Algorithmic trading
Algorithmic trades are transactions in the stock market made by computers. In the United States, algorithmic trading makes up around 50% of market liquidity. These computer-executed trades can sometimes lead to unexpected movements in stocks. The algorithms are often proprietary, and identifying algorithmic trading is almost impossible in most cases [10]. Although this study does not involve algorithmic trading, its undetectable effects might limit the findings of the study.

2.6 Previous research

In this section, we introduce studies that have evaluated the possibility of predicting market movements using Twitter, as well as the impact of CEOs' tweeting on their companies' stock performance. Methods used in these studies, more specifically sentiment analysis and the evaluation of changes in stock prices, have been replicated and applied in this study.

2.6.1 Twitter mood predicts the stock market
In their study conducted in 2010, J. Bollen and H. Mao used sentiment analysis on Twitter to determine the broader "mood" of the general public at a given point in time [2]. This was then combined with machine learning algorithms to predict the value movement of the Dow Jones Industrial Average (DJIA). They found an accuracy of 87.6% in predicting the daily up and down changes of the closing values of the DJIA, as well as a reduction of the Mean Average Percentage Error by more than 6%.
2.6.2 How Social Media usage by managers affects corporate value: The case of Elon Musk
M. Corte investigates high-profile CEOs' social media usage and its impact on their companies' stock prices. This master's thesis focuses primarily on Elon Musk, the CEO of Tesla [5]. Using sentiment analysis on Musk's tweets and comparing it to the movements of the Tesla stock, Corte evaluated 188 tweets made by Musk in the first quarter of 2020. When evaluating Tesla-related tweets, his models resulted in a p-value of 0.08, which is not statistically significant. However, the same model used on non-Tesla-related tweets yielded a much higher p-value, a result Corte was unable to explain. Further evidence of the stock price moving several percent minutes after Elon Musk's tweets made Corte believe that a statistically significant result could be obtained using more advanced models.

2.6.3 Stock Price Forecasting via Sentiment Analysis on Twitter
The conference paper titled "Stock Price Forecasting via Sentiment Analysis on Twitter" by J. Kordonis, S. Symeonidis, and A. Arampatzis investigates stock market predictions using Twitter sentiment analysis [11]. In their study, they analyze tweets relating to 16 of the most popular technology stocks on the Nasdaq stock exchange. They then used machine learning (a Support Vector Machine) to predict the movement and daily closing prices of the stocks based on the daily Twitter sentiment for each stock. They achieved an accuracy of 87% in predicting the movement of the stock and averaged a 1.669% error margin in predicting the closing price on 23 June 2016.
3 Methods

The first part of this work was to determine eligible Twitter accounts whose tweets could be used as data for this research project. Once the accounts had been determined, their tweets mentioning certain Swedish small-cap companies were scraped with Twint, a scraping library for the Python programming language. Twint is a web scraper, a piece of software that fetches data that is available on a website. In this research project, the website is Twitter and the data consists of tweets. The tweets were written to a CSV file, a format for storing data in tabular form. Natural language processing was then performed on the CSV file in order to interpret the sentiment of the gathered tweets. Natural language processing is most easily described as a computer's way of interpreting human language; in this step, the tweets are assigned sentiment scores. Price information for the stocks was retrieved from the Nordic Growth Market data set. The stocks were connected to their respective tweets and evaluated in Python to calculate price changes as well as volatility using the Parkinson volatility formula. Lastly, t-tests were conducted to determine whether the results carried any statistical significance.

Figure 3.1: Visualisation of method workflow. The method can be divided into three parts: first, building the Twitter data set using Twint and performing its sentiment analysis with the help of NLTK; second, retrieving all needed market information from the NGM data set using Python; and lastly, connecting the two data sets and conducting the evaluation.
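As an illustration of the second part of the workflow, the sketch below reduces trade-level records to one daily high, low, and closing price per stock, which are the quantities the evaluation needs. It is a minimal sketch rather than the program used in this project: the key names (`ticker`, `name`, `price`, `volume`, `date`) follow the fields listed in Section 2.2, but their exact spelling in the NGM file, the sample values, and the assumption that trades appear in chronological order are all hypothetical.

```python
import json

# Hypothetical sample in the spirit of the NGM trade data (not real data).
SAMPLE_JSON = """
[
  {"ticker": "ABC", "name": "Example AB", "price": 10.0, "volume": 100, "date": "2020-12-01"},
  {"ticker": "ABC", "name": "Example AB", "price": 10.5, "volume": 50,  "date": "2020-12-01"},
  {"ticker": "ABC", "name": "Example AB", "price": 9.8,  "volume": 75,  "date": "2020-12-01"},
  {"ticker": "ABC", "name": "Example AB", "price": 10.2, "volume": 20,  "date": "2020-12-01"},
  {"ticker": "ABC", "name": "Example AB", "price": 10.4, "volume": 10,  "date": "2020-12-02"}
]
"""


def daily_bars(trades):
    """Group trades by (ticker, date) and keep the high, low, and close.

    Assumes the trades are in chronological order, so that the last
    price seen on a given day is that day's closing price.
    """
    bars = {}
    for trade in trades:
        key = (trade["ticker"], trade["date"])
        price = trade["price"]
        if key not in bars:
            bars[key] = {"high": price, "low": price, "close": price}
        else:
            bar = bars[key]
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
    return bars


bars = daily_bars(json.loads(SAMPLE_JSON))
for (ticker, date), bar in sorted(bars.items()):
    print(ticker, date, bar)
```

A tweet can then be connected to its market data by looking up the `(ticker, date)` key for the day of the tweet and the preceding trading days.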
3.1 Determining eligible Twitter accounts

The Swedish "Finance Twitter" community is a relatively small group, with a set of larger accounts having significant follower engagement. This engagement (likes, retweets, etc.) enables wide reach. When selecting these accounts, we wanted an unbiased approach. A threshold of 500 followers was selected, and the accounts selected needed to have a sole focus on financial markets and stocks. A list of one hundred eligible accounts was compiled, and thirty of these accounts were selected at random. This was done to ensure no prior bias regarding the accounts' possible ability to affect the market.

3.2 Scraping

An easy way to retrieve data from Twitter is to use the Twitter API that Twitter Inc has designed [24]. Unfortunately, the Twitter API can only fetch tweets from the past week, which is why a web scraper was used to fetch tweets for this research project. Scraping, more commonly known as web scraping, is the process of extracting data from a website. The website that was scraped in this case was Twitter.com. The scraping in this work was done with Twint, a Python library that allows users to access all tweets in a specified time period subject to certain filtering conditions. In this work, the first filtering condition was that the tweet mentioned one of the companies listed in the data set provided to us by NGM (Nordic Growth Market). The second condition was to omit retweets, and the third was to omit replies, namely tweets that are part of a conversation. The motivation behind these filtering conditions is to fetch only stand-alone tweets that mention Swedish small-cap companies. The tweets were then written as rows to a CSV file. As mentioned in chapter 2, DOM scraping was used when performing web scraping in this project.

3.3 Natural language processing of the outputted CSV file

The Python library NLTK was used when performing natural language processing on the gathered tweets.
NLTK provides tools for natural language processing, including a built-in Naïve Bayes classifier. In this work, the classifier was trained with 750 positive tweets
and 250 negative tweets. The stream of words in this case is a single tweet, and the labels are either positive or negative. A Python script was run in order to assign every tweet a class, either positive or negative.

3.4 Evaluation

Before the evaluation began, the dates of the tweets were checked to make sure that no other information was made public on the day of the tweet. This included any information distributed by the companies themselves, such as earnings reports, order announcements, and general news, as well as any third-party institutional news involving the companies.

A Python program was built to extract all necessary information from the market data file received from Nordic Growth Market, for the day of the tweet as well as the nine days prior, for all tweets. This ten-day period is equal to two weeks of trading days. The information used was the date, the closing price of the stock, and the highest and lowest trade prices of the day. The program then calculated the change in price for all days, in percent, using the simple equation:

$$\text{Change}_i = \left(\frac{\text{Price}_i}{\text{Price}_{i-1}} - 1\right) \cdot 100, \quad i = \text{date} \tag{3}$$

To calculate the volatility, the Parkinson volatility formula was implemented:

$$\text{Volatility} = \sqrt{\frac{1}{4T \ln 2} \sum_{t=1}^{T} \ln^2\!\left(\frac{h_t}{l_t}\right)}, \quad T = \text{time period} \tag{4}$$

The Parkinson volatility formula was used because it incorporates the daily high price, $h_t$, and the daily low price, $l_t$, instead of only the closing price. This allows for the detection of price swings within a day of trading. Volatility was calculated for two different time periods: the nine days prior to the tweet and the day of the tweet.

The evaluation involved comparing the sentiment of the tweets to the outcome
of the corresponding trading day, as well as comparing the volatility on that day to the volatility of the nine prior trading days. For a positive sentiment tweet, the expected outcome was an increase in price; the reverse was expected for negative tweets. An increase in volatility was expected in both scenarios. A comparison was also made between the outcomes of the different sentiments. T-tests were conducted for all cases to determine whether there was any statistically significant relationship between tweets and their impact on the market.
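The calculations in Section 3.4 can be sketched in a few lines of Python. The first two functions follow Equations (3) and (4) directly. The third is an assumption: the text does not state which form of t-test was used, so a paired t-statistic on the difference between tweet-day volatility and prior volatility is shown purely as one plausible choice, and converting it to a p-value is left out. All sample numbers are made up.

```python
import math
import statistics


def percent_change(price_today, price_previous):
    """Equation (3): day-over-day price change in percent."""
    return (price_today / price_previous - 1) * 100


def parkinson_volatility(highs, lows):
    """Equation (4): Parkinson volatility over T days of high/low prices."""
    T = len(highs)
    total = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return math.sqrt(total / (4 * T * math.log(2)))


def paired_t_statistic(before, after):
    """t-statistic for the mean of the pairwise differences (after - before).

    Assumption: a paired test is only one possible reading of the
    t-tests mentioned in the text.
    """
    diffs = [a - b for a, b in zip(after, before)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


# Hypothetical high/low prices: nine prior days, then the day of the tweet.
prior_highs = [10.2, 10.4, 10.1, 10.3, 10.5, 10.2, 10.6, 10.4, 10.3]
prior_lows = [9.9, 10.0, 9.8, 10.0, 10.1, 9.9, 10.2, 10.1, 10.0]
prior_volatility = parkinson_volatility(prior_highs, prior_lows)
tweet_day_volatility = parkinson_volatility([10.9], [10.1])

print(percent_change(10.7, 10.3))
print(prior_volatility, tweet_day_volatility)
```

Collecting one (prior volatility, tweet-day volatility) pair per observation gives the two lists that `paired_t_statistic` expects.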
4 Result

After the set of tweets had been cleaned and processed, 85 observations remained. The observations were then grouped based on sentiment type and evaluated in accordance with the process outlined in the prior chapter.

4.1 Positive sentiment tweets

Figure 4.1: The share of observations, in percent, which had a positive or negative change in price on the day of the tweet. This result only includes positive sentiment tweets.

Figure 4.2: The share of observations, in percent, which had an increase or decrease in volatility on the day of the tweet, compared to the volatility of the stock over the nine prior days. This result only includes positive sentiment tweets.

On 56% of the days when a positive sentiment tweet was posted, the price of the mentioned stock increased. However, no statistical significance was found regarding the impact of the tweets on the stock price on the day of the tweet. Examining the impact of positive tweets on the different stocks' volatility, the
volatility only increased in 36% of the observations. With a p-value of 0.08, no statistical significance was found.

4.2 Negative sentiment tweets

Figure 4.3: The share of observations, in percent, which had a positive or negative change in price on the day of the tweet. This result only includes negative sentiment tweets.

Figure 4.4: The share of observations, in percent, which had an increase or decrease in volatility on the day of the tweet, compared to the volatility of the stock over the nine prior days. This result only includes negative sentiment tweets.

For approximately 38% of negative tweets, the price of the respective stock fell on the day of the tweet. No statistical significance was found between the tweet sentiment and its impact on the stock market. The volatility of the respective stocks increased in approximately 62% of the observations, in comparison to the volatility of the stocks over the nine prior days.
4.3 Overall volatility for both sentiment tweets

Figure 4.5: The share of observations, in percent, which had an increase or decrease in volatility on the day of the tweet, compared to the volatility of the stock over the nine prior days. The result includes both negative and positive tweets.

For all observations, the volatility of the observed stocks decreased in around 45% of the cases. With a p-value of 0.326, no statistical significance was found for the tweets' impact on the respective stocks' volatility.
5 Discussion

5.1 RQ1

When evaluating the impact of the tweets on price, no statistically significant correlation was found for either positive or negative sentiment tweets, as shown in Figures 4.1 and 4.3. These results may depend on many factors, such as investors not reacting quickly to new tweets from accounts they follow, or investors using these Twitter accounts as inspiration for their own research rather than as direct investment recommendations.

In his study on the impact of Tesla CEO Elon Musk’s Twitter usage, M. Corte strongly suggests, although without showing statistical significance, that Musk’s tweets do in fact move the price of Tesla stock (see section 2.6.2) [5]. This is most likely due to the fact that Musk is the acting CEO of the company and is, therefore, the most knowledgeable person when it comes to Tesla and its business. The Twitter accounts used in this study do not, as far as their profiles say, hold an active role in the companies they discuss, and base their knowledge on information already available to the general public. Another key factor is the size of the accounts: Musk’s following is in the tens of millions, while the Swedish Twitter accounts usually have fewer than 20 thousand followers.

Furthermore, even though this study focuses on small companies that are more reactive to changes in market conditions or trade volume, the trade volumes needed to significantly move the price seem larger than the volumes plausibly generated by a tweet. As described in the theoretical background, section 2.4.2, today’s markets are dominated by algorithmic trading, which might counteract any potential larger change in price for these companies. Since these algorithms are often proprietary “black boxes”, it is difficult for non-insiders to determine how and where they operate.
When receiving our market data set from Nordic Growth Market (NGM), our contact said he was positive that one would be able to find an impact of Twitter on the companies listed on their market. Since this study was conducted in a general manner, with no preconceived notion of which accounts or stocks might have a greater chance of generating a favorable outcome, interesting further research might include investigating specific companies highlighted by NGM themselves.
5.2 RQ2

The result closest to statistical significance was the change in volatility for positive sentiment tweets, as seen in Figure 4.2. However, the result was the opposite of the expected outcome, with lowered volatility on the day of the tweets. This might be due to some inherent limitations of the generally accepted way in which volatility is calculated. The volatility formula used in the study uses the highest and lowest price traded on any given day. This allows large changes within the trading day to be detected, instead of relying solely on the closing price. However, it follows that, for example, a day on which a stock steadily increases 5% from its opening price without going negative would generate lower volatility than a day on which the open and close prices are the same but the price oscillates between negative 3% and positive 3%. Taking this into account, one reason for the results seen in Figure 4.2 might be that positive sentiment tweets reduce periods of negative price movement and thereby reduce the total measured volatility of the stock.

Another possible reason could be that a tweet is a reaction to the previous days’ price changes in a certain stock. A tweet could, for example, be posted in reaction to a stock’s recent decrease in price, implying the company is now undervalued.

A common theme in the previous research papers used in this study is the difficulty of reaching a statistically significant result. Modern financial markets are enormously complex, with many moving parts and participants. Determining with absolute certainty how the market will move, and why, is in most cases impossible. However, certain trends and suggested correlations can be found, which are often used as a basis for investment strategies.
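The limitation of a range-based volatility measure described in this section can be illustrated numerically. The sketch below uses the relative high-low range of a single trading day as a stand-in for the study's volatility formula, which is defined in the method chapter and may differ; the function name `intraday_range` and all prices are hypothetical.

```python
def intraday_range(high: float, low: float, open_price: float) -> float:
    """Relative high-low range of one trading day (a simple volatility proxy)."""
    if open_price <= 0 or low > high:
        raise ValueError("need open_price > 0 and low <= high")
    return (high - low) / open_price


# Day A: the price rises steadily from 100 to 105 and never trades below the open.
steady_rise = intraday_range(high=105.0, low=100.0, open_price=100.0)

# Day B: the price opens and closes at 100 but swings between -3% and +3%.
oscillating = intraday_range(high=103.0, low=97.0, open_price=100.0)

print(f"steady 5% rise: {steady_rise:.2%}")
print(f"flat close, +/-3% swings: {oscillating:.2%}")
assert steady_rise < oscillating
```

Under this proxy the steadily rising day registers a 5% range and the oscillating day a 6% range, even though only the former ends with a net price change.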
Even though no statistical significance was found in this case, completely discarding the reduced volatility result in Figure 4.2 is unnecessary.

5.3 RQ3

Comparing the price changes following positive sentiment tweets with those following negative sentiment tweets, no significant difference was found. However, the difference in volatility between positive and negative tweets, although not statistically significant, is noticeable. As studies have suggested a credibility bias towards positive information compared to negative information [3], these results might suggest that investors are more likely to act on a tweet containing a positive sentiment, thereby reducing negative swings in pricing, as discussed in section 5.2.

In the research paper “Twitter mood predicts the stock market” [2], Bollen et al. reported an accuracy of 86.7% when taking public mood from tweets into account to predict the value of the DJIA. Bollen et al., however, used a data set of approximately 10 million tweets from 2.7 million users, which from a quantitative point of view could be more accurate than this work’s data set. The DJIA is a stock index that reflects how well thirty large companies in the United States perform on the stock market [8].

5.4 Limitations

This research project did not intend to forecast stock prices or fluctuations. Nor does this study evaluate the long-term performance of stocks mentioned by individuals on Twitter. Rather, it investigates human behavior and the impact of the ever-growing ubiquity of social media on retail investors’ immediate activity on the stock market.

5.5 Future Work

This study has suggested that larger Swedish Twitter accounts, in general, do not move the prices or the volatility of certain stocks. However, since other studies have implied that larger Twitter accounts, such as Elon Musk’s, can impact the movements of certain securities, interesting future work might include determining the size of following needed to impact the stock prices of companies mentioned in tweets.

Furthermore, the study only focuses on the immediate impact of the tweets. Many of the accounts used in this study tweeted about the same company multiple times over a longer period of time.
Future work might therefore include comparing the performance of stocks popular on Twitter with that of companies receiving less or no exposure on Twitter, evaluating whether long periods of positive reinforcement on Twitter lead to greater performance in small-cap corporations.
6 Conclusion

The results suggest that there is no correlation between the movement of small-cap stocks and tweets from larger finance Twitter accounts mentioning the companies. Although no statistical significance was found, it can be noted that positive tweets may have an immediate reducing effect on the volatility of the mentioned stock. Furthermore, the type of sentiment, whether positive or negative, seems to have little effect on the impact of the tweets on the price of a stock.

In conclusion, this study was unable to detect any direct impact on the stocks mentioned by the Swedish finance Twitter accounts used in the study. However, due to the general nature of this study with regard to the selection of Twitter accounts and companies, further studies need to be conducted to determine individual Twitter accounts’ ability to impact the market.
References

[1] Barone, Adam. “Small Cap”. In: Investopedia (2020). URL: https://www.investopedia.com/terms/s/small-cap.asp.
[2] Bollen, Johan, Mao, Huina, and Zeng, Xiaojun. “Twitter mood predicts the stock market”. In: Journal of computational science 2.1 (2011), pp. 1–8.
[3] Callison, Coy. “Do PR practitioners have a PR problem?: The effect of associating a source with public relations and client-negative news on audience perception of credibility”. In: Journal of Public Relations Research 13.3 (2001), pp. 219–234.
[4] Chen, James. “Volatility”. In: Investopedia (2021). URL: https://www.investopedia.com/terms/v/volatility.asp.
[5] Corte, Miguel Alexandre Barbeira. “How social media usage by managers affects corporate value: the case of Elon Musk”. PhD thesis. 2020.
[6] Duncombe, Constance. “The politics of Twitter: emotions and the power of social media”. In: International Political Sociology 13.4 (2019), pp. 409–429.
[7] Fitzgerald, Maggie. “‘Roaring Kitty’ Keith Gill defends GameStop posts, says he is as bullish as ever on the stock”. In: CNBC (2021).
[8] Ganti, Akhilesh. “Dow Jones Industrial Average (DJIA)”. In: Investopedia (2021). URL: https://www.investopedia.com/terms/d/djia.asp.
[9] Hayes, Adam. “Retail Investor”. In: Investopedia (2021). URL: https://www.investopedia.com/terms/r/retailinvestor.asp.
[10] Hendershott, Terrence, Riordan, Ryan, et al. “Algorithmic trading and information”. In: Manuscript, University of California, Berkeley (2009).
[11] Kordonis, John, Symeonidis, Symeon, and Arampatzis, Avi. “Stock price forecasting via sentiment analysis on twitter”. In: Proceedings of the 20th Pan-Hellenic Conference on Informatics. 2016, pp. 1–6.
[12] Le Hégaret Lauren W, Jonathan R. “What is the Document Object Model?” In: W3c (2000).
[13] Malkiel, Burton G. “Efficient market hypothesis”. In: Finance. Springer, 1989, pp. 127–134.
[14] Malkiel, Burton G. “The efficient market hypothesis and its critics”. In: Journal of economic perspectives 17.1 (2003), pp. 59–82.
[15] Ossinger, Joanna. “GameStop’s Volatile Rally Smashes Wall Street Price Targets”. In: Bloomberg (2021).
[16] Pineiro-Chousa, Juan, Vizcaino-Gonzalez, Marcos, and Perez-Pico, Ada Maria. “Influence of social media over the stock market”. In: Psychology & Marketing 34.1 (2017), pp. 101–108.
[17] Rish, Irina et al. “An empirical study of the naive Bayes classifier”. In: IJCAI 2001 workshop on empirical methods in artificial intelligence. Vol. 3. 22. 2001, pp. 41–46.
[18] Shead, Sam. “Elon Musk’s tweets are moving markets — and some investors are worried”. In: CNBC (2021).
[19] Shiller, Robert J. “From efficient markets theory to behavioral finance”. In: Journal of economic perspectives 17.1 (2003), pp. 83–104.
[20] Sun, Shiliang, Luo, Chen, and Chen, Junyu. “A review of natural language processing techniques for opinion mining systems”. In: Information fusion 36 (2017), pp. 10–25.
[21] Tankovska. “Leading countries based on number of Twitter users as of January 2021”. In: Statista (2021). URL: https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/.
[22] Tankovska. “Number of monthly active Twitter users worldwide from 1st quarter 2010 to 1st quarter 2019”. In: Statista (2021). URL: https://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/.
[23] Twin, Alexandra. “Volume of Trade”. In: Investopedia (2021). URL: https://www.investopedia.com/terms/v/volumeoftrade.asp.
[24] “Twitter API”. In: Twitter (2021). URL: https://developer.twitter.com/en/docs/twitter-api.
[25] Webster, Jonathan J and Kit, Chunyu. “Tokenization as the initial phase in NLP”. In: COLING 1992 Volume 4: The 15th International Conference on Computational Linguistics. 1992.
TRITA-EECS-EX-2021:443 www.kth.se