CORONAVIRUS-RELATED SENTIMENT AND STOCK PRICES - DIVA

DEGREE PROJECT IN FINANCE
 PROGRAM: REAL ESTATE AND FINANCE
 FIRST CYCLE, 15 CREDITS
 STOCKHOLM, SWEDEN 2020

Coronavirus-Related Sentiment and Stock Prices
 Measuring Sentiment Effects on Swedish Stock Indices

 Olga Piksina
 Patricia Vernholmen

 KTH

 INSTITUTIONEN FÖR FASTIGHETER OCH BYGGANDE
Bachelor of Science Thesis

 Title: Coronavirus-Related Sentiment and Stock Prices: Measuring
 Sentiment Effects on Swedish Stock Indices

 Authors: Olga Piksina, Patricia Vernholmen

 Institution: Institution of Real Estate and Construction Management

 Bachelor Thesis number: TRITA-ABE-MBT-20482

 Archive number:

 Supervisor: Andreas Fili

 Keywords: market sentiment, behavioural finance, market efficiency, coronavirus, Swedish
 stock market, text analytics, sentiment analysis, news mining

Abstract
This thesis examines the effect of coronavirus-related sentiment on Swedish stock market returns
during the coronavirus pandemic. We study returns on the large cap and small cap price indices
OMXSLCPI and OMXSSCPI during the period January 2, 2020 – April 30, 2020. Coronavirus sentiment
proxies are constructed from news articles clustered into topics using latent Dirichlet allocation and
scored through sentiment analysis. The impact of the sentiment proxies on the stock indices is then
measured using a dynamic multiple regression model. The results show that the proxies representing
fundamental changes in our model — Swedish Politics and Economic Policy — have a strongly significant
impact on the returns of both indices, which is consistent with financial theory. We also find that
sentiment proxies Sport and Coronavirus Spread are statistically significant and impact Swedish stock prices.
This implies that coronavirus-related news influenced market sentiment in Sweden during the research
period and could be exploited to uncover arbitrage. Finally, the amount of sentiment-inducing news
published daily is shown to have an impact on stock price volatility.

Examensarbete kandidatnivå

 Titel: Coronavirus-relaterat sentiment och aktiepriser: En
 studie av sentimenteffekter på svenska aktieindex

 Författare: Olga Piksina, Patricia Vernholmen

 Institution: Fastigheter och Byggande

 Examensarbete kandidatnivå nummer: TRITA-ABE-MBT-20482

 Arkiv nummer:

 Handledare: Andreas Fili

 Nyckelord: marknadssentiment, beteendefinans, marknadseffektivitet,
 coronaviruset, svensk aktiemarknad, textanalys,
 sentimentanalys, news mining

Sammanfattning
Denna studie undersöker den effekt coronavirus-relaterat sentiment haft på avkastningen på svenska
aktieindex under coronaviruspandemin. Vi studerar avkastningen på large cap- och small cap-prisindexen
OMXSLCPI och OMXSSCPI under perioden 2 januari 2020 – 30 april 2020. Proxier för coronavirus-
sentiment konstrueras från nyhetsartiklar som klustrats i ämnen genom latent Dirichlet-allokering och
poängsatts genom sentimentanalys. Sentimentproxiernas påverkan på aktieindexen mäts sedan med en
dynamisk multipel regressionsmodell. Resultaten visar att proxierna som representerar fundamentala
förändringar i vår modell — svensk politik och ekonomisk policy — har en starkt signifikant inverkan på
avkastningen på båda indexen, vilket är konsekvent med finansiell teori. Vi finner även att
sentimentproxierna sport och spridning av coronaviruset är statistiskt signifikanta i sin påverkan på svenska
aktiepriser. Detta innebär att coronavirus-relaterade nyheter påverkade marknadssentiment i Sverige
under undersökningsperioden och skulle kunna användas för att upptäcka arbitrage. Slutligen visas
mängden sentimentframkallande nyheter publicerade per dag ha en inverkan på aktieprisvolatilitet.

Acknowledgements
We would like to extend our genuine gratitude to our supervisor, Dr. Andreas Fili, whose guidance
and support made this work possible. We wish to express our sincere thanks to Dr. Bertram Steininger
for sharing meaningful insights into text analytics and recent research in the field. We would also like
to thank Dr. Olga Rud for her recommendations on the material used for this study. Our special
thanks goes to Stephen Rosewarne who kindly agreed to proofread this thesis. Furthermore, we are
thankful to our family members and friends Christoffer Linné, Marina and Laura Vernholmen, Manne
Svensson, Helene Törnqvist and Leo P. Thank you all for your unwavering support and inspiration.

Table of Contents

1. INTRODUCTION
1.1 Research Purpose and Questions
1.2 Contribution to the Field
1.3 Disposition

2. REVIEW OF THE LITERATURE
2.1 Fundamental Analysis vs. Technical Analysis
2.2 Efficient Market Hypothesis
2.3 Behavioural Finance
2.4 Event Studies and EMH
2.5 Text Analytics in Finance
 2.5.1 Sentiment Analysis through Computational Linguistics

3. METHOD AND MATERIALS
3.1 Description
3.2 Limitations
3.3 Stock Indices
3.4 News
 3.4.1 Collection
 3.4.2 Preprocessing Textual Data
 3.4.3 Topic Modelling and Scoring
 3.4.4 Sentiment Proxies
 3.4.5 Sentiment Analysis
 3.4.6 Allocation of News to Dates
3.5 Economic Indicators
3.6 Multiple Regression
 3.6.1 Model Specification

4. RESULTS
4.1 Autocorrelation Analysis
4.2 Cross-Correlation Matrices
4.3 Specified Regression Model
 4.3.1 Variance Inflation Factors
4.4 Regression Outputs
4.5 Market Volatility and Coronavirus-Related News

5. ANALYSIS
5.1 Sustainability Aspects
5.2 Further Research

6. CONCLUSION

REFERENCES

Terminology list

OMXSLCPI Price index of all Large Cap companies listed on Stockholm Stock
 Exchange (Market value of 1 billion euro or more. Nasdaq, 2020).
OMXSSCPI Price index of all Small Cap companies listed on Stockholm Stock
 Exchange (Market value below 150 million euro. Nasdaq, 2020).
Market Sentiment “[...] a belief about future cash flows and investment risks that is not
 justified by the facts at hand” (Baker and Wurgler, 2007).
 Market value of an asset = fundamental value + sentiment value.

Proxy A proxy is “a variable used instead of the variable of interest when
 that variable of interest cannot be measured directly” (Oxford
 University Press, 2009). Proxies used in this study fall into two mutually
 exclusive categories: 1. Fundamental proxies, representing news which
 can cause fundamental change to asset values, and 2. Sentiment
 proxies, which reflect investor sentiment. Proxies belonging to one
 category have no influence on the other.

Web Scraping Automated gathering of data from the internet through any means
 other than a program interacting with an API (Mitchell, 2018).

Latent Dirichlet Allocation (LDA) Unsupervised machine learning algorithm used to cluster previously
 unlabelled text data according to topics (method known as topic
 modelling). It finds the most common words appearing in the text
 and clusters them, thus uncovering themes in text (see Blei, Ng and
 Jordan, 2003).

Text Analytics “[...] large-scale, automated processing of plain text language in
 digital form to extract data that is converted into useful quantitative
 or qualitative information” (Das, 2014).

Corpus Collection of text documents which can be readily processed in an
 automated way.

Tokenisation Part of data preprocessing identifying basic units, known as tokens, in
 text corpora. Some methods tokenise by words or entities delimited
 by blank spaces while others make tokens of more complex entities
 such as idioms or expressions (Webster and Kit, 1992).

1. Introduction
On the 11th of March, 2020, the World Health Organisation (WHO) declared the novel coronavirus
disease 2019 (COVID-19) outbreak a global pandemic. The outbreak originated in Wuhan, China in
December 2019 and has since spread throughout the entire world. In addition to a serious health
emergency, the spread of the disease in the majority of the world’s countries has led to a deep
economic crisis predicted by many to become a recession similar to the Great Depression. Stock
markets all over the globe have reacted with varying degrees of panic, and widespread future
uncertainty has resulted in two of the largest single day drops in the Dow Jones Industrial Average.
The Swedish stock market has also experienced days of historic decline (see Figure 1). At the time of
writing, the pandemic is ongoing and there is no clear outlook on how it is going to develop or when
it will end.

The coronavirus outbreak has also been remarkable for the unprecedented, near-total media coverage
it has received across the globe. With a huge part of the world’s population isolated or confined
to their homes, the current pandemic has become a unique event, with news spreading rapidly around
the world. Reports on the spread of the coronavirus and on measures taken by governments in
response to the crisis have alternated with news of skyrocketing unemployment rates, industries at risk
of collapse, and the total economic impact of the pandemic on the global community.

Our primary hypothesis is that non-economic news has made a significant impact on how investors
on the Swedish stock market have valued assets during the pandemic. We assume movement on the
market is influenced by blanket media coverage of the pandemic as well as various measures the
Swedish government and central bank have introduced in response to the crisis. Our hypothesis has
its basis in the theory of behavioural finance that indicates investors’ mood, fear and emotions impact
the decision making process (Kahneman and Tversky, 1979; Statman, 1995; Shleifer and Vishny, 1997;
Donadelli, Kizys and Riedel, 2017; Bukovina, 2016).

1.1 Research Purpose and Questions
In this paper, our aim is to analyse how coronavirus-related news has impacted the stock market in
Sweden during the period January 2, 2020 – April 30, 2020, when several European countries and the
United States became epicentres of the COVID-19 pandemic.

This study aims to address the following questions:

 1. Did coronavirus-related news in Sweden generate sentiment that could be observed on the
 Swedish stock market?
 2. Which type(s) of news had the greatest impact on stock returns?
 3. Is it possible to use sentiment proxies extracted from coronavirus-related news to make
 profitable investments?

Figure 1: Source: NASDAQ Nordic, http://www.nasdaqomxnordic.com/

* News of the WHO’s pandemic declaration was first reported in the media after 17:00 CET on the evening of 11th March, 2020.
The arrow points to the following day (12th March) because these first news reports could not impact stock prices on the
day of the announcement.

1.2 Contribution to the Field
Sentiment analysis is a relatively new method of studying financial markets. The emerging field of
behavioural finance has provided analysts with numerous studies on how sentiment impacts asset
pricing (Baker and Wurgler, 2007; Kaplanski and Levy, 2008; Statman, 2014). Investor sentiment is
presumably unique for each market, which is why applying known sentiment effects from one market
to another is ill-advised (Lang and Schaefers, 2015). This work contributes to existing research by
performing an empirical study that analyses the effects of coronavirus-related news on Swedish stock
prices, combining text mining techniques with multiple regression methodology. This exploratory
study deals with a unique and extreme sequence of events, and it is unclear whether these findings will
prove useful to future sentiment research. However, we believe our conclusions could be interesting
for future research on Swedish stock market changes related to media coverage of low-probability and
high-consequence events.

1.3 Disposition
The structure of this thesis is as follows: Section 2 reviews relevant literature to present the current
state of research. Section 3 details our research method. Section 4 presents our empirical findings
which are then analysed in Section 5. Finally, Section 6 includes discussion of our results and
conclusions.

2. Review of the Literature
2.1 Fundamental Analysis vs. Technical Analysis
Stock markets have always attracted investors wishing to grow their capital. Because stock markets are
dynamic and volatile, investing in them is associated with high levels of risk. Financial analysis has
been used for decades to understand movement of capital markets and to forecast stock price
development. When analysing capital markets to support their decisions in buying, selling or holding
stocks, investors have mainly been using two techniques — fundamental analysis and technical analysis.
Which type of analysis investors choose depends on what they believe about the characteristics of the
market. According to Murphy (1999), both methodologies aim to satisfy the same need, namely
understanding in which direction the market moves. The only difference is that a fundamentalist
would want to know why the market behaves as it does, while a technician would solely analyse market
action itself (Murphy, 1999).

Fundamental analysis aims to identify mispriced assets, thus the main belief of fundamentalists is that
the market prices are often incorrect. The main purpose of this type of analysis is to identify the
intrinsic value of securities and compare it with the actual market price. The intrinsic value is set at
the equilibrium on the market by supply and demand forces (Griffioen, 2003). A fundamental analyst
would look at companies’ financial statements and calculate important multiples and ratios. While this
form of economic analysis is time consuming and tedious, it is not sufficient to provide a complete
picture of why an asset may be mispriced. Other aspects of fundamental analysis include broader
industry analysis, and subjecting individual companies to deeper levels of scrutiny (Griffioen, 2003).
Fundamental analysis is therefore based not only on mathematical calculations and financial
statements, but also on analysts’ knowledge of the market and on their assumptions and beliefs. This
could explain why there are investors on the market willing to buy and sell the same
assets at the same time. Investors presumably reach different interpretations and conclusions when
presented with the same information.

Technical analysis is based on the premise that prices on the market move in trends and that those
trends tend to repeat, leading to market swings. Technicians believe that market prices react to
multiple factors and that analysing the causes of market moves would be excessive, because stock prices
adjust instantaneously to these underlying factors. What matters to technicians is
recognising the direction of the market and, even more, doing so before others recognise the
move. Technical analysts thus assume that a fall in the price of a security reflects higher supply of
or lower demand for that security, which in turn reflects changes in the fundamentals (Murphy, 1999).
Technical analysis is therefore limited to studying charts and graphs of past market price
(return) movements to make predictions about the future.

2.2 Efficient Market Hypothesis
While a detailed description of the evolution of traditional finance lies outside the scope of this study, we
consider it important to mention one of the central assumptions of standard finance.
According to traditional finance theories, investors act rationally and always aim to maximise their
profits while minimising their risks (De Bondt, 1995). The efficient market hypothesis (EMH)
introduced and described by Eugene Fama represents one of the pillars of standard finance. The
theory suggests no analysis of publicly available information could be used to outperform the market
since stock prices adjust to new information instantaneously and unbiasedly, making the idea of
‘beating the market’ a utopia. The theory implies financial analysts are worthless, and the best
investment strategy would simply be following market indices. To gain abnormal returns investors
would need to have access to information that is not known to the public, insider information (Fama,
1970).

EMH stands out among traditional finance theories because it recognises the existence of irrational
investors. However, Fama concludes that irrational investors’ trading is insufficient in volume to
impact asset prices significantly. He points out that such investors would quickly be corrected by
arbitrageurs forcing security prices to move to their fundamental values. Fama also suggests that not
all market participants have to process the entire body of information available in the market, because
a sufficient number of informed investors is enough to ensure the efficient pricing of market
securities (Fama, 1970).

Whether financial analysis can provide investors with the insights needed to outperform
the market has been much debated. De Bondt (1995) calls the role of financial analysts paradoxical. On
the one hand, he asks how proponents of the efficient market hypothesis can explain the existence
of well-paid financial analysts on the market if their work is worthless. On the other hand, he notes that the
competing view of an irrational market has difficulty explaining why professional investors
consistently fail to beat market indices. At the AIMR conference in 1995 De Bondt said: “Despite its
many insights, modern finance offers only a set of asset-pricing theories for which no empirical
support exists and a set of empirical facts for which no theory exists.”

2.3 Behavioural Finance
In standard or traditional finance, humans are rational and therefore aim to maximise their utility
while taking as little risk as possible. Pricing assets on capital markets is thus an unemotional and
straightforward process, and the correct price of an asset should equal the discounted present value
of all future cash flows. For a long period of time, traditional finance theories were used to explain
market movements and develop investment strategies. Reality, however, has steadily
challenged the standard finance assumption of rational and unemotional market participants. Under
the assumption of efficient markets, events such as price bubbles (e.g. the Swedish real estate bubble of
the 1990s, the dot-com bubble) or stock market crashes (e.g. the Black Monday crash of October 1987)
have not been satisfactorily explained within traditional finance. This has given rise to a
new finance paradigm in which the irrationality of people, individual biases and cognitive factors take
centre stage.

In an attempt to explain market anomalies and improve the analysis of market developments, financial
researchers turned to other disciplines. Behavioural finance then emerged as an
innovative new discipline combining aspects of sociology, psychology, anthropology and finance
(Ricciardi and Simon, 2000) with the focus of studying the behaviour of people and the transmission
of this behaviour into capital markets. In particular, this newer discipline started challenging the
notions of standard finance such as human rationality and efficiency of the markets, suggesting there
are other factors explaining market anomalies and volatility. As Statman (1995, p.15) put it: “People
in standard finance are rational. People in behavioural finance are normal.”

Early research in the field introduced the modern financial paradigm to behavioural investors who
were not consistent in their attitudes towards risk. According to prospect theory, people are
risk-averse in the domain of gains and risk-seeking in the domain of losses (Kahneman and Tversky,
1979). Simply put, investors feel a loss more strongly than a gain of the same amount and are
therefore prepared to take greater risks to avoid a loss than to secure an equivalent gain. Prospect theory,
with its empirical evidence, has since become a central concept in behavioural finance.
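
The asymmetry between gains and losses can be illustrated with a small numerical sketch. The power-form value function and its parameter values below (a curvature of 0.88 and a loss-aversion coefficient of 2.25) are assumptions chosen purely for illustration. They are not taken from this thesis, and probability weighting is ignored.

```python
def prospect_value(outcome, curvature=0.88, loss_aversion=2.25):
    """Illustrative value function: concave for gains, convex and steeper for losses.

    Both parameter defaults are assumed values, not estimates from this study.
    """
    if outcome >= 0:
        return outcome ** curvature
    return -loss_aversion * ((-outcome) ** curvature)


# A loss weighs more heavily than a gain of the same size.
value_of_gain = prospect_value(100.0)
value_of_loss = prospect_value(-100.0)

# Gains: a sure 50 is valued above a 50% chance of winning 100 (risk aversion).
sure_gain = prospect_value(50.0)
gamble_gain = 0.5 * prospect_value(100.0)

# Losses: a 50% chance of losing 100 is valued above a sure loss of 50 (risk seeking).
sure_loss = prospect_value(-50.0)
gamble_loss = 0.5 * prospect_value(-100.0)

print("gain vs loss:", round(value_of_gain, 2), round(value_of_loss, 2))
print("prefers sure gain:", sure_gain > gamble_gain)
print("prefers gamble over sure loss:", gamble_loss > sure_loss)
```

Under these assumed parameters the same function produces both behaviours described above: caution when a gain can be locked in, and risk-taking when a loss might still be avoided.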

Another finding of behavioural finance is that investors exhibit bounded rationality, in other words
their decisions are limited by cognitive mistakes, psychological biases and emotions. The existence of
fully rational investors cannot be verified in reality (Shleifer and Vishny, 1997). A constantly growing
number of market participants has also come to influence market movements. Trading is no longer
exclusively accessible to institutional investors. Advancing technology and a growing number of
trading platforms have given rise to non-professional “hobby” traders. They trade more often and act
unpredictably, making markets even more volatile. Their access to professional financial forums is
limited, and they often turn to information sources such as news outlets and social media to support
their decisions on the stock market. These retail investors are also more likely to trade on market
sentiment (Bukovina, 2016; Baker and Wurgler, 2007).

2.3.1 Market Sentiment
In the field of finance, sentiment is often described as the mood or emotions of people which may
influence capital markets. This implies asset-pricing should not be fully associated with fundamental
changes in the economy or individual securities. Kaplanski and Levy (2008) state sentiment is a much
broader concept and could be described as “any misperception leading to asset mispricing”. Mood and
fear can therefore be considered examples of how the potential for irrational behaviour in investors can
result in shifts in market sentiment. Baker and Wurgler (2007) define market sentiment as “investors’ beliefs”
about future returns and risks when these beliefs are not necessarily supported by available information
related to fundamentals. In this paper we use “market sentiment” in the broader sense described in
the aforementioned works, so that any factor beyond changes in fundamentals is seen as sentiment.

Behavioural finance studies have sought to understand whether market sentiment has
an impact on investors’ decisions when pricing assets. Statman (2014) concluded that the tendency
of investors to sell securities when the market reaches its bottom could be explained by fear of growing
risk and lower expectations of future returns. He also explains that many investors tend to buy securities
when the market is already overheated because excitement dampens their perception of risk and
inflates their expectations of future returns. Such expectations could arguably explain
the irrational exuberance behind speculative stock market bubbles much better than standard finance
theories could. In a similar way, the theory of the disposition effect is based on a number of misleading
emotions that influence investors’ decisions. Regret at realising losses is described as an emotion
that leads investors to hold on to losing stocks for longer than is rational. Pride and thrill,
on the other hand, make investors sell winning stocks too early in their hurry to realise gains (Shefrin
and Statman, 1984). Symeonidis, Daskalakis and Markellos (2010) inferred that sunny weather could be
associated with increased volatility of the markets in the US. They conclude that good weather could result
in good moods and increased communication among investors, driving up trading
volumes and thereby volatility. Events such as disease outbreaks or epidemics are known to increase
the overall level of anxiety and pessimism in society and thus create negative sentiment (Donadelli,
Kizys and Riedel, 2017). Behavioural finance studies thus provide ample evidence of a relationship
between stock returns and market sentiment, indicating that financial decisions are influenced by
sentiment.

Sentiment studies of recent years often do not question whether sentiment has an impact on stock
prices; researchers have seemingly accepted that it does. A newer direction of research does not
simply study the impact of sentiment on the stock market but also points out that different stocks react
to market sentiment to different degrees. Baker and Wurgler (2007) found that stocks of
smaller, younger, high-volatility and growth companies are more prone to market sentiment than large
cap and mature stocks. They attribute this disproportionate sensitivity to two factors: the stocks of smaller
and younger companies are more difficult both to arbitrage and to value. Baker and Wurgler (2007) pointed out that
during the dot-com bubble, the majority of speculative stocks were small start-ups with no historical
data to rely on, so valuation mistakes were very probable given the general excitement around
the Internet at that time. Consequently, when studying market sentiment, it is more reasonable to look
at small and large cap stocks separately than to analyse the dependence of aggregate stock prices on sentiment.

The presence of market sentiment raises a question of how it could be extracted, measured and
analysed. Studying news and social media content to extract market sentiment has become common
in recent behavioural research. Many attempts have been made to use qualitative textual content for
quantitative analysis and predictions (Bukovina, 2016). Researchers have been steadily studying how
financial news could be processed and categorised with the help of different computer-based
techniques (Lee et al., 2014). The amount of data involved is enormous, and its effective and quick
analysis by humans is no longer feasible. Traders see an advantage in developing algorithms for textual
data analysis and in creating predictive stock price models and automated trading systems (Atkins,
Niranjan and Gerding, 2018). Evidence suggests such models can improve on the stock price predictions
made by traditional financial analysts.

Studies on how non-financial news influences market sentiment and stock markets are numerous but
plagued by a persistent difficulty in extracting and distilling the news relevant to the analysis. Also, these studies
have most often analysed social-media-driven market sentiment and been carried out on the US market.
It is therefore questionable whether the results could be replicated on other markets. Moreover, research
results on the impact of sentiment on stock markets are contradictory (Lang and Schaefers, 2015) and suggest
that market sentiment can change over time. Sentiment also appears to differ depending on the
event, the market and even the culture. Kaplanski and Levy (2014) showed, using the example of
football sentiment around the 2010 FIFA World Cup, how sophisticated investors adjusted their
trading strategies and weakened the football sentiment effect on the US stock market in the last stage
of the tournament. A real challenge, however, has been to create a model that reliably
establishes a connection between news and evolving stock prices, because adjustments for
fundamental changes in the economy have to be made. The authors suggested that when studying
market sentiment, one should analyse economic news in terms of its potential for both positive and
negative impact, and include the economic news in the predictive model as a non-sentiment fundamental
variable that impacts stock prices.

Another difficulty for sentiment analysis is to distinguish what market sentiment at different periods
of time consists of, and which component of this sentiment plays the most significant role. Baker and
Wurgler (2007) propose measuring market sentiment by breaking it down into sentiment proxies.
According to them, extracting potential sentiment proxies can be useful for future models measuring
sentiment impact on stock markets, even though these proxies are imperfect and noisy. They suggest
combining different imperfect measures such as surveys on investors’ beliefs, option implied volatility
measures, trading volume, retail investor trades and mood proxies. Baker and Wurgler (2007) have
constructed a sentiment index level that includes several of these proxies to smooth out idiosyncrasies.
Having reviewed existing research on sentiment measures, we conclude that there are few
established methods at present, and that researchers are actively exploring new methods by
merging existing models and constructing their own.
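
The idea of combining several noisy proxies into one index can be sketched as follows. This is a minimal illustration under stated assumptions: the proxy names and numbers are made up, each proxy is assumed to be oriented so that higher values mean more optimistic sentiment, and the equal-weighted average of standardised values is a simplification that is not claimed to be the weighting used by Baker and Wurgler (2007).

```python
from statistics import mean, pstdev


def standardise(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    centre, spread = mean(series), pstdev(series)
    return [(value - centre) / spread for value in series]


def composite_index(proxies):
    """Average the standardised proxies day by day into one index."""
    standardised = [standardise(series) for series in proxies.values()]
    return [mean(day_values) for day_values in zip(*standardised)]


# Hypothetical daily observations for two proxies measured on different scales.
proxies = {
    "survey_optimism": [0.2, 0.4, 0.1, 0.5],
    "trading_volume": [100.0, 140.0, 90.0, 150.0],
}
index = composite_index(proxies)
print([round(value, 2) for value in index])
```

Standardising first keeps a proxy with large units, such as trading volume, from dominating the index, while averaging lets noise that is specific to one proxy partly cancel out.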

2.4 Event Studies and EMH
One of the most frequently used methods to determine whether market sentiment related to a certain
event is effectively incorporated into stock prices is the event study method. Event studies are most
suitable for analysing the sentiment effect of a single event or a series of rare events on a stock’s price
or a sector’s returns. One must define the event and estimation windows and subsequently calculate
the stock’s abnormal return. The abnormal return is the stock’s return over the event window minus
the expected return of the stock over the event window (MacKinlay, 1997). There are several ways
to calculate the expected return needed for an event study; MacKinlay (1997) suggests the
constant mean return model or the market model. Measuring and analysing
abnormal returns for the stock gives a researcher insight into whether the market has
efficiently incorporated the event into the stock’s price. Discussion of an event study often deals with
the implications of EMH and the market’s capacity to assess relevant information promptly and rationally.
Some events are more commonplace than others and are generally easier to analyse. It is not surprising
that event studies often arrive at different conclusions with regard to market efficiency.
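
The abnormal return calculation described above can be sketched in a few lines. This is an illustration under stated assumptions rather than a procedure used in this thesis: the return series are made-up numbers, the function names are hypothetical, and the expected return comes either from the mean return in the estimation window or from an ordinary least squares fit of the stock’s return on the market return.

```python
from statistics import mean


def fit_market_model(stock_returns, market_returns):
    """Least squares estimates of alpha and beta in R_stock = alpha + beta * R_market."""
    mean_stock, mean_market = mean(stock_returns), mean(market_returns)
    covariance = sum((m - mean_market) * (s - mean_stock)
                     for m, s in zip(market_returns, stock_returns))
    variance = sum((m - mean_market) ** 2 for m in market_returns)
    beta = covariance / variance
    alpha = mean_stock - beta * mean_market
    return alpha, beta


def abnormal_returns(stock_event, expected_event):
    """Actual return minus expected return for each day in the event window."""
    return [actual - expected for actual, expected in zip(stock_event, expected_event)]


# Hypothetical daily returns in the estimation window (not data from the thesis).
estimation_market = [0.010, -0.005, 0.002, 0.007, -0.012, 0.004]
estimation_stock = [0.001 + 1.5 * m for m in estimation_market]

# Hypothetical daily returns in a two-day event window.
event_market = [-0.020, 0.005]
event_stock = [-0.060, 0.010]

# Market model: the expected return depends on the market return of the same day.
alpha, beta = fit_market_model(estimation_stock, estimation_market)
ar = abnormal_returns(event_stock, [alpha + beta * m for m in event_market])
car = sum(ar)  # cumulative abnormal return over the event window

# Constant mean return model: the expected return is the estimation-window mean.
mean_return = mean(estimation_stock)
ar_constant_mean = abnormal_returns(event_stock, [mean_return] * len(event_stock))

print("market model abnormal returns:", [round(x, 4) for x in ar])
print("cumulative abnormal return:", round(car, 4))
print("constant mean abnormal returns:", [round(x, 4) for x in ar_constant_mean])
```

The two models differ only in how the expected return is formed, so the choice between them changes the benchmark but not the definition of the abnormal return.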

2.5 Text Analytics in Finance
The importance of analysing text springs from its “nuances and behavioural expression which is not
possible to convey using numbers” (Das, 2014, p.4). Text analysis is used to convert textual data to
quantitative or qualitative information. It includes everything from simple methods for summarising
and visualising large bodies of text in order to make it easier to comprehend to complex methods of
quantifying vast amounts of unstructured text data. Depending on the goal, there are many different
tools which can be employed. Text analytics in finance has primarily focused on measuring effects on
stock prices and indices as well as analysing corporate reports (Cohen, Malloy and Nguyen, 2020). It
can be performed using dictionaries, lexicons or machine learning (Das, 2014). The
choice of method depends on the type of inputs and the desired outputs. In machine learning, for example,
a regression model is used to predict continuous output variables, whereas classification models are
used for discrete output variables. Similarly, supervised models are used for classification with prespecified
outputs, whereas unsupervised ones are used for clustering inputs in previously unspecified ways
(Mitchell, 1997; Das, 2014; Cohen, Malloy and Nguyen, 2020).

Several researchers have underlined the usefulness of unsupervised machine learning models on
financial data, in quantifying and visualising financial stability tendencies (Li, et al., 2017), modelling
the structure of the stock market (Doyle and Elkan, 2009) and specifically on financial text data for
which no a priori categorisation of their content exists (Feuerriegel and Pröllochs, 2018). One example
of an unsupervised machine learning model for clustering is topic modelling using Latent Dirichlet
Allocation (LDA). It is a generative probabilistic model for collections of discrete data such as text
corpora, which represents each topic as a distribution over words that tend to occur together and
estimates the probability of each document belonging to each topic (Blei, Ng and
Jordan, 2003). Feuerriegel and Pröllochs (2018) measured the impact of topics within corporate filings
on the stock market to identify topics which are of relevance to investors, motivating the use of LDA
by the fact that previous studies on the effect of specific disclosure topics on the market (Tetlock,
2007; Vuolteenaho, 2002 and Chan, 2003 cited in Feuerriegel and Pröllochs, 2018, p. 3) had evaluated
the effect of one topic at a time and ignored disclosures not belonging to any of the given topics. They
stated the advantages of employing LDA as avoiding the subjective bias of manual topic extraction,
greater flexibility in selecting topics in accordance with the text corpus and the ability to process vast
amounts of text, which would be “prohibitively difficult and costly with manual labelling” (p.4).

2.5.1 Sentiment Analysis through Computational Linguistics
Another form of text analysis, which has recently been highly recognised for its usefulness among
academics and business people, is automated sentiment analysis. Part of mathematical language theory
known as quantitative linguistics, it is a way of extracting subjective expressions from unstructured
text and classifying them according to their sentiment (Alessia, et al., 2015). Many programming
languages support this type of analysis, e.g. R with the packages dplyr, tidyr (Wickham and Henry,
2020), textdata (Hvitfeldt and Silge, 2020) and tidytext (Robinson and Silge, 2020) to mention a few.
The steps in sentiment analysis, as described by Alessia, et al. (2015), are data collection, text
preparation, sentiment detection, sentiment classification and presentation of output, and can be done
using either a lexicon-based or machine learning-based approach or a hybrid of the two. Lexicons
are dictionaries in which each word is given a sentiment orientation or score. The lexicon is joined
with preprocessed, so-called “tidy” data (Wickham and Henry, 2020), to determine the affective
content of the text and its polarity (Devitt and Ahmad, 2007). Sentiment analysis processes within
computational linguistics have previously been widely used in financial contexts, with a few examples
including forecasting stock prices (Tetlock, 2007; Day and Lee, 2016) and analysis of the market’s
response to sentiment in financial press releases (Federal Reserve Bank of St. Louis, 2006).

3. Method and Materials
3.1 Description
Our findings were expected to explain whether or not investors’ perceptions of news on the disease,
mortality and the sequence of events associated with the coronavirus influenced the behaviour of the
Swedish stock market. Keeping in mind that small and large cap stocks can move differently and
exhibit different levels of volatility, we considered them separately by analysing both OMXSSCPI and
OMXSLCPI indices. Our model was intended to shed light upon the types of coronavirus-related
news that had the biggest impact on investor sentiment and explore the possibility of using those news
categories to adjust future trading strategies. We aimed to examine whether uncommon events may
bring with them new arbitrage opportunities.

The impact of coronavirus-related sentiment on the Swedish stock indices OMXSSCPI and
OMXSLCPI was measured using a multiple regression model. The independent variables were created
through a three-step process: first, news articles related to the coronavirus were gathered as a source
of text containing fundamentals and sentiment. Second, relevant news topics were chosen
automatically from the text data by the degree of importance in the total body of extracted news
through the use of latent Dirichlet allocation. The topics were weighted for each news article in
accordance with probability measures, giving articles topic scores. Third, each news article was assigned
a sentiment score using sentiment analysis to account for negative or positive events in each topic.
The topic score was then combined with the sentiment score to provide an overall article-specific
proxy score. Then, proxy scores per article were merged to the same date to create single daily
observations per business day. Finally, daily returns on OMXSSCPI and OMXSLCPI were regressed
on the sentiment proxy variables.

A possible source of uncertainty in the models was that the news sentiment variables might show a
certain degree of multicollinearity, e.g. news about deaths might affect the probability of upcoming
news about restrictions due to containment of the virus spread. This was assessed through the use of
the variance inflation factor (VIF) which shows the degree of correlation between independent
variables (Fox and Weisberg, 2018). Statistical software calculated a VIF for each independent variable.
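
As an illustration only, the VIF of a variable j equals 1 / (1 - R_j²), where R_j² comes from regressing variable j on the other independent variables; equivalently, it is the j-th diagonal element of the inverse correlation matrix. The minimal Python sketch below uses that identity. It is not the statistical software routine referred to above, and the two series and their values are hypothetical.

```python
from statistics import mean

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long data columns."""
    centred = []
    for col in columns:
        m = mean(col)
        centred.append([x - m for x in col])
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    norms = [dot(c, c) ** 0.5 for c in centred]
    k = len(columns)
    return [[dot(centred[i], centred[j]) / (norms[i] * norms[j]) for j in range(k)]
            for i in range(k)]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF per variable: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Hypothetical daily values of two sentiment proxies (assumed data)
spread = [1.0, 2.0, 3.0, 4.0]
rest = [1.0, 3.0, 2.0, 4.0]
vifs = vif([spread, rest])
print(vifs)
```

In this toy example the two series have a correlation of 0.8, so each VIF is 1 / (1 - 0.64), roughly 2.78. Values close to 1 would indicate little multicollinearity.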

A commonly used method for examining the impact of news on returns is the event study method
previously described in the literature review. However, this method was not employed in the study of
the impact of coronavirus-related news due to the daily media coverage. The extensive supply of news
articles made it nearly impossible to isolate individual events from each other. Due to overlap, an event
window could not be successfully constructed and yield meaningful results. Also, an event study
utilises excess returns on the observed stock or index compared to another index. This was not feasible
to assess due to the coronavirus being a global series of events, meaning there could be no comparison
index left unaffected.

3.2 Limitations
Conducting this study in the midst of the ongoing pandemic was both a limitation and an advantage.
Because the crisis had only just begun, the news reflected only its initial effects and strong panic,
without the crisis reversal that would be needed to fully examine the situation from a time-series
perspective. A significant amount of noise was also present in the text data because it was collected
in real time. However, performing this research during the pandemic is also what makes it valuable.
Predictions made at the beginning can later be compared with outcomes to see what has changed, and
the accuracy of the model can be improved iteratively. This also demonstrates how models can be
designed to account for the fact that they are analysing a continuously developing situation.

The lack of prior research on the topic is clear. The idea of analysing the impact of market sentiment
on stock prices is not new, and numerous studies have been published over several decades, using both
theoretical-manual and automated approaches such as the aforementioned event study and text
analysis methodologies. However, to the best of our knowledge, there are very few studies, if any,
analysing the effects of news-derived sentiment on the Swedish stock market in the context of
uncommon events. The coronavirus pandemic is a unique event for the entire world, and particularly
challenging for Sweden, because an analogy between the disease spread and other tragic events like
natural disasters, disease outbreaks or wars cannot be drawn. Sweden has not experienced any
calamities of this nature in modern times. In our study, we thus rely on international research with a
predominance of evidence from the United States. We recognise that assumptions about market
inefficiency, weather influencing stock market prices or the behaviour of influential figures might not
hold in Sweden as they would in other countries.

We were also limited by the shortage of available data in performing the text analysis. Analysing news
articles published in Swedish, we encountered not only a lack of notable Swedish dictionaries for
sentiment analysis, but also of an epidemic-related lexicon in particular. Creating such a lexicon would
require a study on its own and was outside of our scope. In light of this limitation, we translated the
news articles from Swedish to English using Microsoft Translator. We acknowledge this might have
contributed to uncertain results in topic modelling; however, manual random sample controls yielded
satisfactory results with the meaning of the articles intact.

As mentioned in the description of our methodology above, we recognise that stock price patterns are
influenced by real changes in the economy and not primarily by sentiment. A reliable economic indicator
index consisting of main economic factors, such as employment rate, GDP growth, consumption of
durables and non-durables, service consumption and production index, is essential for building a robust
statistical model. At this moment, such an index is not present in Sweden and is not feasible to construct
from the data available due to a difference in time-series frequencies for the different indicators. For our
work, we would have needed daily indicator data; however, the Swedish official statistics agency publishes
these figures on monthly and quarterly bases. We recognise using economic and political news articles
published in a daily newspaper to model real changes in the Swedish economy is not ideal and
acknowledge there is room to address this particular issue in a better way.

3.3 Stock Indices
Historical data for the Stockholm Small Cap and Large Cap price indices (OMXSSCPI and
OMXSLCPI) are collected from the Nasdaq Nordic website (The Nasdaq Group, 2020) for the period
of January 2nd, 2020, to April 30th, 2020.

3.4 News
The data source for collection of the news data was chosen with the target group in mind: investors
in the Swedish stock market. Considering the reliability of the source and the readability and
preferences of the target group, Dagens Nyheter, a renowned Swedish news site, was
chosen. The site has a subsection aggregating news articles covering the coronavirus, containing both
economic and non-economic news (Dagens Nyheter, 2020).

3.4.1 Collection
News articles on websites are unstructured strings of text. In order to use the data for text analysis,
the website was scraped using the Google Chrome extension tool “Web Scraper” (Web Scraper, 2020).
Using this tool, the text was extracted from the news website through the site’s CSS selectors (see the
selector graph in figure 2) and rendered as structured text in a .csv file. The start URL was
https://www.dn.se/om/det-nya-coronaviruset/. Pagination was used to navigate to each page of the
website to extract all news to one file.

Figure 2: Selector graph over the coronavirus news sitemap. TitleLink denotes the link to each news article on the start
page and Text the content of the article.

3.4.2 Preprocessing Textual Data
Once the data was collected, it underwent preprocessing in order to be readily usable by quantitative
linguistics programs. The text rendered from web scraping was as presented in the CSS selectors,
containing punctuation, numbers and both upper and lower casing. Punctuation and numbers were
removed and the text was converted to lowercase. Following that, the text was tokenised using a whitespace tokeniser, rendering each part of text
separated with blank spaces as a token and later stemmed using a Swedish Snowball stemmer,
connecting all tokens with the same word stem into the same token. Stop words, extremely common
words of little value to the text (Manning, Raghavan and Schütze, 2008), were omitted. The
preprocessing method differed for the topic modelling and sentiment analysis and will be described
more in-depth in the following sub-chapters for each method.
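
The generic cleaning steps described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the KNIME or R workflow actually used: the stop word set is a hypothetical handful of Swedish words rather than the sourced list, and stemming is left out (a Snowball stemmer would be applied to the resulting tokens).

```python
import re

# Hypothetical mini stop word list; the study used a sourced Swedish list.
STOP_WORDS = {"och", "att", "det", "som", "en", "är", "i", "på"}

def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, strip numbers and punctuation, tokenise on whitespace, drop stop words."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)      # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation, keep letters such as å, ä, ö
    tokens = text.split()                 # whitespace tokeniser
    return [t for t in tokens if t not in stop_words]

# Assumed example sentence, not taken from the news corpus
tokens = preprocess("Regeringen presenterar 3 nya åtgärder, och det är bråttom!")
print(tokens)
```

Removing punctuation before splitting keeps the whitespace tokeniser from producing tokens such as "åtgärder," with a trailing comma.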

3.4.3 Topic Modelling and Scoring
As scraped news data is not classified a priori, LDA was employed in order to group the words
appearing in the text corpus into topics. The LDA was performed using the KNIME Analytics Platform,
an open source data science software (Berthold, et al., 2020). The platform allows the user to perform
data analysis and build, test and deploy models using built-in nodes. In order to be able to perform
text analysis, the data, originally in strings, was transformed to a corpus with documents using the
Strings to Documents node. Then, Swedish stop words were removed through the creation of a
custom dictionary filter using a stop word list from a Github repository (Dahlgren, 2019). Further,
punctuation was erased, the documents were converted to lowercase and stemmed using a Swedish
Snowball stemmer. The use of code and text from Github repositories brings with it risks and potential
for errors due to the fact the authors are not renowned and trusted sources. To minimise errors, the
preprocessed text was viewed and examined after the completion of each preprocessing step. The
design choice between using Swedish text translated to English to be able to use built-in stop word
removers, or using the original text with a sourced stop word list, was made in favour of using the
original text due to concerns regarding how important meaning could be lost in translation and topics
modelled incorrectly as a consequence.

The Topic Extractor (Parallel LDA) node was configured to extract ten topics with 20 terms each
after empirical testing of different combinations of topics and terms. The extracted terms are visualised
in Figure 3.

Figure 3: Word cloud of the most common words appearing in Swedish Coronavirus-related news articles, sized according
to relative weight in the corpus.

Feature extraction and topic assignment is a crucial step in determining which variables to extract as
sentiment proxies from the coronavirus-related news. The words that appear in the news most often
are likely to give a reasonable understanding of what the topic in question deals with.

We understand that one news article might contain information belonging to several topics. Therefore,
in contrast to previous research (Feuerriegel and Pröllochs, 2018), we did not assign each document
to one topic, but rather to multiple topics. We assigned each article to the topics for which its
probability of belonging exceeded 0.25, yielding a topic score for each article of 1 for belonging topics
and 0 for non-belonging topics. The aforementioned researchers used topic modelling to extract topics
from financial texts that could be relevant for investors. We extracted topics from non-financial texts
that theoretically should not be relevant for investors, but we hypothesised that they are. A trade-off
was made here between allocating individual articles to a single topic, thus missing out on a lot of data,
or introducing noise into the model by allocating articles to several topics each. We deemed it more
important to make our data as exhaustive as possible, and therefore chose to allocate articles to several
topics.
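
The multi-topic assignment rule can be illustrated with a minimal Python sketch. The 0.25 cut-off is the one stated above; the topic probabilities are hypothetical stand-ins for the output of the LDA step, and the function name is chosen for illustration only.

```python
THRESHOLD = 0.25  # cut-off stated in the text

def topic_scores(topic_probs, threshold=THRESHOLD):
    """Map per-topic probabilities to binary topic scores (1 = article belongs to the topic)."""
    return {topic: 1 if p > threshold else 0 for topic, p in topic_probs.items()}

# Hypothetical LDA output for one article (assumed values)
probs = {"ECON": 0.45, "POLIT": 0.30, "SPORT": 0.05, "SPREAD": 0.20}
scores = topic_scores(probs)
print(scores)
```

Here the article is allocated to both ECON and POLIT, which shows how one article can contribute to several proxies at once.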

3.4.4 Sentiment Proxies
Coronavirus-related news is not an investor survey or mood proxy, because it does not directly reflect
what investors think or feel. We should not forget that news articles are written by journalists, and not
investors themselves. We aim to model the way the information contained within the articles can
influence investors. We find it reasonable to assume that coronavirus-related news is sentiment driving
information which in turn can be deconstructed and analysed. Thus, the modelled topics are assumed
to be potential proxies for coronavirus-related sentiment caused by the news. Below, we detail the
topics and motivate our choice of proxies that can be useful for measuring disease-related news
sentiment.

Swedish Restrictions. The Swedish strategy in response to the coronavirus pandemic has been very
different to that of many other countries in the world. Understandably, it has drawn a lot of attention
from international media as a consequence and remains widely discussed in Sweden, polarising society.
Some people in Sweden had supported the relatively passive line the Public Health Agency of Sweden
(FoHM) and their government had chosen, while others were very critical and urged politicians to
implement a more stringent regime of heavy social restrictions in line with those introduced in other
countries. Although Sweden had at no stage closed down its economic activity or enacted a curfew
for citizens, some considerable restrictions were implemented during March and April, 2020. For
instance, people could not meet in groups of more than 50 people at a time. Travelling from one
region to another within Sweden was not recommended and deemed inappropriate unless work-
related or for serious personal reasons. Swedish institutions responsible for handling the crisis have
held press conferences daily to inform citizens on regulations and recommendations in conjunction
with the emerging crisis. The keywords with highest weights in this topic had roots like Swed-, close,
school, Stockholm, Public Health Agency (one word in Swedish), institution, travel, pupil, spread,
follow. This proxy was named REST.

Swedish Politics. This topic covered how politicians in Sweden had tackled the crisis during the first
months of the coronavirus outbreak. The topic was different from the previous one in that it dealt
with the political actions of Swedish leaders regarding economic, social and political concerns, and not
the actions of the Public Health Agency in addressing the public health emergency. When the
coronavirus pandemic began, Sweden had a social democratic minority government. Making prompt
decisions and implementing active measures in response to the rapidly evolving situation would not
have been possible without oppositional support, which is possibly why politics became a central
topic. The keywords with highest weights in this topic had roots like Swed-, crisis, government,
country, need, coronavirus, leading, politics, measure, economic, state. This proxy was named POLIT.

Economic Policy. Many companies suffered rapid declines in their revenues due to the coronavirus
outbreak. This crisis was completely unpredictable and in no way caused by the businesses themselves,
which is why nobody was prepared to handle it. The Swedish government had to act decisively and
quickly to avoid widespread bankruptcy. Economic support packages were introduced, one after
another, but businesses continued to ask for more help. Unemployment rates increased heavily,
although much of the support was addressing employment issues. The keywords with highest weights
in this topic had roots like company, percent, crowns, state, econom-, billions, employed, Swed-,
government, support. This proxy was named ECON.

Sport. The coronavirus pandemic caused almost all organised sporting events around the world to be
suspended, cancelled, delayed or moved. Delaying the Olympic Games in Tokyo and the UEFA
European Football Championship were unprecedented measures taken in light of the coronavirus
crisis. Because sport affects so many people, and we know from earlier studies that it also affects stock
prices, it is not surprising sport became an important topic. The keywords with highest weights in this
topic had roots like game, sport, match, Olympics, cancel, coronavirus, club, move. This proxy was
named SPORT.

Coronavirus Contemplations. For many individuals, an important part of the debate around the
coronavirus outbreak has centred upon “the new normal”, i.e. how to adapt and carry on living
meaningful and fulfilling lives in an unusual and confronting situation characterised by social
distancing, isolation and loneliness. The spread of the virus has impacted every aspect of our lives and
in turn prompted widespread debate in the media. The keywords with highest weights in this topic
had roots like world, death, person, time, self, life, live, years. This proxy was named FEEL.

Culture. Culture also had a central place in public discourse, as many cultural events were also cancelled
completely or moved to unknown dates in the future. The arts, exactly like sport, depend on people
gathering in large groups, and many culturally based industries rapidly fell into a deep crisis within a
week. The keywords with highest weights in this topic had roots like culture, film, public, music,
cancel, media, concert. This proxy was named CULTURE.

Instructions to Swedish People. The Swedish government and authorities repeatedly called on the public to
follow existing instructions and advice to help slow the spread of the virus and protect the health care
system from collapse. Their guidelines included working from home where possible, maintaining
distance between one another where practicable, limiting social contact to members of one’s own
household, and avoiding crowded places. People above the age of 70 were repeatedly instructed to be
particularly careful and to follow these recommendations strictly. This topic was different from the
aforementioned Swedish Restrictions in that it did not deal with restrictions and limitations enforced on the
public, but mostly with calls on personal responsibility and solidarity. The keywords with highest
weights in this topic had roots like at home, keep distance, job, think, help, public, time, try, people.
This proxy was named INSTR.

Swedish Healthcare System. Most countries in the world are simply not prepared to respond to a pandemic
with overwhelming numbers of sick people at a time. The state of the healthcare system became crucial
as it was associated with the potential outcome for both COVID-19 patients and medical staff. Due
to the characteristics of the virus and its transmissibility, elderly care received a lot of attention in the
media due to the high occurrence of deaths in nursing homes. A lack of protective equipment in
hospitals and nursing homes, limited intensive care places and shortages of medical staff sparked
intense debate in the Swedish media. The words with highest weights in this topic had roots like
region, patient, medical care, Stockholm, staff, hospital, protective equipment, intensive care,
commune, nursing home, Karolinska (a Swedish university and hospital institution). This proxy was
named MED.

Coronavirus-Related Events in The World. Delivering information on the spread of coronavirus around the
world has been a main goal of the media during the unfolding crisis. In the modern, globalised world,
countries are dependent on each other's production and trade and it is reasonable to assume Swedish
investors have meticulously followed not only local coronavirus-related news but also sought
information from other countries. This is particularly true of Italy and the United States. The words
with highest weights in this topic had roots like Chin-, coronavirus, Trump, USA, Ital-, country,
president, infect-, close, quarantine. This proxy was named WORLD.

Coronavirus Spread. A lot of news articles published during our research period were dedicated to the
virus and its spread. First there was news about a new virus outbreak in Wuhan, then about deaths
associated with this new disease in China. After the virus had spread to other parts of the world and
the WHO announced that coronavirus could be characterised as a pandemic, news articles began to
centre upon statistical comparisons between countries, and the ways in which governments were
responding to the crisis. The words with highest weights in this topic had roots like infect-, new,
coronavirus, virus, disease, China, case. This proxy was named SPREAD.

3.4.5 Sentiment Analysis
Sentiment analysis was performed on each article separately in R (R Core Team, 2020), using packages
tidyr, tidytext and dplyr. The analysis was performed on the text data translated to English with
Microsoft’s translator in Excel and through the use of the bing sentiment lexicon (Hu and Liu, 2004)
in R. The translation of text can bring with it some sources of error, but due to the sentiment analysis
being done only to get the sentiment orientation of articles, we deemed this error insignificant.
Another path would have been to use a Swedish sentiment lexicon (e.g. Dahlgren’s “sentimentlex”,
2019), but due to bing being a built-in lexicon in R and heavily used in literature, we deemed it more
reliable and exhaustive than sentiment lexicons found in Github repositories. The bing lexicon
contains affective words and their sentiment orientation (positive or negative). The weighted sentiment
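score allocated to each article was calculated as follows:

$$\text{Sentiment}_a = \frac{\text{Positive words}_a - \text{Negative words}_a}{\text{Total words}_a}$$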

where a denotes the article. The net sentiment of each article was normalised through division with the
total amount of words in the article to ensure that the sentiment score was not dependent on the length
of the article, but on the relative proportion of sentiment words used. We believe the length of an article
is not a primary determining factor in its potential impact on the reader, while its relative sentiment score
is of greater importance. Each article’s sentiment score was then combined with the article's topic score
using the following formula, yielding the amplitude for each article’s sentiment proxies.

$$\text{Proxy}_{p,a} = \text{Sentiment}_a \times \text{Topic score}_{p,a}$$

where p denotes each proxy respectively (REST, POLIT, ECON, SPORT, FEEL, CULTURE,
INSTR, MED, WORLD and SPREAD) and a denotes the article.
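
The two scoring steps can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the R code used in the study: the five-word lexicon is a hypothetical stand-in for bing, the example sentence is invented, and the word total simply counts the tokens passed in.

```python
# Hypothetical miniature lexicon in the style of bing (word -> orientation)
LEXICON = {
    "good": "positive",
    "support": "positive",
    "crisis": "negative",
    "death": "negative",
    "fear": "negative",
}

def sentiment_score(tokens, lexicon=LEXICON):
    """Net sentiment: (positive words - negative words) / total words."""
    pos = sum(1 for t in tokens if lexicon.get(t) == "positive")
    neg = sum(1 for t in tokens if lexicon.get(t) == "negative")
    return (pos - neg) / len(tokens) if tokens else 0.0

def proxy_scores(sentiment, topic_scores):
    """Combine the article's sentiment score with its binary topic scores."""
    return {proxy: sentiment * score for proxy, score in topic_scores.items()}

# Assumed example article: 10 tokens, 1 positive word, 3 negative words
tokens = "the crisis caused fear and death but state support arrived".split()
sentiment = sentiment_score(tokens)
proxies = proxy_scores(sentiment, {"ECON": 1, "SPORT": 0})
print(sentiment, proxies)
```

Because the topic score is binary, an article's sentiment passes through unchanged to the proxies of the topics it belongs to and contributes nothing to the others.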

3.4.6 Allocation of News to Dates
The topic and sentiment scoring was performed on individual news articles. To measure the impact
of coronavirus-related news on the stock index, the scores per article were then aggregated to one data
point per day. This was done due to the stock index data having a daily frequency. News published
on weekends or holidays was allocated to the following trading day.
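
A minimal Python sketch of this allocation is given below. The text does not state whether scores were summed or averaged within a day, so summation is an assumption here, as are the example scores and the optional holiday set.

```python
from collections import defaultdict
from datetime import date, timedelta

def next_trading_day(d, holidays=frozenset()):
    """Roll a date forward until it is a weekday that is not a listed holiday."""
    while d.weekday() >= 5 or d in holidays:
        d += timedelta(days=1)
    return d

def aggregate_daily(article_scores, holidays=frozenset()):
    """Sum per-article proxy scores into one observation per trading day (assumed rule)."""
    daily = defaultdict(float)
    for published, score in article_scores:
        daily[next_trading_day(published, holidays)] += score
    return dict(daily)

# Hypothetical scores for one proxy: (publication date, score)
articles = [
    (date(2020, 3, 13), -0.02),  # Friday
    (date(2020, 3, 14), -0.01),  # Saturday, rolled to Monday
    (date(2020, 3, 15), 0.03),   # Sunday, rolled to Monday
    (date(2020, 3, 16), -0.04),  # Monday
]
daily = aggregate_daily(articles)
print(daily)
```

The weekend articles end up in the Monday observation together with the article published on Monday itself.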

3.5 Economic Indicators
In literature, the analysis of the effect of sentiment on stock markets has involved the use of
explanatory variables encompassing fundamental economic factors, because important economic
news most likely causes change in stock returns. Due to the short period of our study and daily
frequency of data in use, no suitable economic indicators for the Swedish market were available at the
time of writing. Thus, we used economic news in Dagens Nyheter as proxies for changes in
fundamentals. Even though Dagens Nyheter does not publish detailed economic reports on main
macro- and microeconomic indicators, it presents the economic news regularly and offers insights into
the present state of the economy.

3.6 Multiple Regression
Two regression models were set up in this study, one explaining the variations in the Stockholm small
cap index, the other the Stockholm large cap index through our coronavirus-related sentiment
indicators. The models were specified in accordance with econometric theory (Brooks and Tsolacos,
2010) and the regressions were run in R (R Core Team, 2020) using dynlm (Zeileis, 2019).

3.6.1 Model Specification
The empirical regression models were specified using statistical tests. First, we chose
whether the dependent variables, the OMXSSCPI and OMXSLCPI indices, should be in level points or
in growth rates. This decision was made based on examination of the autocorrelation of level indices
and their returns for different lags. Then, a dynamic model with distributed lag variables was chosen,
with the dependent variables explained by contemporaneous and lagged sentiment proxies. This was
due to two factors: Firstly, it could take time for news to reach the audience and secondly, some news
articles were published after the market had closed, meaning news could not possibly impact the
indices on the day of publishing.
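
To illustrate how contemporaneous and lagged regressors line up with returns, the Python sketch below builds the rows of a distributed lag design for one proxy. It is not the dynlm specification used in the study: simple returns, a lag length of one day and all numbers are assumptions made for the example.

```python
def simple_returns(levels):
    """Daily simple returns from index levels (assumed return definition)."""
    return [(levels[i] - levels[i - 1]) / levels[i - 1] for i in range(1, len(levels))]

def with_lags(series, max_lag):
    """Rows [x_t, x_{t-1}, ..., x_{t-max_lag}] for every t with a full history."""
    return [[series[t - k] for k in range(max_lag + 1)]
            for t in range(max_lag, len(series))]

# Hypothetical index levels and daily proxy values aligned with the return dates
index_levels = [100.0, 102.0, 99.96, 101.9592]
returns = simple_returns(index_levels)  # three daily returns
proxy = [0.01, -0.03, 0.02]             # one proxy value per return date

rows = with_lags(proxy, max_lag=1)      # [proxy_t, proxy_{t-1}]
y = returns[1:]                         # drop the first day, which has no lagged value
for target, regressors in zip(y, rows):
    print(target, regressors)
```

Each printed line pairs a day's return with that day's proxy value and the previous day's value, which is the information a model with one lag would use.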

The sentiment proxies allocated to different topics during the modelling process were used as
numerical variables in the regression as described by Kaplanski and Levy (2014), creating a multiple