Masaryk University
       Faculty of Informatics

Predicting market price trends
of Magic: The Gathering cards

          Bachelor’s Thesis

          Pavel Nedělník

           Brno, Spring 2021
This is where a copy of the official signed thesis assignment and a copy of the
Statement of an Author is located in the printed version of the document.
Declaration
Hereby I declare that this paper is my original authorial work, which
I have worked out on my own. All sources, references, and literature
used or excerpted during elaboration of this work are properly cited
and listed in complete reference to the due source.

                                                     Pavel Nedělník

Advisor: RNDr. Jaroslav Čechák

Acknowledgements
It is my genuine pleasure to express my deepest gratitude to my mentor
RNDr. Jaroslav Čechák. His patience and willingness to help and guide
are beyond admiration.

Abstract
Magic: the Gathering is a well-known trading card game supported by a
very strong secondary market. We recognize the potential of this market
and explore whether recent advances in natural language processing can
be used to predict its movements. We use word embeddings and large-scale
sentiment analysis to test the hypothesis that card prices can be
predicted from activity on the social media platform Reddit. We conclude
that there is no such connection.

Keywords
machine learning, NLP, SOBHA

Contents

1 Introduction
   1.1 Outline

2 Preliminaries

3 Related Work

4 Dataset
   4.1 MTGGoldfish
   4.2 Reddit

5 Feature Analysis
   5.1 Target Variable
   5.2 Basic
   5.3 Word Embedding
        5.3.1 Term Frequency-Inverse Document Frequency
        5.3.2 Bag of Words
        5.3.3 Word2Vec
   5.4 Sentiment
   5.5 Mention Detection
        5.5.1 Naïve Bayes and Bag of Words
        5.5.2 Word2Vec
   5.6 Composition

6 Predictors
   6.1 Random Forest
   6.2 Logistic Regression
   6.3 Naïve Bayes
   6.4 Support Vector Machines

7 Results
   7.1 Testing Parameters
   7.2 Evaluation
   7.3 Results
   7.4 Discussion

8 Conclusion and Future Work
   8.1 Future Work

Bibliography

A An appendix
1 Introduction
Magic: the Gathering (MTG), created in 1993 by Richard Garfield, Ph.D.,
and often considered the first of its kind, is a trading card game. It is
played with a player-built deck of cards, and these cards can have immense
value, with some individual copies selling for over $20 000. The player
community consists mainly of the 18-34 year old demographic, which is
estimated to have significant buying power [1]. While the price of the
majority of MTG cards does not come anywhere near these heights, it should
be clear that understanding this market could be extremely valuable. Many
comparisons can be drawn between the prices of individual cards and stock
markets, and with the ongoing advancements in stock market prediction, a
question presents itself: can we predict the fluctuations in MTG card
prices? Due to the highly individualistic nature of both supply and demand
for these cards, in combination with all the usual difficulties of market
prediction, this poses a complex and difficult machine learning task.

    This area has previously attracted some interest [2, 3]. We introduce a
novel approach by utilizing Social Media Predictions (SMP). This growing
body of research uses metrics such as general activity and sentiment
derived from large-scale social media feeds to make conjectures about the
future. Our platform of interest is Reddit [4], a social media site that
hosts a significant community of MTG players. We calculate multiple metrics
describing the activity on this platform and correlate them with the prices
obtained from MTG Goldfish (MTGG) [5].

    The goal of this thesis is to explore whether there is a clear connec-
tion between increased social media activity and changes in the prices
of MTG cards. We interpret this as a classification problem and try to
predict the magnitude and the direction of the price change.

   Our results suggest that there is no such connection.

1.1 Outline
We present the outline of our work:

   • In chapter 2, we provide the necessary background for the MTG
     game and the Reddit platform, which are the main focus of our
     thesis.

   • In chapter 3, we place our work in the context of the scientific
     community. We highlight differences and similarities to other
     works with similar goals.

   • In chapter 4, we describe the data acquisition process and present
     our reasoning for each choice.

   • In chapter 5 we describe the methods we used to process the
     raw data as obtained in the previous chapter.

   • In chapter 6 we describe the strategies used to perform the clas-
     sification task as previously described.

   • In chapter 7 we present and interpret our experimental results.

   • In chapter 8 we make the conclusion to our work and discuss
     potential further research.

2 Preliminaries
In this chapter we explain more about the character of MTG, focusing
on the financial side of things. We define the price of a card and explain
the main driving factors behind it.
For many players, MTG is a highly competitive game, as most of the player
base is organized around tournaments. However, the degree to which players
compete varies greatly, all the way from local Friday Night Magic events
with little to no barrier to entry to the recent Magic World Championship
XXVI with a prize pool of $1 000 000. Combined with the fact that new cards
are constantly being released and added to the existing formats, this
forces players who want to stay competitive to constantly buy and sell
cards from their collections, creating a strong player-driven market, which
is the main driving factor behind the price of most cards. For the sake of
completeness, it should be stated that other groups, for example card
collectors or power sellers, can also have significant influence over the
price of MTG cards. However, these groups are often not interested in the
same cards as the players we described. Using our expert knowledge, we
sidestep this issue by carefully selecting cards so that the impact these
groups might have is minimized, and we therefore do not consider them
further.

    New cards are added in sets released about four times per year, each
containing roughly 150 to 200 cards. These cards may have been printed
before and may be reprinted in the future, with slight or more significant
cosmetic alterations. In this thesis we focus only on the regular versions
of cards, i.e. the cheapest printing in regular sets, because the price of
alternate versions is often driven by the collector aspect of the game and
is thus less likely to correlate with new events. The game is played with a
certain format in mind. Formats can differ in the fundamental rules of the
game, but for us, the most important difference is in the legality of
cards. We choose to limit ourselves to cards legal in the Constructed
Modern format; we explain this choice more thoroughly in chapter 4.1.
    Wizards of the Coast [6], the company currently responsible for
maintaining the game, provides multiple options for obtaining these cards.
However, none of these options is particularly efficient, since each is
hindered either by randomness or by a limited selection. This lack of
service leads to a very strong secondary, player-based market. Cards can be
traded or sold directly between players, but more commonly third-party
services such as MTGG, ChannelFireball [7], CardMarket [8], and TCGPlayer
[9] are used. There are significant differences between the types of
services that these companies provide, but they can generally be separated
into two groups. More commonly, as is the case with the aforementioned MTGG
and ChannelFireball, these companies buy and sell cards just like any
normal store would. In contrast, CardMarket, TCGPlayer, and similar
companies only facilitate the exchange of cards between players, who then
pay a small fee per use of the service. Understandably, there is a
difference between the prices offered by these services. A question
presents itself: what is the actual value of a card, and how could we find
it? It is common practice within the community to trade and sell cards
according to the price trend [10] of CardMarket (for Europe), TCGPlayer
(for America), and their equivalents for other localities. Sadly, none of
these services provides access to historic information on these prices.
Fortunately, there is an observable and quantifiable relationship between
the prices of shop-like and market-like services. The market price trend is
consistently about 70% of the price offered by shop-like services, and so
this is how we define the card price that we would like to predict. We
illustrate this by including this ratio for the cards we focus on in this
thesis in Table 2.1.

Table 2.1: Ratio of MTGG prices and CardMarket prices

                       Card Name           Price Ratio
                   Liliana of the Veil       69.44%
                Jace, the Mind Sculptor      68.65%
                    Karn Liberated           74.12%
                      Thoughtseize           68.18%
                       Tarmogoyf             71.69%
                    Noble Hierarch           72.27%
                   Horizon Canopy            65.53%

3 Related Work
While MTG has sparked a lot of interest [11, 12], very few works focus on
the trading aspect of the game. A very notable exception is [2]. The
authors focused on the MTG Online environment, an alternate version of the
game that is played exclusively online. Despite being online, the game
economy is very tightly tied to real-life currency. Players buy the in-game
currency in the official Wizards of the Coast shop at a 1:1 ratio for
dollars. The opposite process is made possible through the player-based
market. The authors succeeded in creating a trading strategy that
outperforms a Buy and Hold strategy [13], a commonly used baseline model.
Some of the sources of information they used were professional MTG blogging
sites such as ChannelFireball [7] or MTGG. Even though no direct
correlation with the general content of these sites was found, the authors
managed to show a strong influence of the MTGG Budget Magic series on the
price of a specific subset of cards.

    Abstracting from the specifics of our inquiry, we can find several SMP
applications in various environments [14, 15, 16, 17, 18]. Of most interest
to us are stock market predictions and finance in general, since, as stated
in chapter 1, cards can be compared to market assets. The most famous work
[19] in this area used the Google Profile of Mood States (GPOMS) to
classify the mood of posts on the social media site Twitter [20] and neural
networks to correlate them with the Dow Jones Industrial Average (DJIA)
[21]. The authors succeeded in showing that the collective public mood,
derived from large-scale Twitter feeds, can be used to significantly
improve the accuracy of models predicting the DJIA closing values. Their
model achieved a very high accuracy of 87%. Surprisingly, the most
impactful mood was the “Calm” mood. GPOMS is no longer available; however,
an alternative was developed [22], albeit with a slightly more modest
result of a 75.5% score in a sequential k-fold cross-validation test. The
authors of [23] developed a sentiment classifier for the Twitter platform
and showed that there is a strong connection between the public opinion of
a company, expressed through tweets, and changes in its stock value. The
main contribution of their work is the mood classifier, which managed to
achieve an accuracy of 70.5% with an N-gram representation and 70.2% with
Word2Vec, see chapter 5.3.3. This is very comparable to the human
concordance on this topic, which is estimated to be around 70% to 79% [24].
Despite the slightly lower accuracy, the authors argue in favor of the use
of Word2Vec, citing its "promising accuracy for large datasets and the
sustainability in word meaning."

    While Twitter is the object of interest of most recent research, the
MTG community is far more active on Reddit, with over 450 000 users on the
game's main subreddit as opposed to 287 000 on Twitter, and so this is
where our inquiry leads. Despite not being the focal point of interest of
the scientific community, there is still much work done for the Reddit
platform. In [25] the authors used multiple metrics, some specific to
Reddit, such as the volume of comments, their language, and their
popularity, to improve the results of neural networks trained to predict
the value of different cryptocurrencies. As another example, [26] shows a
strong connection between the activity on stock-related subreddits and
changes in stock prices. This is again done with the use of sentiment
analysis.

  According to our formulation of the task, see chapter 1, we use a
number of methods common for text classification [27] described in
more detail in chapter 6.

4 Dataset
The very first step of our work was acquiring the necessary data. In this
chapter we describe the sources of our data, the way it was acquired, and
its characteristics, and we comment on our reasoning for each choice.
The dataset consists of two main sources: MTGGoldfish and Reddit.

4.1 MTGGoldfish
The most crucial part of our data collection process was obtaining the
historic prices of the cards themselves. Multiple sources were considered,
some used in the past [3]; however, in recent years most of them became
unavailable to the public. Eventually, we were left with a single choice.
Despite its shortcomings, namely a paywall, a general lack of context such
as information about sales, and the lack of an API [28], data from MTGG was
chosen to fill this role. This carries additional problems, since the site
belongs to an American company and so may not represent the entire
community. For the most part, this should not be an issue, since the prices
are updated regularly to match the competitors and, most importantly, the
player market, as described in more detail in chapter 2. However, there are
instances when a significant gap between the markets can arise, for example
when the worldwide distribution of new cards is hindered in some way.
Alternatively, since the player-based market is not moderated in any way, a
player may attempt to artificially raise or lower the price of a specific
card on the local market through various methods of market manipulation.

    Due to the high computational requirements of many of the techniques
we use and the sheer number of MTG cards, over 20 000 [29] at the time of
writing this thesis, we chose to focus only on a handful of handpicked
cards. Using our expert knowledge, we chose four main criteria to consider
for each card:

   • History

   • Demand

   • Price

   • Format

    For the first criterion, we looked at cards that had existed for at
least seven years. This was to maximize the size of our dataset. We also
wanted these cards to be predominantly played in non-rotating formats. In
rotating formats, after a certain period of time, cards stop being legal
for deck construction. This greatly reduces the time during which these
cards are likely to be relevant and in turn diminishes our dataset.
    For the demand criterion, we looked at tournament play. We made the
assumption that heavily played cards are more likely to have both a more
varied price history and, more importantly, higher social media traffic. We
considered cards that were played in at least 5% of tournament-winning
decks during the majority of their existence.
    Lastly, the price criterion was set so that there were possible
real-life applications for our findings. With relatively high postal costs
and other fees associated with buying MTG cards, we considered only cards
with a current price above $20. Price is also, to some extent, a reflection
of demand.
    We also decided to limit our choices to the Constructed Modern format,
the most played non-rotating competitive format. There are many cards that
satisfy our criteria within this format, and this limitation allowed us to
pick a more specific subreddit, see chapter 4.2, with a higher
concentration of relevant information. It also allowed us to calculate more
metrics about the data, since the total amount was smaller. With this
choice in mind, we picked cards that are predominantly played in
Constructed Modern to limit outside influences.
    The aforementioned process could be done automatically, since this data
can be accessed through the combination of the ScryFall [30] and MTGTOP8
[31] APIs.
    With these restrictions in mind, we picked the following cards:
Liliana of the Veil, Jace, the Mind Sculptor, Karn Liberated, Thoughtseize,
Tarmogoyf, Noble Hierarch, Horizon Canopy.

   The price data for these cards was accessed with the use of the Python
Requests library [32]. MTGG does not provide an API access point, so the
data was accessed through empirically discovered page endpoints.

   The dataset consists of daily updated prices and the corresponding
dates. We worked under the assumption that the prices for any given date
were assigned at midnight; however, we were unable to confirm whether that
is the case.
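   To make the acquisition step concrete, the following is a minimal sketch
of how such a download could look with the Requests library. The URL
pattern below is hypothetical, since MTGG publishes no documented endpoint;
the sketch only illustrates the general approach, and the payload still has
to be parsed into (date, price) pairs.

```python
import requests

# Illustrative only: the price-history URL pattern is an assumption,
# standing in for the empirically discovered endpoints mentioned above.
BASE_URL = "https://www.mtggoldfish.com/price-history/{set_code}/{card_name}"


def fetch_price_page(card_name: str, set_code: str) -> str:
    """Download the raw payload for one card's price history."""
    url = BASE_URL.format(set_code=set_code,
                          card_name=card_name.replace(" ", "+"))
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx
    return response.text


if __name__ == "__main__":
    raw = fetch_price_page("Tarmogoyf", "MM3")
    print(raw[:200])
```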

4.2 Reddit
The majority of our work focuses on the social media platform Reddit.
Reddit lets users join communities called subreddits, where they can post
submissions or react to them with comments and votes. Submissions
(comments) can be at most 40 000 (10 000) characters long and can include
anything from emoticons and markup to URLs. Subreddits have their topics
listed in their descriptions; these topics can range from hobbies to jokes
or world news.

   As suggested in the previous chapter, we decided to focus on the
r/modernmagic subreddit, which is dedicated to players of the Modern
Constructed format. We then also used r/magicTCG, a general MTG subreddit,
as a control for our findings. For r/modernmagic we acquired all records of
activity during a time frame of seven years, from 1. 1. 2014 to 1. 1. 2021.
For r/magicTCG we only looked at three years, from 1. 1. 2014 to
1. 1. 2017.

    This data was obtained through the Pushshift [33] API, again with the
use of the Requests library. This API was chosen because it provides the
option to search for user activity by timestamps, which was crucial for our
research. It provides three endpoints: subreddit, submission, and comment.
The subreddit endpoint sadly was not accessible at the time of writing this
thesis. It could have been used to gain a more comprehensive insight into
the community, since we would not have been limited to manually chosen
subreddits. We approached submissions and comments separately, due to the
slight differences in their character and rules.
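    A minimal sketch of paging through the Pushshift submission endpoint by
timestamp is shown below; the parameter values are illustrative, the
comment endpoint is queried analogously, and availability or limits of the
service may have changed since our data was collected.

```python
import requests

SUBMISSION_URL = "https://api.pushshift.io/reddit/search/submission/"


def fetch_submissions(subreddit: str, after: int, before: int, size: int = 100):
    """Yield raw submission dictionaries between two UNIX timestamps."""
    while after < before:
        params = {"subreddit": subreddit, "after": after, "before": before,
                  "size": size, "sort": "asc", "sort_type": "created_utc"}
        batch = requests.get(SUBMISSION_URL, params=params, timeout=30).json()["data"]
        if not batch:
            break
        yield from batch
        after = batch[-1]["created_utc"]  # continue after the last returned post


# Example: all r/modernmagic submissions from 1. 1. 2014 to 1. 1. 2015.
# posts = list(fetch_submissions("modernmagic", 1388534400, 1420070400))
```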

   For r/modernmagic (r/magicTCG), we collected over 53 000 (110 000)
submissions and 1 320 000 (3 000 000) comments.

    Out of the raw data returned by the API, we were only interested in
some parameters. The created_utc field was used to place submissions and
comments into the correct time period. We used id and parent_id to assign
comments to their corresponding submissions. We were also interested in the
score parameter: users of the platform can express their opinion on a
comment or submission by voting, and the score is the difference between
positive and negative votes. We then combined the title and the text of
submissions into a single feature to match comments. We also preprocessed
the text parameter further with the following steps (a sketch of this
pipeline follows the list):

  1. Regular Expressions
     were used to remove unnecessary data such as urls and markup
     commands and also to replace common occurrences such as user
     tags, references to other subreddits, and numbers with tokens
     to better generalize context.
     An example of such change is replacing "4x" with "token_count".

  2. Stopword Removal
     Stopwords are commonly used words that carry little to no mean-
     ing. For example: "the", "and", or "have".

  3. Tokenization
     Each text is transformed into a list of sentences and each sentence
     is transformed into a list of words. Punctuation is removed.
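   The following sketch illustrates the three steps with NLTK; the exact
regular expressions and replacement tokens used in our implementation are
assumptions of the sketch.

```python
import re
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

# requires: nltk.download("punkt") and nltk.download("stopwords")
STOPWORDS = set(stopwords.words("english"))


def preprocess(text: str) -> list[list[str]]:
    text = re.sub(r"https?://\S+", " ", text)             # drop URLs
    text = re.sub(r"/?u/\w+", " token_user ", text)        # user tags
    text = re.sub(r"/?r/\w+", " token_subreddit ", text)   # subreddit references
    text = re.sub(r"\d+x?", " token_count ", text)          # numbers, e.g. "4x"
    text = re.sub(r"[*>#\[\]()~`]", " ", text)             # markup commands
    sentences = []
    for sentence in sent_tokenize(text):
        words = [w.lower() for w in word_tokenize(sentence)
                 if w.isalpha() or w.startswith("token_")]  # drops punctuation
        sentences.append([w for w in words if w not in STOPWORDS])
    return sentences


# preprocess("Check /r/modernmagic, I run 4x Tarmogoyf! https://example.com")
```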

5 Feature Analysis
The next step after data collection is to further process the data to make
it usable by predictors. We do this either directly, by word embedding, or
indirectly, by calculating descriptive metrics. First, we calculate the
target variable for our classification problem. We then extract input
variables from the Reddit dataset. These features are separated into four
groups:
   • Basic
   • Word Embedding
   • Sentiment
   • Mention Detection

5.1 Target Variable
To construct the target variable for our classification problem, we use the
MTGG dataset, obtained as described in chapter 4.1.
    First, the price difference from the last time interval is calculated.
For a specific card i and the j-th time interval of length n, it is
calculated as (5.1), where price_{a,b} refers to the price of card a on day
b, starting with day zero.

$$\mathrm{diff}(i, j) = \mathrm{price}_{i,\, j \cdot n} - \mathrm{price}_{i,\, (j+1) \cdot n} \tag{5.1}$$

   This difference is then transformed into a classification problem in one
of the following ways:

$$\mathrm{sign}(i, j) = \begin{cases} 1 & \text{if } \mathrm{diff}(i, j) > 0 \\ 0 & \text{otherwise} \end{cases} \tag{5.2}$$

$$\mathrm{size}(i, j) = \begin{cases} 1 & \text{if } |\mathrm{diff}(i, j)| > t \\ 0 & \text{otherwise} \end{cases} \tag{5.3}$$

where t is a threshold set as the average price difference in the training
dataset.
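    A small sketch of this target construction, assuming a NumPy array of
daily prices for a single card (variable names are illustrative):

```python
import numpy as np


def targets(prices: np.ndarray, n: int, t: float):
    # price sampled at the start of each interval of length n
    sampled = prices[::n]
    # diff(i, j) as in (5.1): current interval price minus the next one
    diff = sampled[:-1] - sampled[1:]
    sign = (diff > 0).astype(int)           # formulation (5.2)
    size = (np.abs(diff) > t).astype(int)   # formulation (5.3)
    return diff, sign, size


# Example: three-day intervals, threshold = mean absolute daily change
# diff, sign, size = targets(card_prices, n=3,
#                            t=np.abs(np.diff(card_prices)).mean())
```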

5.2 Basic
For each submission, we calculate basic descriptive features such as the
number of comments, the score, and the average score of its comments. For
comments, we only calculate the score.

5.3 Word Embedding
One of the crucial features of our model are word embeddings, vector
representations of texts. The central idea of these methods is that texts
with similar meaning should have similar vectors. This similarity can
then be measured, for example by cosine distance [34], as is done in
this thesis.
$$\mathrm{dist}(v_1, v_2) = \frac{v_1 \cdot v_2}{|v_1| \cdot |v_2|} \tag{5.4}$$
The values range between −1 and 1, where −1 indicates opposite
meaning and 1 indicates a synonym. Unlike plain text, these vectors
can then be fed to a predictor as a fixed set of features.

5.3.1 Term Frequency-Inverse Document Frequency
Both of the methods we introduce later on are designed to embed words;
however, we would like to embed sentences and even entire paragraphs. A
naïve approach would be to average the embeddings of all the words within
the text we would like to embed. This approach has the unwanted effect that
words or phrases that repeat often, and consequently are less likely to
carry meaning, are propagated more. We can combat this by multiplying the
embeddings by the Term Frequency-Inverse Document Frequency (TF-IDF) of the
corresponding word [35].
TF-IDF is the combination of two statistics, term frequency and inverse
document frequency. Say we wish to calculate the TF-IDF of a word w within
the document d_i from the corpus D. The term frequency is given by the
formula (5.5), the inverse document frequency by the formula (5.6), and
consequently the TF-IDF can be calculated as (5.7), where w_{d_i} denotes
the number of occurrences of word w in the document d_i, |d_i| denotes the
number of words within the document d_i, |D| is the number of documents
within the corpus D, and |{d : w ∈ d, d ∈ D}| is the number of documents in
the corpus D containing the word w.

$$\mathrm{tf} = \frac{w_{d_i}}{|d_i|} \tag{5.5}$$

$$\mathrm{idf} = \ln \frac{|D|}{|\{d : w \in d, d \in D\}|} \tag{5.6}$$

$$\mathrm{tfidf} = \frac{w_{d_i}}{|d_i|} \cdot \ln \frac{|D|}{|\{d : w \in d, d \in D\}|} \tag{5.7}$$
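A direct sketch of formulas (5.5)-(5.7) and of the weighted averaging
described above; the embed function in the comment is a stand-in for a
trained word embedding and is an assumption of the sketch.

```python
import math
from collections import Counter


def tfidf(word: str, document: list[str], corpus: list[list[str]]) -> float:
    tf = Counter(document)[word] / len(document)                      # (5.5)
    containing = sum(1 for d in corpus if word in d)
    idf = math.log(len(corpus) / containing) if containing else 0.0   # (5.6)
    return tf * idf                                                   # (5.7)


# Weighted sentence embedding: average word vectors scaled by their TF-IDF.
# weights = [tfidf(w, doc, corpus) for w in doc]
# sentence_vector = sum(t * embed(w) for w, t in zip(doc, weights)) / sum(weights)
```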

5.3.2 Bag of Words
Bag of Words (BOW) [36] is a simple and intuitive way to perform word
embedding. For a given corpus, all present words are ordered and assigned a
number based on this ordering. This vocabulary is then stored. A word is
embedded as a vector of zeroes with a one at the position that corresponds
to its position in the vocabulary, if the word is present in the
vocabulary. Despite its simplicity, it is still one of the go-to approaches
[37, 38] for this task.
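A minimal sketch of this one-hot encoding:

```python
def build_vocabulary(corpus: list[list[str]]) -> dict[str, int]:
    """Order all distinct words in the corpus and assign each an index."""
    return {word: i
            for i, word in enumerate(sorted({w for doc in corpus for w in doc}))}


def embed_word(word: str, vocabulary: dict[str, int]) -> list[int]:
    """Vector of zeroes with a one at the word's vocabulary position, if known."""
    vector = [0] * len(vocabulary)
    if word in vocabulary:
        vector[vocabulary[word]] = 1
    return vector


vocab = build_vocabulary([["tarmogoyf", "price", "spike"], ["price", "drop"]])
print(embed_word("price", vocab))  # [0, 1, 0, 0] for this vocabulary ordering
```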

5.3.3 Word2Vec
A more complex approach is Word2Vec (W2V) [39, 40]. A very important
difference from the BOW model is that the dimension of the resulting vector
stays the same no matter the size of our corpus. It uses a shallow,
two-layer neural network to construct the embeddings. There are two
approaches to training the model. The continuous bag of words (CB) model
trains the network to predict the target word from the surrounding context
words. The skip-gram (SG) model is trained to predict the context words
within a certain range of the target word.
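A sketch of training both variants with the gensim library (4.x API); the
tiny corpus and the hyperparameters shown are illustrative, not the ones
used in our experiments.

```python
from gensim.models import Word2Vec

sentences = [["tarmogoyf", "price", "spike"], ["modern", "tarmogoyf", "deck"]]

# sg=1 trains the skip-gram model, sg=0 the continuous bag of words model
skip_gram = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)
cbow = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)

# Each word now maps to a fixed-length vector regardless of corpus size.
vector = skip_gram.wv["tarmogoyf"]                       # 100-dimensional array
similar = skip_gram.wv.most_similar("tarmogoyf", topn=3)  # cosine neighbours
```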

5.4 Sentiment
Sentiment analysis has been an integral part of SMP [19, 22]. We use the
Bidirectional Encoder Representations from Transformers (BERT) BASE [41]
multilingual model tuned for sentiment analysis of product reviews, kindly
provided by Hugging Face [42]. BERT models use pre-trained deep-learning
algorithms that can then be fine-tuned for specific tasks to quickly create
state-of-the-art models for natural language processing. BERT BASE is the
smaller of the two versions of the model (as opposed to BERT LARGE).
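A sketch of scoring a text with the model cited in [42] through the Hugging
Face pipeline; the model outputs a 1-5 star rating, and the rescaling to a
numeric feature shown here is our own assumption.

```python
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="nlptown/bert-base-multilingual-uncased-sentiment",
)

result = classifier("Tarmogoyf is finally getting a reprint, great news!")[0]
stars = int(result["label"].split()[0])   # label looks like "5 stars"
sentiment = (stars - 3) / 2               # rescale to roughly [-1, 1]
print(result, sentiment)
```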

5.5 Mention Detection

One of the issues we encountered when writing this thesis was that, unlike
many of our predecessors, we do not have a platform specific enough that we
could consider all activity on it relevant. For each given subreddit we
have up to thousands of potential cards of interest. A naïve approach would
be to simply consider only submissions or comments that directly mention
the card name. This would be ill advised, because it is very common for
players to refer to cards by alternate names that do not necessarily have
anything in common with the actual name; for example, a card named Dark
Confidant is most often referred to simply as Bob. This is very similar to
keyword extraction.

    We define a relevant submission as a submission directly containing a
word that is either the name of the card that is currently the object of
interest or a word detected as a synonym by one of the following
approaches. For comments, we considered all comments either satisfying the
same condition or assigned to a submission that does.

5.5.1 Naïve Bayes and Bag of Words

A simple and intuitive approach is a Naïve Bayes classifier, see chapter
6.3, with a BOW representation of the corpus, equivalent to the multinomial
model from [43], described earlier in this chapter. Under the hypothesis
that names or other mentions of the cards have the biggest impact on their
price, we trained NB (and later LR) on the BOW-preprocessed dataset with
labels set according to the sign of the price change over a time period of
one to seven days. We evaluated this method by studying the weights
assigned to each corresponding word.

5.5.2 Word2Vec
A more sophisticated solution to this task uses the Word2Vec representation
described earlier. We can compare the embedding of the card name to that of
any given word and, for a similarity, see (5.4), above a certain threshold
t, consider them to be synonyms. The threshold was empirically set to 0.8.
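A sketch of this detection rule, assuming a trained gensim Word2Vec model
and card names represented as single lower-case tokens (an assumption of
the sketch):

```python
THRESHOLD = 0.8


def is_relevant(tokens: list[str], card_token: str, model) -> bool:
    """True if the text mentions the card or a word embedded close to it."""
    for word in tokens:
        if word == card_token:
            return True
        if word in model.wv and card_token in model.wv:
            if model.wv.similarity(card_token, word) > THRESHOLD:
                return True
    return False


# is_relevant(["bob", "is", "spiking"], "dark_confidant", skip_gram)
```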

5.6 Composition
The last step of the feature extraction process is to combine our sources
into one dataset. We do this by splitting the Reddit dataset into
individual days that are then combined to form a single entry matching the
MTGG data. The feature values are averaged over the entire day.

    To maximize the number of training examples, we then use a technique
called Rolling Window to combine multiple days into a single entry.
Notably, this method does not significantly affect the dataset length. It
can be described as follows: we wish to transform our dataset D with
entries a_1, a_2, ..., a_n into a new dataset with entries
b_1, b_2, ..., b_{n-d} by combining d + 1 days into a single entry. The
i-th entry of the new dataset can then be described as (5.8), where
F_{a_j} is the set of input variables of the j-th entry of the dataset D.

$$b_i = \bigcup_{j=i}^{i+d} F_{a_j} \tag{5.8}$$
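A sketch of the rolling window in (5.8) for a pandas DataFrame with one row
of averaged features per day; column names are illustrative.

```python
import pandas as pd


def rolling_window(daily: pd.DataFrame, d: int) -> pd.DataFrame:
    windows = []
    for i in range(len(daily) - d):
        window = daily.iloc[i:i + d + 1]
        # flatten the (d + 1) daily feature sets into one wide entry
        flat = {f"{col}_day{k}": window.iloc[k][col]
                for k in range(d + 1) for col in daily.columns}
        windows.append(flat)
    return pd.DataFrame(windows)


# features_3day = rolling_window(daily_features, d=2)  # three-day window
```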

6 Predictors
In this chapter we introduce the algorithms that are used to predict
the price movement. We interpret this as a classification task. Our
model predicts the target variable (class), as defined in chapter 5.1,
from a set of input variables (features).

6.1 Random Forest
The Random Forest classifier (RF) originates from the Decision Tree (DT),
but it combats the tendency to overfit with the use of the Bagging
algorithm. It falls within the category of ensemble learning [44]. It is
one of the go-to algorithms for many tasks, including text classification
[45].

Decision Tree

A DT can be used for both classification and regression. During the
training phase, a tree-like structure is constructed in the following way:
nodes represent a split of the dataset along the value of a specific input
variable, and leaves correspond to a target variable. Thus, a branch of
nodes leading to a leaf represents a series of decisions, based on the
values of input variables, that determine the target variable. Importantly,
input variables for a DT have to be categorical. Variables that do not
satisfy this condition are discretized as the first step of the algorithm.

    The tree construction is most often realized top-to-bottom: starting
with the entire training dataset, a feature is chosen based on a number of
metrics, see below. This feature becomes a node. The dataset is then split
along this feature, producing subsets of the initial dataset. The algorithm
then repeats on each of these subsets until there are no more suitable
features to select or the subset contains only one value. The tree is then
usually pruned [46], i.e. unpromising branches are merged into a single
leaf, to mitigate overfitting. Pruning can also be done during the
construction by modifying the stop condition [47].

    The feature selection process has a big impact on the resulting
accuracy and the ability to generalize well. We used two metrics in this
thesis: Entropy and the Gini Index.
    For a given dataset D and a given input variable F with classes
1, ..., n, Entropy measures the purity of the split along that variable. It
is calculated as (6.1), where |i| is the number of entries that belong to
class i and |D| is the size of the dataset. Its values range from 0 to 1,
where 1 signifies the highest possible impurity.

$$E(F) = \sum_{i=1}^{n} -\frac{|i|}{|D|} \ln \frac{|i|}{|D|} \tag{6.1}$$

    The Gini Index is very similar in its function. It is given by the
formula (6.2).

$$GI(F) = 1 - \sum_{i=1}^{n} \left(\frac{|i|}{|D|}\right)^2 \tag{6.2}$$
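A small sketch of formulas (6.1) and (6.2) for a vector of class labels
belonging to one candidate split:

```python
import numpy as np


def entropy(labels: np.ndarray) -> float:
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())   # (6.1), natural logarithm as in the text


def gini_index(labels: np.ndarray) -> float:
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())     # (6.2)


print(entropy(np.array([1, 1, 0, 0])), gini_index(np.array([1, 1, 0, 0])))
# maximally impure binary split: entropy ln(2) ≈ 0.69, Gini Index 0.5
```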

Bagging
Also called Bootstrap Aggregating [48, 49], Bagging is a method used to
reduce variance and avoid overfitting for many different algorithms.

     Given a training dataset D, Bagging generates a certain number of new
datasets of lesser or equal size. This is done by bootstrapping, i.e. by
sampling from D uniformly (each unique element has an equal chance of being
selected) with replacement (each element can be selected multiple times).
On each of the newly generated datasets a predictor, in our case a DT, is
trained and stored.

    Classification is performed by counting all outcomes and selecting the
most frequent one; regression is done by averaging over all results.

6.2 Logistic Regression
Logistic Regression (LR) [50, 51], despite what its name would suggest, is
a model used for classification. In some sense, it is an adaptation of the
Linear Regression [52] model for classification. First, regression is
performed, as it would be with Linear Regression, i.e. the value of each
input variable is multiplied by its weight as determined by the model
during the training process. LR calculates the logarithm of the odds of a
target variable as a linear combination of input variables. This is then
transformed by the logistic function to change the range from
$(-\infty, \infty)$ to $(0, 1)$ so that classification can be performed by
comparing the result with a threshold, for example 0.5.

     To demonstrate, assume we want to calculate the probability of the
target variable Y being class 1, based on the input variables
x_1, x_2, ..., x_n, and that during the training process we have obtained
the weights w_1, w_2, ..., w_n and the bias b. We can then use the formula:

$$P(Y = 1) = \frac{e^{b + w_1 x_1 + w_2 x_2 + \dots + w_n x_n}}{1 + e^{b + w_1 x_1 + w_2 x_2 + \dots + w_n x_n}} \tag{6.3}$$

   To show this is LR as we defined it:

$$P(Y = 1) + P(Y = 1) \cdot e^{b + w_1 x_1 + \dots + w_n x_n} = e^{b + w_1 x_1 + \dots + w_n x_n} \tag{6.4}$$

$$P(Y = 1) = (1 - P(Y = 1)) \cdot e^{b + w_1 x_1 + \dots + w_n x_n} \tag{6.5}$$

$$\ln \frac{P(Y = 1)}{1 - P(Y = 1)} = b + w_1 x_1 + w_2 x_2 + \dots + w_n x_n \tag{6.6}$$
    LR makes a number of important assumptions about the dataset [53]. As
described here, it assumes that the target variable is dichotomous, but
there are variations that do not have this restriction [54]. It also
assumes that all input variables are independent, that there is no noise in
the dataset, and that there is a linear dependence between the input
variables and the log-odds of the target variable. This can lead to
drastically lowered accuracy when not all of these assumptions are met.
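A numeric sketch of formulas (6.3) and (6.6) with made-up weights:

```python
import numpy as np

w = np.array([0.8, -1.2, 0.3])   # weights w_1..w_n (illustrative values)
b = 0.5                          # bias
x = np.array([1.0, 0.2, 2.0])    # one entry's input variables

z = b + w @ x                    # linear combination b + w_1 x_1 + ... + w_n x_n
p = np.exp(z) / (1 + np.exp(z))  # (6.3): probability of class 1
prediction = int(p > 0.5)        # classification against a 0.5 threshold

# (6.6): the log-odds recover the linear combination
assert np.isclose(np.log(p / (1 - p)), z)
print(p, prediction)
```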

6.3 Naïve Bayes
Naïve Bayes (NB) [55] is a classifier based on Bayes' theorem. It models
the probability of the target variable as the product of the conditional
probabilities of all input variables. It makes the assumption that all
input variables are mutually independent; despite the fact that this
condition is rarely met in real-life examples, it often outperforms more
complex models. Its main advantages are its relative simplicity and the
resulting speed. It is also resilient to noise in the training dataset
[55].
    For fine-tuning, we used the Multinomial and Gaussian variants.

6.4 Support Vector Machines
Support Vector Machines (SVM) [35, 56] can, with slight differences, be
used for both regression and classification. Since this is a lengthy topic,
we will only explain dichotomous classification. A training dataset of m
entries with n input variables can be represented by an n-dimensional
space, where each entry corresponds to a point p_1, p_2, ..., p_m given by
the values of its input variables. The support vector classifier (SVC)
seeks to separate this space by a hyperplane, a subspace of dimension
n − 1. Ideally, this hyperplane would separate the space into two
half-spaces, each containing only points corresponding to entries of the
same class y. Out of the hyperplanes for which this holds, we choose the
one whose distance to the nearest points of both classes is maximized.
These points are called support vectors and are sufficient to determine our
classifier. To extend this model to situations where the classes are not
separable without error, a soft-margin adaptation is used; it has a
regularization parameter C that balances the error allowance. In the case
that the data is not linearly separable, it may be separable with the use
of the kernel trick. The trick is to transform the space into a higher
dimension and perform the linear separation there. Multiple kernels can be
used. The most popular choices, polynomial, rbf, sigmoid, and the
aforementioned linear, are used in this thesis.

7 Results
In this chapter we present the results obtained by evaluating the proposed
models on the data we collected. We first provide an overview of the
features we calculated in chapter 5 and describe the testing methodology.
We then introduce the methods we use for evaluating our results and perform
the evaluation.

7.1 Testing Parameters
Based on our experimental results, we decided to generate 8 datasets for
each card for systematic testing. These datasets always contain the
following features:

   • score

   • comment_score

   • sentiment

   The datasets differ in the following choices:

   • size Rolling Window size - 3, 7 days

   • delay Label delay - 1, 3 days

   • word2vec skip-gram, continuous bag of words model, or no embedding

    Additionally, we use the model-specific parameters described in
chapter 6. This is an overview (a grid-search sketch follows the list):

   • SVM - kernel: linear, polynomial, rbf, sigmoid; regularization
     parameter C: 1, 5, 10

   • LR - balanced classes, C: 1, 5, 10, penalty: l1, l2, elastic net

   • RF - balanced classes, split criterion: gini, entropy

   • NB - balanced classes, Multinomial, Gaussian
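    To illustrate, the following sketch tunes one model family (SVM) over
the parameters above with the 3-fold cross-validated grid search described
in chapter 7.2; X_train and y_train stand for the composed features and the
target from chapter 5 and are assumptions of the sketch.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "kernel": ["linear", "poly", "rbf", "sigmoid"],  # "poly" = polynomial kernel
    "C": [1, 5, 10],                                 # regularization parameter
}
search = GridSearchCV(SVC(), param_grid, cv=3, scoring="f1")
# search.fit(X_train, y_train)
# print(search.best_params_, search.best_score_)
```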

    We also experimented with the two formulations of the classification
problem described in chapter 5.1, sign and size; however, the size
interpretation was rejected due to its poor results.

   Based on our initial exploration, we decided to exclude some of the
features described in chapter 5. With the growing size of our dataset, the
BOW word embedding quickly became unusable due to the rapidly growing
vocabulary. Instead of implementing techniques to combat this, we decided
to continue using only W2V, since it had consistently better results and
did not have any such issues. For mention detection, we also decided to
forego NB-BOW. Similarly, this method did not scale well; however, the real
issue was accuracy. Even with normalization, the method only identified
common phrases as mentions, which was undesirable. This, however, does not
necessarily speak to the effectiveness of the method, since the hypothesis
we operated under, see chapter 5.5.1, may have been wrong: we were
ultimately unable to prove that there is a significant connection between
the card price and Reddit activity.

7.2 Evaluation
We use grid search with 3-fold cross-validation over all parameters to find
the best model on the validation set, and we present its results on the
test dataset. We average these results over all the considered cards, see
chapter 4.1.
Because we are dealing with very imbalanced classes, simple metrics such as
accuracy are not sufficient. Our evaluation method of choice is the
F-measure. It is defined as (7.3), where precision and recall are defined
as (7.1) and (7.2). TP is the number of correctly predicted positive cases,
FP is the number of negative cases incorrectly predicted as positive, and
FN is the number of positive cases incorrectly predicted as negative. Its
values range from 0 to 1, where 1 signifies the best possible result.

$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{7.1}$$

$$\mathrm{recall} = \frac{TP}{TP + FN} \tag{7.2}$$

$$\text{f-measure} = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{7.3}$$

              Table 7.1: F-measure of the baseline models

                          Model        F-measure
                         stratified       0.41
                         uniform          0.51
   We then also compare these results to baseline models to gain a greater
understanding of what they mean. As our baselines we use the following
strategies (a sketch of the baseline evaluation follows the list):

   • stratified makes guesses with respect to the class distribution.

   • uniform predicts each class with the same probability.
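A sketch of producing such baselines with scikit-learn dummy classifiers
and the F-measure; the random data below merely stands in for the composed
dataset of chapter 5.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(300, 5)), rng.normal(size=(100, 5))
y_train, y_test = rng.integers(0, 2, 300), rng.integers(0, 2, 100)

for strategy in ("stratified", "uniform"):
    baseline = DummyClassifier(strategy=strategy, random_state=0)
    baseline.fit(X_train, y_train)
    print(strategy, f1_score(y_test, baseline.predict(X_test)))
```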

7.3 Results
First, we present the baseline models in Table 7.1. The results for our
predictors are presented in Table 7.2.

    One possible explanation for some of our poor results could be
dependence within the dataset. We investigated this hypothesis by using the
Pearson correlation coefficient, as provided by pandas [57], to check for
pairwise correlation. We then followed the methods described in the
scikit-learn [36] documentation [58] and used the Spearman correlation
coefficient to look for multicollinear dependencies, see Figure 7.1. Our
hypothesis was rejected.
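A sketch of both dependence checks; the small DataFrame stands in for the
composed input variables of chapter 5.6, and the Spearman-based clustering
follows the scikit-learn multicollinearity example [58].

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

features = pd.DataFrame({"score": [1.0, 4.0, 2.0, 8.0],
                         "comment_score": [2.0, 5.0, 1.0, 9.0],
                         "sentiment": [0.1, -0.3, 0.2, 0.4]})

pairwise = features.corr(method="pearson")   # pairwise Pearson correlation

# Hierarchical clustering of Spearman correlations; plotting the linkage as
# a dendrogram gives a figure analogous to Figure 7.1.
corr = spearmanr(features)[0]
distance = squareform(1 - np.abs(corr), checks=False)
linkage = hierarchy.ward(distance)
print(pairwise)
```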

          Table 7.2: F-measure of our models
 Model     Best   Average Often Selected Parameters
SVM, SG    0.63    0.47         sigmoid, c = 5
 RF, SG    0.35    0.17    no significant difference
 LR, SG    0.37    0.09    no significant difference
 NB, SG    0.42    0.36            Gaussian
SVM, CB    0.57    0.38         sigmoid, c = 5
 RF, CB    0.27    0.16               gini
 LR, CB    0.38    0.13    no significant difference
 NB, CB    0.48    0.41            Gaussian
SVM, NO    0.31    0.20    no significant difference
 RF, NO    0.19    0.08    no significant difference
 LR, NO    0.28    0.13    no significant difference
 NB, NO    0.35    0.16    no significant difference

     Figure 7.1: Multicollinearity Dendrogram

7.4 Discussion
The most important observation is that we were unable to improve on the
baseline models.
We achieved consistently better results for the r/modernmagic subreddit
than for r/magicTCG. This is likely due to the lower amount of noise. The
best performing model was SVM.
We made two observations about the W2V representation for mention
detection. Misspellings and bad grammar are common problems in SMP;
consider the two most similar words for the card named Liliana of the Veil:
"lilliana" and "lilianna." Since W2V measures similarity based on the
contexts words appear in, it often confused distinct cards. For example,
the most similar words to Karn Liberated are "ugin" and "wurmcoil." Both
are parts of names of similar cards commonly played together. We decided
not to combat this, because it is not unreasonable that the prices of cards
that are often used in similar contexts would be connected.

8 Conclusion and Future Work
Our objective was to explore the possibilities of Social Media Predictions
in a new environment. We used a wide array of common approaches and,
despite our best efforts, were unable to improve on the baseline models.

8.1 Future Work
Our next option would be to use recurrent neural networks.
Even though the sentiment classifier we used performed reasonably well in
our limited testing, a possible avenue of further research would be to
develop a similar model tuned specifically for the Reddit environment.
One of the significant issues we encountered was speed. We chose Python as
our programming language, and some of the computations we had to make took
entire days. While this is not unheard of in the field of natural language
processing, perhaps a C++ implementation would serve us better in this
regard.

Bibliography
 1. MARDER, Andrew. "Magic: The Gathering" – Hasbro's Key to
    Growth. The Motley Fool, 2014. Available also from: https://www.fool.
    com/investing/general/2014/04/05/magic-the-gathering-hasbros-key-to-growth.aspx.
 2. DI NAPOLI, Matteo. Multi-asset trading with reinforcement
    learning: an application to Magic the Gathering Online. 2018.
 3. SAKAJI, Hiroki; KOBAYASHI, Akio; KOHANA, Masaki; TAKANO,
    Yasunao; IZUMI, Kiyoshi. Card Price Prediction of Trading Cards
    Using Machine Learning Methods. In: International Conference on
    Network-Based Information Systems. 2019, pp. 705–714.
 4. Advance Publications, Inc. Available also from: https://www.
    reddit.com/.
 5. MTGGoldfish, Inc. Available also from: https://www.mtggoldfish.
    com/.
 6. Wizards of the Coast. Available also from: https://company.
    wizards.com/.
 7. ChannelFireball. Available also from: https://shop.channelfireball.
    com/.
 8.   The Sammelkartenmarkt GmbH & Co. KG. Available also from:
      https://www.cardmarket.com/.
 9. TCGplayer, Inc. Available also from: https://www.tcgplayer.
    com/.
10. MITCHELL, Cory. Trend Definition and Trading Tactics. Investo-
    pedia, 2021. Available also from: https://www.investopedia.
    com/terms/t/trend.asp.
11. WARD, Colin D; COWLING, Peter I. Monte Carlo search applied
    to card selection in Magic: The Gathering. In: 2009 IEEE Sympo-
    sium on Computational Intelligence and Games. 2009, pp. 9–16.
12. CHURCHILL, Alex; BIDERMAN, Stella; HERRICK, Austin. Magic:
    The gathering is Turing complete. arXiv preprint arXiv:1904.09828.
    2019.

13. BEERS, Brian. How a Buy-and-Hold Strategy Works. Investopedia,
    2021. Available also from: https://www.investopedia.com/terms/b/
    buyandhold.asp.
14. ROUSIDIS, Dimitrios; KOUKARAS, Paraskevas; TJORTJIS, Chris-
    tos. Social media prediction: a literature review. Multimedia Tools
    and Applications. 2020, vol. 79, no. 9, pp. 6279–6311.
15. MCCLELLAN, Chandler; ALI, Mir M; MUTTER, Ryan; KROUTIL,
    Larry; LANDWEHR, Justin. Using social media to monitor men-
    tal health discussions- evidence from Twitter. Journal of the Amer-
    ican Medical Informatics Association. 2017, vol. 24, no. 3, pp. 496–
    502.
16.   OIKONOMOU, Lazaros; TJORTJIS, Christos. A method for pre-
      dicting the winner of the usa presidential elections using data
      extracted from twitter. In: 2018 South-Eastern European Design
      Automation, Computer Engineering, Computer Networks and Society
      Media Conference (SEEDA_CECNSM). 2018, pp. 1–8.
17. ASUR, S.; HUBERMAN, B. A. Predicting the Future with Social
    Media. In: 2010 IEEE/WIC/ACM International Conference on Web
    Intelligence and Intelligent Agent Technology. 2010, vol. 1, pp. 492–
    499. Available from doi: 10.1109/WI-IAT.2010.63.
18. KRYVASHEYEU, Yury; CHEN, Haohui; OBRADOVICH, Nick;
    MORO, Esteban; VAN HENTENRYCK, Pascal; FOWLER, James;
    CEBRIAN, Manuel. Rapid assessment of disaster damage using
    social media activity. Science advances. 2016, vol. 2, no. 3, e1500779.
19. BOLLEN, Johan; MAO, Huina; ZENG, Xiaojun. Twitter mood
    predicts the stock market. Journal of computational science. 2011,
    vol. 2, no. 1, pp. 1–8.
20.   Twitter, Inc. Available also from: https://twitter.com/.
21. GANTI, Akhilesh. Dow Jones Industrial Average. Investopedia, 2021.
    Available also from: https://www.investopedia.com/terms/d/
    djia.asp.
22. MITTAL, Anshul; GOEL, Arpit. Stock prediction using twitter
    sentiment analysis. Stanford University, CS229. 2012, vol. 15.

23. PAGOLU, Venkata Sasank; REDDY, Kamal Nayan; PANDA, Gana-
    pati; MAJHI, Babita. Sentiment analysis of Twitter data for pre-
    dicting stock market movements. In: 2016 international conference
    on signal processing, communication, power and embedded system
    (SCOPES). 2016, pp. 1345–1350.
24. ELLIS, Ben. Available also from: https://brnrd.me/social-
    sentiment-sentiment-analysis/.
25. GLENSKI, Maria; WENINGER, Tim; VOLKOVA, Svitlana. Im-
    proved forecasting of cryptocurrency price using social signals.
    arXiv preprint arXiv:1907.00558. 2019.
26. GUI JR, Heng. Stock Prediction Based on Social Media Data via Sen-
    timent Analysis: a Study on Reddit. 2019. MA thesis.
27. KOWSARI, Kamran; JAFARI MEIMANDI, Kiana; HEIDARYSAFA,
    Mojtaba; MENDU, Sanjana; BARNES, Laura; BROWN, Donald.
    Text classification algorithms: A survey. Information. 2019, vol. 10,
    no. 4, p. 150.
28. API. Wikimedia Foundation, 2021. Available also from: https:
    //en.wikipedia.org/wiki/API.
29. Magic: The Gathering. Available also from: https://mtg.fandom.
    com/wiki/Magic:_The_Gathering.
30. Scryfall, LLC. Available also from: https://www.scryfall.com/.
31.   MTGTOP8. Available also from: https://www.mtgtop8.com/.
32. HTTP for Humans. Available also from: https://docs.python-
    requests.org/.
33. BAUMGARTNER, Jason Michael; SEILER, Alexander. Pushshift
    [https://github.com/pushshift/api]. GitHub, 2021.
34. SCHÜTZE, Hinrich; MANNING, Christopher D; RAGHAVAN,
    Prabhakar. Introduction to information retrieval [https://nlp.stanford.
    edu/IR-book/html/htmledition/the-vector-space-model-for-scoring-1.html].
    Cambridge University Press Cambridge, 2008.

35. LILLEBERG, Joseph; ZHU, Yun; ZHANG, Yanqing. Support vec-
    tor machines and word2vec for text classification with semantic
    features. In: 2015 IEEE 14th International Conference on Cognitive
    Informatics & Cognitive Computing (ICCI* CC). 2015, pp. 136–140.
36. PEDREGOSA, F.; VAROQUAUX, G.; GRAMFORT, A.; MICHEL,
    V.; THIRION, B.; GRISEL, O.; BLONDEL, M.; PRETTENHOFER,
    P.; WEISS, R.; DUBOURG, V.; VANDERPLAS, J.; PASSOS, A.;
    COURNAPEAU, D.; BRUCHER, M.; PERROT, M.; DUCHESNAY,
    E. Scikit-learn: Machine Learning in Python. Journal of Machine
    Learning Research. 2011, vol. 12, pp. 2825–2830.
37. SCHUMAKER, Robert P; CHEN, Hsinchun. Textual analysis
    of stock market prediction using breaking financial news: The
    AZFin text system. ACM Transactions on Information Systems (TOIS).
    2009, vol. 27, no. 2, pp. 1–19.
38. ANTWEILER, Werner; FRANK, Murray Z. Is all that talk just
    noise? The information content of internet stock message boards.
    The Journal of finance. 2004, vol. 59, no. 3, pp. 1259–1294.
39. MIKOLOV, Tomas; CHEN, Kai; CORRADO, Greg; DEAN, Jeffrey.
    Efficient estimation of word representations in vector space. arXiv
    preprint arXiv:1301.3781. 2013.
40. MIKOLOV, Tomas; SUTSKEVER, Ilya; CHEN, Kai; CORRADO,
    Greg; DEAN, Jeffrey. Distributed representations of words and
    phrases and their compositionality. arXiv preprint arXiv:1310.4546.
    2013.
41. DEVLIN, Jacob; CHANG, Ming-Wei; LEE, Kenton; TOUTANOVA,
    Kristina. Bert: Pre-training of deep bidirectional transformers for
    language understanding. arXiv preprint arXiv:1810.04805. 2018.
42. Available also from: https://huggingface.co/nlptown/bert-
    base-multilingual-uncased-sentiment.
43. MCCALLUM, Andrew; NIGAM, Kamal, et al. A comparison
    of event models for naive bayes text classification. In: AAAI-98
    workshop on learning for text categorization. 1998, vol. 752, pp. 41–48.
    No. 1.
44. ZHOU, Zhi-Hua. Ensemble learning. Encyclopedia of biometrics.
    2009, vol. 1, pp. 270–273.

45. PRANCKEVIČIUS, Tomas; MARCINKEVIČIUS, Virginijus. Com-
    parison of naive bayes, random forest, decision tree, support vec-
    tor machines, and logistic regression classifiers for text reviews
    classification. Baltic Journal of Modern Computing. 2017, vol. 5, no.
    2, p. 221.
46. MINGERS, John. An empirical comparison of pruning methods
    for decision tree induction. Machine learning. 1989, vol. 4, no. 2,
    pp. 227–243.
47. PATEL, Nikita; UPADHYAY, Saurabh. Study of various decision
    tree pruning methods with their empirical comparison in WEKA.
    International journal of computer applications. 2012, vol. 60, no. 12.
48. BREIMAN, Leo. Bagging predictors. Machine learning. 1996, vol. 24,
    no. 2, pp. 123–140.
49. EFRON, Bradley; TIBSHIRANI, Robert J. An introduction to the
    bootstrap. CRC press, 1994.
50. TOLLES, Juliana; MEURER, William J. Logistic regression: relat-
    ing patient characteristics to outcomes. Jama. 2016, vol. 316, no. 5,
    pp. 533–534.
51. BROWNLEE, Jason. Master Machine Learning Algorithms: discover
    how they work and implement them from scratch. Machine Learning
    Mastery, 2016.
52. MONTGOMERY, Douglas C; PECK, Elizabeth A; VINING, G
    Geoffrey. Introduction to linear regression analysis. John Wiley &
    Sons, 2021.
53. PENG, Chao-Ying Joanne; LEE, Kuk Lida; INGERSOLL, Gary M.
    An introduction to logistic regression analysis and reporting. The
    journal of educational research. 2002, vol. 96, no. 1, pp. 3–14.
54. BÖHNING, Dankmar. Multinomial logistic regression algorithm.
    Annals of the institute of Statistical Mathematics. 1992, vol. 44, no. 1,
    pp. 197–200.
55. WEBB, Geoffrey I. Naïve Bayes. Encyclopedia of machine learning.
    2010, vol. 15, pp. 713–714.

56. HEARST, M.A.; DUMAIS, S.T.; OSUNA, E.; PLATT, J.; SCHOLKOPF,
    B. Support vector machines. IEEE Intelligent Systems and their
    Applications. 1998, vol. 13, no. 4, pp. 18–28. Available from doi:
    10.1109/5254.708428.
57.   MCKINNEY, Wes. Data Structures for Statistical Computing in
      Python. In: WALT, Stéfan van der; MILLMAN, Jarrod (eds.).
      Proceedings of the 9th Python in Science Conference. 2010, pp. 56–61.
      Available from doi: 10.25080/Majora-92bf1922-00a.
58. Available also from: https://scikit-learn.org/dev/auto_examples/
    inspection/plot_permutation_importance_multicollinear.html.

A An appendix
Together with the electronic version of our thesis, we include a Jupyter
Lab notebook with all the implemented classes.
