Masaryk University
Faculty of Informatics

Predicting market price trends of Magic: The Gathering cards

Bachelor's Thesis

Pavel Nedělník

Brno, Spring 2021
This is where a copy of the official signed thesis assignment and a copy of the Statement of an Author is located in the printed version of the document.
Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Pavel Nedělník

Advisor: RNDr. Jaroslav Čechák
Acknowledgements

It is my genuine pleasure to express my deepest gratitude to my mentor RNDr. Jaroslav Čechák. His patience and willingness to help and guide are beyond admiration.
Abstract

Magic: the Gathering is a famous trading card game, built upon a very strong secondary market. We recognize the potential this market has and explore the possibilities of recent discoveries in the field of natural language processing to try to predict its movements. We use word embeddings and large-scale sentiment analysis to test the hypothesis that the price of cards can be predicted based upon the activity on the social media platform Reddit. We conclude that there is no such connection.
Keywords

machine learning, NLP, SOBHA
Contents

1 Introduction
  1.1 Outline
2 Preliminaries
3 Related Work
4 Dataset
  4.1 MTGGoldfish
  4.2 Reddit
5 Feature Analysis
  5.1 Target Variable
  5.2 Basic
  5.3 Word Embedding
    5.3.1 Term Frequency-Inverse Document Frequency
    5.3.2 Bag of Words
    5.3.3 Word2Vec
  5.4 Sentiment
  5.5 Mention Detection
    5.5.1 Naïve Bayes and Bag of Words
    5.5.2 Word2Vec
  5.6 Composition
6 Predictors
  6.1 Random Forest
  6.2 Logistic Regression
  6.3 Naïve Bayes
  6.4 Support Vector Machines
7 Results
  7.1 Testing Parameters
  7.2 Evaluation
  7.3 Results
  7.4 Discussion
8 Conclusion and Future Work
  8.1 Future Work
Bibliography
A An appendix
1 Introduction

Magic: the Gathering (MTG), created in 1993 by Richard Garfield, Ph.D., and often considered to be the first of its kind, is a trading card game. It is played with a player-built deck of cards, and these cards can have immense value, with some individual copies being sold for over $20 000. The player community consists mainly of the 18-34 age demographic, which is estimated to have significant buying power [1]. While the price of the majority of MTG cards does not reach anywhere near these heights, it should be clear that understanding this market could be extremely valuable. Many comparisons can be drawn between the prices of individual cards and stock markets, and with the ongoing advancements in predicting the stock market, a question presents itself: can we predict the fluctuations in MTG card prices? Due to the highly individualistic nature of both supply and demand for these cards, in combination with all the usual difficulties of predicting a market, this poses a complex and difficult machine learning task.

This area has previously attracted some interest [2, 3]. We introduce a novel approach by utilizing Social Media Predictions (SMP). This growing body of research uses metrics such as general activity and sentiment determined from large-scale social media feeds to create conjectures about the future. The platform of interest is Reddit [4], a social media site that hosts a significant community of MTG players. We calculate multiple metrics about the activity on this platform and correlate them to the prices obtained from MTGGoldfish (MTGG) [5].

The goal of this thesis is to explore whether there is a clear connection between increased social media activity and changes in the prices of MTG cards. We interpret this as a classification problem and try to predict the magnitude and the direction of the price change. Our results suggest that there is no such connection.
1.1 Outline

We present the outline of our work:

• In chapter 2, we provide the necessary background for the MTG game and the Reddit platform, which are the main focus of our thesis.

• In chapter 3, we place our work in the context of the scientific community. We highlight differences and similarities to other works with similar goals.

• In chapter 4, we describe the data acquisition process and present our reasoning for each choice.

• In chapter 5, we describe the methods we used to process the raw data obtained in the previous chapter.

• In chapter 6, we describe the strategies used to perform the classification task described earlier.

• In chapter 7, we present and interpret our experimental results.

• In chapter 8, we conclude our work and discuss potential further research.
2 Preliminaries

In this chapter we explain more about the character of MTG, focusing on the financial side of things. We define the price of a card and explain the main driving factors behind it.

For many players, MTG is a highly competitive game, as most of the player base is organized around tournaments. However, the degree to which players compete varies greatly, all the way from local Friday Night Magic events with little to no barrier to entry to the recent Magic World Championship XXVI with a prize pool of $1 000 000. Combined with the fact that new cards are constantly being released and added to the existing formats, this forces players who want to stay competitive to constantly buy and sell cards from their collection, creating a strong player-driven market, which is the main driving factor behind the price of most cards. For the sake of completeness it should be stated that other groups can have significant influence over the price of MTG cards, for example card collectors or power sellers; however, these groups are often not interested in the same cards as the players we described. Using our expert knowledge, we sidestep this issue by carefully selecting cards so that the impact these groups might have is minimized, and so we take the liberty of not considering them further.

New cards are added through sets released about four times per year, each containing 150 to 200 cards. These cards may have been printed before, and may be reprinted in the future, with slight or more significant cosmetic alterations. In this thesis we focus only on the regular versions of cards, i.e. the cheapest printings in regular sets, because the price of alternate versions is often driven by the collectible aspect of the game and thus is less likely to correlate with new events.

The game is played with a certain format in mind. Formats can differ in the fundamental rules of the game, but for us, the most important difference is in the legality of cards. We choose to limit ourselves to cards legal in the Constructed Modern format; we explain this choice more thoroughly in chapter 4.1.

Wizards of the Coast [6], the company currently responsible for maintaining the game, provides multiple options for obtaining these
cards. However, none of these options is particularly efficient, since each is hindered either by randomness or by limited selection. This lack of service leads to a very strong secondary player-based market. Cards can be traded or sold directly between players, but more commonly third-party services such as MTGG, ChannelFireball [7], CardMarket [8], and TCGPlayer [9] are used. There are significant differences between the types of services that these companies provide, but they can generally be separated into two groups. More commonly, as is the case with the aforementioned MTGG and ChannelFireball, these companies buy and sell cards just like any normal store would. In contrast to this, CardMarket, TCGPlayer, and similar companies only facilitate the exchange of cards between players, who then pay a small fee per use of the service.

Understandably, there is a difference between the prices as offered by these services. A question presents itself: what is the actual value of a card, and how could we find it? It is common practice within the community to trade and sell cards according to the price trend [10] of CardMarket (for Europe) and TCGPlayer (for America) and their equivalents for other localities. Sadly, none of these services provides access to historic information on these prices. Fortunately, there is an observable and quantifiable relationship between the prices of shop-like and market-like services. The market price trend is consistently about 70% of the price offered by shop-like services, and so this is how we define the card price that we would like to predict. We illustrate this by including this ratio for the cards we focus on in this thesis in table 2.1.
Table 2.1: Ratio of MTGG prices and CardMarket prices

Card Name                 Price Ratio
Liliana of the Veil       69.44%
Jace, the Mind Sculptor   68.65%
Karn Liberated            74.12%
Thoughtseize              68.18%
Tarmogoyf                 71.69%
Noble Hierarch            72.27%
Horizon Canopy            65.53%

3 Related Work

While MTG has sparked a lot of interest [11, 12], very few works focus on the trading aspect of the game. A very notable exception is [2]. The authors focused on the MTG Online environment, an alternate version of the game that is played exclusively online. Despite being online, the game economy is very tightly tied to real-life currency. Players buy the in-game currency in the official Wizards of the Coast shop at a 1:1 ratio for dollars; the opposite process is made possible through the player-based market. The authors succeeded in creating a trading strategy that outperforms a Buy and Hold strategy [13], a commonly used baseline model. Some of the sources of information they used were professional MTG blogging sites like ChannelFireball [7] or MTGG. Even though no direct correlation with the general content of these sites was found, the authors managed to show a strong influence of the MTGG Budget Magic series on the price of a specific subset of cards.

Abstracting from the specifics of our inquiry, we can find several SMP applications in various environments [14, 15, 16, 17, 18]. Of most interest to us are stock market predictions and finance in general, as stated in chapter 1. The most famous work [19] in this area used the Google Profile of Mood States (GPOMS) to classify the mood of posts on the social media site Twitter [20] and neural networks to correlate them to the Dow Jones Industrial Average (DJIA) [21].
They succeeded in showing that the collective public mood, derived from large-scale Twitter feeds, can be used to significantly improve the accuracy of models predicting the DJIA closing values. Their model achieved a very high accuracy of 87%. Surprisingly, the most impactful mood was the "Calm" mood. GPOMS is no longer available; however, an alternative was developed [22], albeit with a slightly more modest result of a 75.5% score in a sequential k-fold cross-validation test. The authors of [23] developed a sentiment classifier for the Twitter platform and showed that there is a strong connection between the public opinion of a company, expressed through tweets, and changes in its stock value. The main contribution of their work is the mood classifier, which managed to achieve an accuracy of 70.5% with an N-gram representation and 70.2% with Word2Vec, see chapter 5.3.3. This is very comparable to the human concordance on this topic, which is estimated to be around 70% to 79% [24]. Despite the slightly lower accuracy, the authors argue in favor of the use of Word2Vec, citing its "promising accuracy for large datasets and the sustainability in word meaning."

While Twitter is the object of interest for most of the recent research, the MTG community is far more active on Reddit, with over 450 000 users on the main MTG subreddit as opposed to 287 000 on Twitter, and so this is where our inquiry leads. Despite not being the focal point of interest of the scientific community, much work has still been done for the Reddit platform. In [25] the authors used multiple metrics, some specific to Reddit, like the volume of comments, their language, and their popularity, to improve the results of neural networks trained to predict the value of different cryptocurrencies. As another example, [26] shows a strong connection between the activity on stock-related subreddits and changes in stock prices. This is again done with the use of sentiment analysis.

Following our formulation of the task, see chapter 1, we use a number of methods common for text classification [27], described in more detail in chapter 6.
4 Dataset

The very first step of our work was acquiring the necessary data. In this chapter we describe the sources of our data, the way it was acquired, and its characteristics, and comment on our reasoning for each choice. The dataset consists of two main sources: MTGGoldfish and Reddit.

4.1 MTGGoldfish

The most crucial part of our data collection process was obtaining the historic prices of the cards themselves. Multiple sources were considered, some used in the past [3]; however, in recent years most of them became unavailable to the public. Eventually, we were left with a single choice. Despite its shortcomings, namely a paywall, a general lack of context such as information about sales, and the lack of an API [28], data from MTGG was chosen to fill this role. This carries additional problems, since the site belongs to an American company and so may not represent the entire community. For the most part, this should not be an issue, since the prices are updated regularly to match the competitors and, most importantly, the player market, as described in more detail in chapter 2. However, there are instances when a significant gap between the markets can arise, for example when the worldwide distribution of new cards is hindered in some way. Alternatively, since the player-based market is not moderated in any way, a player may attempt to artificially raise or lower the price of a specific card on the local market through various methods of market manipulation.

Due to the high computational requirements of many of the techniques we use, and the sheer number of MTG cards, over 20 000 [29] at the time of writing this thesis, we chose to focus only on a handful of handpicked cards. Using our expert knowledge, we chose four main criteria to look at when considering a card:

• History
• Demand
• Price
• Format

For the first criterion, we looked at cards that had been in existence for at least seven years. This was to maximize the size of our dataset. We also wanted these cards to be predominantly played in non-rotating formats. In rotating formats, after a certain period of time, cards stop being legal for deck construction. This greatly reduces the time during which these cards are likely to be relevant and in turn diminishes our dataset.

For the demand criterion, we looked at tournament play. We made the assumption that heavily played cards are more likely to have both a more varied price history and, more importantly, higher social media traffic. We considered cards that were played in at least 5% of tournament-winning decks during the majority of their existence.

The price criterion was set so that there would be possible real-life applications for our findings. With the relatively high postal costs and other fees associated with buying MTG cards, we considered only cards with a current price above $20. Price is also, to some extent, a reflection of demand.

Lastly, we decided to limit our choices to the Constructed Modern format as the most played non-rotating competitive format. There are many cards that satisfy our criteria within this format, and this limitation allowed us to pick a more specific subreddit, see chapter 4.2, with a higher concentration of relevant information. It also allowed us to calculate more metrics about the data, since the total amount was smaller. With this choice in mind, we picked cards that are predominantly played in Constructed Modern to limit outside influences.

The aforementioned process could be automated, since this data can be accessed through the combination of the ScryFall [30] and MTGTOP8 [31] APIs. With these restrictions in mind, we picked the following cards: Liliana of the Veil, Jace, the Mind Sculptor, Karn Liberated, Thoughtseize, Tarmogoyf, Noble Hierarch, Horizon Canopy.

The price data for these cards was accessed with the use of the Python Requests library [32]. MTGG does not provide an API access point,
so the data was accessed with the use of empirical knowledge. The dataset consists of daily updated prices and the corresponding dates. We worked under the assumption that the prices for any given date were assigned at midnight; however, we were unable to confirm whether that is the case.

4.2 Reddit

The majority of our work focuses on the social media platform Reddit. Reddit lets users join communities called subreddits, where they can post submissions or react to them with comments and votes. Submissions (comments) can be at most 40 000 (10 000) characters long and can include anything from emoticons and markup to URLs. Subreddits have topics listed in their description. They can range from hobbies to jokes or world news.

As suggested in the previous chapter, we decided to focus on the r/modernmagic subreddit, a subreddit dedicated to the players of the Modern Constructed format; we then also used r/magicTCG, a general MTG subreddit, as a control for our findings. For r/modernmagic we acquired all records of activity during a time frame of seven years, from 1. 1. 2014 to 1. 1. 2021. For r/magicTCG we only looked at three years, from 1. 1. 2014 to 1. 1. 2017.

This data was obtained through the Pushshift [33] API, again with the use of the Requests library. This API was chosen because it provides the option to search for user activity by timestamps, which was crucial for our research. It provides three endpoints: subreddit, submission, and comment. The subreddit endpoint was sadly not accessible at the time of writing this thesis. It could have been used to gain a more comprehensive insight into the community, seeing that we would not be limited to manually chosen subreddits. We approached submissions and comments separately, due to the slight differences in their character and rules.
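To make the acquisition step concrete, the following is a minimal sketch of querying the Pushshift comment endpoint with the Requests library. The endpoint URL and parameter names follow the public Pushshift documentation; the thesis does not list its exact query code, so the pagination scheme and all parameter values here are our assumptions.

```python
import requests

BASE_URL = "https://api.pushshift.io/reddit/search/comment"

def fetch_page(subreddit, after, before, size=100):
    """Fetch one page of comments created between two UNIX timestamps."""
    params = {
        "subreddit": subreddit,
        "after": after,    # lower bound on created_utc
        "before": before,  # upper bound on created_utc
        "size": size,      # page size
        "sort": "asc",     # oldest first, so we can walk forward in time
    }
    response = requests.get(BASE_URL, params=params)
    response.raise_for_status()
    return response.json()["data"]

def fetch_all(subreddit, after, before):
    """Page through the endpoint by advancing the lower timestamp bound."""
    comments = []
    while True:
        page = fetch_page(subreddit, after, before)
        if not page:
            return comments
        comments.extend(page)
        after = page[-1]["created_utc"]
```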
For r/modernmagic (r/magicTCG), we collected over 53 000 (110 000) submissions and 1 320 000 (3 000 000) comments.

Out of the raw data returned by the API, we were only interested in some parameters. Created_utc was used to place submissions and comments into the correct time period. We used id and parent_id to assign comments to corresponding submissions. We were also interested in the score parameter. Users of the platform can express their opinion on a comment or submission by voting; the score is the difference between positive and negative votes. We then combined the title and the text of submissions into a single feature to match comments. We also preprocessed the text parameter further with the following steps (a minimal sketch of the pipeline follows the list):

1. Regular Expressions were used to remove unnecessary data such as URLs and markup commands, and also to replace common occurrences such as user tags, references to other subreddits, and numbers with tokens to better generalize context. An example of such a change is replacing "4x" with "token_count".

2. Stopword Removal. Stopwords are commonly used words that carry little to no meaning, for example "the", "and", or "have".

3. Tokenization. Each text is transformed into a list of sentences and each sentence is transformed into a list of words. Punctuation is removed.
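The sketch below assumes the NLTK library for stopwords and tokenization; the thesis does not name the library it used, and the regular expressions here are illustrative rather than the exact patterns from the implementation.

```python
import re
from nltk.corpus import stopwords                       # needs nltk.download("stopwords")
from nltk.tokenize import sent_tokenize, word_tokenize  # needs nltk.download("punkt")

STOPWORDS = set(stopwords.words("english"))

def preprocess(text):
    # Step 1: regular expressions remove URLs and markup and replace
    # common occurrences with generalizing tokens.
    text = re.sub(r"https?://\S+", " ", text)             # remove URLs
    text = re.sub(r"/?u/\w+", " token_user ", text)       # user tags
    text = re.sub(r"/?r/\w+", " token_subreddit ", text)  # subreddit references
    text = re.sub(r"\b\d+x?\b", " token_count ", text)    # numbers such as "4x"
    text = re.sub(r"[*_>~`#]", " ", text)                 # markdown remnants
    # Steps 2 and 3: tokenize into sentences and words, dropping
    # punctuation and stopwords.
    sentences = []
    for sentence in sent_tokenize(text.lower()):
        words = [w for w in word_tokenize(sentence)
                 if (w.isalpha() or w.startswith("token_")) and w not in STOPWORDS]
        if words:
            sentences.append(words)
    return sentences

print(preprocess("Playing 4x Thoughtseize, see https://example.com"))
```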
5 Feature Analysis

The next step after data collection is to further process the data to be usable by predictors. We do this either directly, by word embedding, or indirectly, by calculating descriptive metrics. First, we calculate the target variable for our classification problem. We then extract input variables from the Reddit dataset. These features are separated into four groups:

• Basic
• Word Embedding
• Sentiment
• Mention Detection

5.1 Target Variable

To construct the target variable for our classification problem, we use the MTGG dataset, obtained as described in chapter 4.1. First, the price difference from the last time interval is calculated. For a specific card i and the j-th time interval of length n, it is calculated as (5.1), where $\mathrm{price}_{a,b}$ refers to the price of a card a on a day b, starting with day zero.

$$\mathrm{diff}(i, j) = \mathrm{price}_{i,\, j \cdot n} - \mathrm{price}_{i,\, (j+1) \cdot n} \tag{5.1}$$

This difference is then transformed into a classification problem in one of the following ways:

$$\mathrm{sign}(i, j) = \begin{cases} 1 & \text{if } \mathrm{diff}(i, j) > 0 \\ 0 & \text{otherwise} \end{cases} \tag{5.2}$$

$$\mathrm{size}(i, j) = \begin{cases} 1 & \text{if } |\mathrm{diff}(i, j)| > t \\ 0 & \text{otherwise} \end{cases} \tag{5.3}$$

where t is a threshold set as the average price difference in the training dataset.
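A minimal sketch of constructing the target variables according to equations (5.1)-(5.3); the function and variable names are ours, not taken from the thesis implementation.

```python
import numpy as np

def make_targets(prices, n=1, t=None):
    """Build diff, sign, and size targets from a daily price series."""
    prices = np.asarray(prices, dtype=float)
    intervals = prices[::n]                # price at the start of each interval
    diff = intervals[:-1] - intervals[1:]  # equation (5.1)
    sign = (diff > 0).astype(int)          # equation (5.2)
    if t is None:
        t = np.abs(diff).mean()            # threshold: average price difference
    size = (np.abs(diff) > t).astype(int)  # equation (5.3)
    return diff, sign, size

diff, sign, size = make_targets([20.0, 22.5, 21.0, 25.0], n=1)
```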
5.2 Basic

For each submission, we calculate basic descriptive features such as the number of comments, the score, and the average score of comments. For comments, we only calculate the score.

5.3 Word Embedding

Among the crucial features of our model are word embeddings, vector representations of texts. The central idea of these methods is that texts with similar meaning should have similar vectors. This similarity can then be measured, for example by the cosine distance [34], as is done in this thesis:

$$\mathrm{dist}(v_1, v_2) = \frac{v_1 \cdot v_2}{|v_1| \cdot |v_2|} \tag{5.4}$$

The values range between −1 and 1, where −1 indicates opposite meaning and 1 indicates a synonym. Unlike plain text, these vectors can then be fed to a predictor as a fixed set of features.

5.3.1 Term Frequency-Inverse Document Frequency

Both of the methods we introduce later on are designed to embed words; however, we would like to embed sentences and even entire paragraphs. A naïve approach would be to average the embeddings of all the words within the text we would like to embed. This approach has the unwanted effect that words or phrases that repeat often, and consequently are less likely to carry meaning, are propagated more. We can combat this by multiplying the embeddings with the Term Frequency-Inverse Document Frequency (TF-IDF) of the corresponding word [35].

TF-IDF is the combination of two statistics, term frequency and inverse document frequency. Say we wish to calculate the TF-IDF of a word w within the document $d_i$ from the corpus D. The term frequency is given by formula (5.5), the inverse document frequency by formula (5.6), and consequently the TF-IDF can be calculated as (5.7), where $w_{d_i}$ denotes the number of occurrences of word w in the document $d_i$, $|d_i|$ denotes the number of words within the document $d_i$, $|D|$ is the number of documents within the corpus D, and $|\{d \in D : w \in d\}|$ is the number of documents containing word w in the corpus D.

$$\mathrm{tf} = \frac{w_{d_i}}{|d_i|} \tag{5.5}$$

$$\mathrm{idf} = \ln \frac{|D|}{|\{d \in D : w \in d\}|} \tag{5.6}$$

$$\mathrm{tfidf} = \frac{w_{d_i}}{|d_i|} \cdot \ln \frac{|D|}{|\{d \in D : w \in d\}|} \tag{5.7}$$
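To make the definitions concrete, here is a small sketch of computing TF-IDF weights with scikit-learn [36]. Note that scikit-learn's default variant smooths the IDF term and normalizes document vectors, so the values differ slightly from (5.5)-(5.7); the corpus below is made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "tarmogoyf price is rising again",
    "played tarmogoyf in the tournament",
    "the tournament results are out",
]
vectorizer = TfidfVectorizer()
weights = vectorizer.fit_transform(corpus)  # sparse matrix: documents x terms

# Words occurring in many documents receive a low IDF and thus a low weight.
for word, index in sorted(vectorizer.vocabulary_.items()):
    print(word, weights[0, index])
```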
5.3.2 Bag of Words

Bag of Words (BOW) [36] is a simple and intuitive way to perform word embeddings. For a given corpus, all present words are ordered and assigned a number based on this ordering. This vocabulary is then stored. A word is embedded as a vector of zeroes with a one at the position that corresponds to its position in the vocabulary, if the word is present. Despite its simplicity, it is still one of the go-to algorithms [37, 38] for this task.

5.3.3 Word2Vec

A more complex approach is Word2Vec (W2V) [39, 40]. An important difference from the BOW model is that the dimension of the resulting vector stays the same no matter the size of our corpus. It uses a shallow, two-layer neural network to construct the embeddings. There are two approaches to training the model. The continuous bag of words (CB) model trains the network to predict the target word from surrounding context words. The skip-gram (SG) model is trained to predict the context words within a certain range of the target word.
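A minimal sketch of training a W2V model, assuming the gensim library; the thesis does not name its implementation, and the tiny corpus and hyperparameters are illustrative only. The similarity call at the end is the operation later reused for mention detection in chapter 5.5.2.

```python
from gensim.models import Word2Vec

# Tokenized sentences, as produced by the preprocessing from chapter 4.2.
sentences = [
    ["tarmogoyf", "price", "rising"],
    ["played", "tarmogoyf", "modern", "tournament"],
    ["modern", "tournament", "results"],
]
model = Word2Vec(
    sentences,
    vector_size=100,  # embedding dimension, fixed regardless of corpus size
    window=5,         # context window around the target word
    min_count=1,      # keep rare words for this tiny example
    sg=1,             # 1 = skip-gram (SG), 0 = continuous bag of words (CB)
)

vector = model.wv["tarmogoyf"]                       # embedding of a single word
score = model.wv.similarity("modern", "tournament")  # cosine similarity of two words
```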
5.4 Sentiment

Sentiment analysis has been an integral part of SMP [19, 22]. We use the Bidirectional Encoder Representations from Transformers (BERT) BASE [41] multilingual model tuned for sentiment analysis of product reviews, kindly provided by Hugging Face [42]. BERT models use pre-trained deep-learning algorithms that can then be fine-tuned for a specific task to quickly create state-of-the-art models for natural language processing. BERT BASE is the smaller of the two versions of the model (as opposed to BERT LARGE).
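A minimal sketch of scoring a single text with the cited model [42], assuming the Hugging Face transformers library (the thesis does not state how the model was invoked). The model rates texts from one to five stars, which can be treated as a sentiment score.

```python
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="nlptown/bert-base-multilingual-uncased-sentiment",
)
result = classifier("This card is completely unplayable after the ban.")
print(result)  # e.g. [{'label': '1 star', 'score': 0.6...}]
```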
5.5 Mention Detection

One of the issues we encountered while working on this thesis was that, unlike many of our predecessors, we do not have a platform specific enough that all activity on it could be considered relevant. For each given subreddit we have up to thousands of potential cards of interest. A naïve approach would be to consider only submissions or comments that directly mention the card name. This would be ill advised, because it is very common for players to refer to cards by alternate names that do not necessarily have anything in common with the actual name; for example, a card named Dark Confidant is most often referred to simply as Bob. This is very similar to keyword extraction.

We define a relevant submission as a submission directly containing a word that is either the name of the card that is currently the object of interest or a word that is detected by one of the following approaches as a synonym. For comments, we considered all comments either satisfying the same condition or assigned to a submission that does.

5.5.1 Naïve Bayes and Bag of Words

A simple and intuitive approach is a Naïve Bayes classifier, see chapter 6.3, with a BOW representation of the corpus, equivalent to the multinomial model from [43], described earlier in this chapter. With the hypothesis that names or other mentions of the cards have the biggest impact on their price, we trained the NB BOW (and later LR) on the BOW-preprocessed dataset with labels set according to the sign of the price change over a time period of one to seven days. We evaluated this method by studying the weights assigned to each corresponding word.

5.5.2 Word2Vec

A more sophisticated solution to this task uses the Word2Vec representation as described earlier. We can compare the embedding of the card name to that of any given word and, for a similarity, see (5.4), above a certain threshold t, consider them to be synonyms. The threshold was empirically set to 0.8.

5.6 Composition

The last step of the feature extraction process is to combine our sources into one dataset. We do this by splitting the Reddit dataset into individual days that are then combined to form a single entry to match the MTGG data. The feature values are averaged over the entire day. To maximize training examples, we then use a technique named Rolling Window to combine multiple days into a single entry. Notably, this method does not significantly affect the dataset length. It can be described as follows: we wish to transform our dataset D with entries $a_1, a_2, \ldots, a_n$ into a new dataset with entries $b_1, b_2, \ldots, b_{n-d}$ by combining d + 1 days into a single entry; the i-th entry of the new dataset can then be described as (5.8), where $F_{a_j}$ is the set of input variables of the j-th entry of the dataset D.

$$b_i = \bigcup_{j=i}^{i+d} F_{a_j} \tag{5.8}$$
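A minimal sketch of the rolling-window composition (5.8) using pandas; each row of the frame holds the feature values averaged over one day, and the column names are illustrative.

```python
import pandas as pd

daily = pd.DataFrame({
    "score": [1.2, 0.8, 2.1, 1.5],
    "sentiment": [3.9, 4.1, 2.5, 3.0],
})

def rolling_window(frame, d):
    """Combine each day with the d following days into a single entry."""
    shifted = [frame.shift(-j).add_suffix(f"_day{j}") for j in range(d + 1)]
    combined = pd.concat(shifted, axis=1)
    return combined.iloc[: len(frame) - d]  # drop incomplete trailing entries

print(rolling_window(daily, d=2))  # 4 days collapse into 2 three-day entries
```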
6 Predictors

In this chapter we introduce the algorithms that are used to predict the price movement. We interpret this as a classification task. Our model predicts the target variable (class), as defined in chapter 5.1, from a set of input variables (features).

6.1 Random Forest

The Random Forest Classifier (RF) originates from the Decision Tree (DT), but combats its tendency to overfit with the use of the Bagging algorithm. It falls within the category of ensemble learning [44]. It is one of the go-to algorithms for many tasks, including text classification [45].

Decision Tree

A DT can be used for both classification and regression. During the training phase a tree-like structure is constructed in the following way: nodes represent a split in the dataset along the value of a specific input variable, and leaves correspond to a value of the target variable. Thus, a branch of nodes leading to a leaf represents a series of decisions, based on the values of input variables, that determine the target variable. Importantly, input variables for a DT have to be categorical. Variables that do not satisfy this condition will be discretized as the first step of the algorithm.

The tree construction is most often realized top-to-bottom: starting with the entire training dataset, a feature is chosen based on a number of metrics, see below. This feature becomes a node. The dataset is then split along this feature, producing subsets of the initial dataset. This algorithm then repeats on each of these subsets until there are no more suitable features to select or the subset contains only a single value of the target variable. The tree is then usually pruned [46], i.e. unpromising branches are combined into a single leaf, to mitigate overfitting. Pruning can also be done during the construction by modifying the stop condition [47].
The feature selection process has a big impact on the resulting accuracy and the ability to generalize well. We used two metrics in this thesis: Entropy and the Gini Index.

For a given dataset D and a given input variable F with classes 1, ..., n, Entropy measures the purity of the split along that variable. It is calculated as (6.1), where |i| is the number of entries that belong to the class i and |D| is the size of the dataset. Its values range from 0 to 1, where 1 signifies the highest possible impurity.

$$E(F) = \sum_{i=1}^{n} -\frac{|i|}{|D|} \ln \frac{|i|}{|D|} \tag{6.1}$$

The Gini Index is very similar in its function. It is given by formula (6.2).

$$GI(F) = 1 - \sum_{i=1}^{n} \left( \frac{|i|}{|D|} \right)^2 \tag{6.2}$$

Bagging

Bagging, also called Bootstrap Aggregating [48, 49], is a method used to reduce variance and avoid overfitting for many different algorithms. Given a training dataset, Bagging generates a certain number of new datasets of lesser or equal size. This is done by bootstrapping, i.e. by sampling from D uniformly (each unique element has an equal chance to be selected) and with replacement (each element can be selected multiple times). On each of the newly generated datasets a predictor, in our case a DT, is trained and stored. Classification is performed by counting all outcomes and selecting the most frequent one; regression is done by averaging over all results.
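The experiments in chapter 7 vary the split criterion and balance the classes; a minimal sketch of such a Random Forest with scikit-learn, trained on synthetic stand-in data, follows.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Synthetic data standing in for the composed feature matrix from chapter 5.6.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,         # number of bagged decision trees
    criterion="entropy",      # split metric: "gini" or "entropy", see (6.1)-(6.2)
    class_weight="balanced",  # compensate for imbalanced classes
    random_state=0,
)
forest.fit(X, y)
print(forest.predict(X[:5]))
```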
6.2 Logistic Regression

Logistic Regression (LR) [50, 51], despite what its name would suggest, is a model used for classification. In some sense, it is an adaptation of the Linear Regression [52] model for classification. First, regression is performed as it would be with Linear Regression, i.e. the value of each input variable is multiplied by its weight as determined by the model during the training process. LR calculates the logarithm of the odds of a target variable as a linear combination of the input variables. This is then transformed by the logistic function to change the range from (−∞, ∞) to (0, 1), so that classification can be performed by comparing the result with a threshold, for example 0.5.

To demonstrate, assume we want to calculate the probability of the target variable Y being the class 1 based on the input variables $x_1, x_2, \ldots, x_n$, and that during the training process we have obtained the weights $w_1, w_2, \ldots, w_n$ and the bias b. We can use the formula:

$$P(Y = 1) = \frac{e^{b + w_1 x_1 + w_2 x_2 + \ldots + w_n x_n}}{1 + e^{b + w_1 x_1 + w_2 x_2 + \ldots + w_n x_n}} \tag{6.3}$$

To show this is LR as we defined it:

$$P(Y = 1) + P(Y = 1) \cdot e^{b + w_1 x_1 + \ldots + w_n x_n} = e^{b + w_1 x_1 + \ldots + w_n x_n} \tag{6.4}$$

$$P(Y = 1) = (1 - P(Y = 1)) \cdot e^{b + w_1 x_1 + \ldots + w_n x_n} \tag{6.5}$$

$$\ln \frac{P(Y = 1)}{1 - P(Y = 1)} = b + w_1 x_1 + w_2 x_2 + \ldots + w_n x_n \tag{6.6}$$

LR makes a number of important assumptions about the dataset [53]. As described here, it assumes that the target variable is dichotomous, but there are variations that do not have this restriction [54]. It also assumes that all input variables are independent, that there is no noise in the dataset, and that there is a linear dependence between the input variables and the target variable. This can lead to drastically lowered accuracy when not all of these assumptions are met.
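A small numeric illustration of equation (6.3); the weights, bias, and inputs are made up.

```python
import numpy as np

def predict_proba(x, w, b):
    z = b + np.dot(w, x)                # log-odds: b + w1*x1 + ... + wn*xn
    return np.exp(z) / (1 + np.exp(z))  # logistic function, equation (6.3)

x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
probability = predict_proba(x, w, b=0.2)
prediction = int(probability > 0.5)     # classify against the 0.5 threshold
```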
6.3 Naïve Bayes

Naïve Bayes (NB) [55] is a classifier based on the Bayes Theorem. It models the probability of the target variable as the product of the conditional probabilities of all input variables. It makes the assumption that all input variables are strongly independent. Despite the fact that this condition is rarely met in real-life examples, it often outperforms more complex models. Its main advantage is its relative simplicity and the resulting speed. It is also resilient to noise in the training dataset [55]. For fine-tuning we used the Multinomial and Gaussian variants.

6.4 Support Vector Machines

Support Vector Machines (SVM) [35, 56] can, with slight differences, be used for both regression and classification. Since this is a lengthy topic, we will only explain dichotomous classification. A training dataset of m entries with n input variables can be represented by an n-dimensional space, where each entry corresponds to a point $p_1, p_2, \ldots, p_m$ given by the values of its input variables. The SVM classifier (SVC) seeks to separate this space by a hyperplane, a subspace of dimension n − 1. Ideally, this hyperplane would separate the space into two half-spaces, each containing only points corresponding to entries with the same class y. Out of the hyperplanes for which this holds, we choose the hyperplane such that the distance from it to the nearest points of both classes is maximized. These points are called support vectors and are sufficient to determine our classifier. To extend this model to situations where the classes are not separable without error, a soft-margin adaptation is used; it has a regularization parameter C that balances the error allowance. In the case that the data is not linearly separable, it may be separable with the use of a kernel trick. The trick is to transform the space into a higher dimension and perform the linear separation there. Multiple kernels can be used; the most popular choices (polynomial, RBF, sigmoid, and the aforementioned linear kernel) are used in this thesis.
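A minimal sketch of the soft-margin SVM classifier with the kernels listed above, using scikit-learn on synthetic stand-in data; the value of C mirrors chapter 7.1 but is otherwise arbitrary.

```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# "poly" is scikit-learn's name for the polynomial kernel.
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    classifier = SVC(kernel=kernel, C=5)  # C balances the error allowance
    classifier.fit(X, y)
    print(kernel, classifier.score(X, y))
```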
7 Results

In this chapter we present the results gained by evaluating the proposed model on the data we collected. We first provide an overview of the features we calculated in chapter 5 and describe the testing methodology. We then introduce the methods we use for evaluating our results and perform the evaluation.

7.1 Testing Parameters

Based on our experimental results, we decided to generate 8 datasets for each card for systematic testing. These datasets always contain the following features:

• score (see chapter 5.2)
• comment_score
• sentiment

The differences are in the choices for these features:

• size - Rolling Window size: 3 or 7 days
• delay - label delay: 1 or 3 days
• word2vec - skip-gram, continuous bag model, or no embedding

Additionally, we use the model-specific parameters as described in chapter 6. This is an overview:

• SVM - kernel: linear, polynomial, rbf, sigmoid; regularization parameter C: 1, 5, 10
• LR - balanced classes; C: 1, 5, 10; penalty: l1, l2, elastic net
• RF - balanced classes; split criterion: gini, entropy
• NB - balanced classes; Multinomial, Gaussian

We also experimented with the two formulations of the classification problem, see chapter 5.1: sign and size; however, the size interpretation was rejected due to its poor results.

Based on our initial exploration, we decided to exclude some of the features described in chapter 5. With the growing size of our dataset, the BOW word embedding quickly became unusable due to the rapidly growing vocabulary. Instead of implementing techniques to combat this, we decided to only continue using W2V, since it had consistently better results and did not have any such issues. For mention detection, we also decided to forego NB-BOW. Similarly, this method did not scale well; however, the real issue was with accuracy. Even with normalization, the method only identified common phrases as mentions, which was undesirable. This, however, does not necessarily speak to the effectiveness of the method, since the hypothesis we operated under, see chapter 5.5.1, may have been wrong: we were ultimately unable to prove that there is a significant connection between the card price and Reddit activity.

7.2 Evaluation

We use grid search with 3-fold cross-validation over all parameters to find the best model for the validation set, and we present its results for the test dataset. We average these results over all the considered cards, see chapter 4.1. Because we are dealing with very imbalanced classes, simple metrics such as accuracy are not sufficient. Our evaluation method of choice is the F-measure. It is defined as (7.3), where precision and recall are defined as (7.1) and (7.2). TP is the number of correctly predicted positive cases, FP is the number of negative cases incorrectly predicted as positive, and FN is the number of positive cases incorrectly predicted as negative. Its values range from 0 to 1, where 1 signifies the best possible result.

$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{7.1}$$

$$\mathrm{recall} = \frac{TP}{TP + FN} \tag{7.2}$$

$$\text{f-measure} = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{7.3}$$
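A minimal sketch of this protocol with scikit-learn: grid search with 3-fold cross-validation optimizing the F-measure, followed by a single report on a held-out test set. The parameter grid mirrors chapter 7.1; the imbalanced synthetic data stands in for our features.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10,
                           weights=[0.8], random_state=0)  # imbalanced classes
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

grid = GridSearchCV(
    SVC(),
    param_grid={"kernel": ["linear", "poly", "rbf", "sigmoid"],
                "C": [1, 5, 10]},
    scoring="f1",  # optimize the F-measure, equation (7.3)
    cv=3,          # 3-fold cross-validation
)
grid.fit(X_train, y_train)
print(grid.best_params_, f1_score(y_test, grid.predict(X_test)))
```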
We then also compare these results to baseline models to gain a greater understanding of what they mean. As our baselines we use these strategies:

• stratified - makes guesses with respect to the class distribution.
• uniform - each class is predicted with the same probability.

7.3 Results

First, we present the baseline models in table 7.1.

Table 7.1: F-measure of the baseline models

Model       F-measure
stratified  0.41
uniform     0.51

The results for our predictors are presented in table 7.2.

One of the possible interpretations of some of our poor results could be dependence within the dataset. We investigated this hypothesis by using the Pearson correlation coefficient, as provided by pandas [57], to check for pairwise correlation. We then used methods described in the SciKit [36] documentation [58] and used the Spearman correlation coefficient to look for multicollinear dependencies, see figure 7.1. Our hypothesis was rejected.
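A minimal sketch of these dependence checks; pandas computes both coefficients directly, and the random feature frame here only stands in for our real features.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
features = pd.DataFrame(rng.normal(size=(100, 4)),
                        columns=["score", "comment_score", "sentiment", "mentions"])

pearson = features.corr(method="pearson")    # pairwise linear correlation
spearman = features.corr(method="spearman")  # rank correlation, used for figure 7.1
print(pearson.round(2))
```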
Table 7.2: F-measure of our models

Model     Best   Average   Often Selected Parameters
SVM, SG   0.63   0.47      sigmoid, C = 5
RF, SG    0.35   0.17      no significant difference
LR, SG    0.37   0.09      no significant difference
NB, SG    0.42   0.36      Gaussian
SVM, CB   0.57   0.38      sigmoid, C = 5
RF, CB    0.27   0.16      gini
LR, CB    0.38   0.13      no significant difference
NB, CB    0.48   0.41      Gaussian
SVM, NO   0.31   0.20      no significant difference
RF, NO    0.19   0.08      no significant difference
LR, NO    0.28   0.13      no significant difference
NB, NO    0.35   0.16      no significant difference

Figure 7.1: Multicollinearity Dendrogram
7.4 Discussion

The most important observation is that we were unable to improve on the baseline model. We achieved consistently better results for the r/modernmagic subreddit than for r/magicTCG; this is likely due to the lower amount of noise. The best performing model was SVM.

We made two observations about the W2V representation for mention detection. Misspelling and bad grammar are common problems in SMP; consider the two most similar words for the card named Liliana of the Veil: "lilliana" and "lilianna." Furthermore, since W2V measures only the similarity of the contexts in which words appear, it often mistook cards for one another. For example, the most similar words to Karn Liberated are "ugin" and "wurmcoil"; both are parts of names of similar cards commonly played together. We decided not to combat this, because it is not unreasonable that the prices of cards that are often used in similar contexts would be connected.
8 Conclusion and Future Work

Our objective was to explore the possibilities of Social Media Predictions in a new environment. We used a wide array of common approaches and, despite our best efforts, were unable to improve on the baseline models.

8.1 Future Work

Our next option would be to use recurrent neural networks. Even though the sentiment classifier we used performed reasonably well in our limited testing, a possible avenue of further research would be to develop a similar model tuned specifically for the Reddit environment.

One of the significant issues we encountered was speed. We chose Python as our programming language, and some of the computations we had to make took entire days. While this is not unheard of in the field of natural language processing, perhaps a C++ implementation would serve us much better in this regard.
Bibliography

1. MARDER, Andrew. "Magic: The Gathering" – Hasbro's Key to Growth. The Motley Fool, 2014. Available also from: https://www.fool.com/investing/general/2014/04/05/magic-the-gathering-hasbros-key-to-growth.aspx.
2. DI NAPOLI, Matteo. Multi-asset trading with reinforcement learning: an application to Magic: The Gathering Online. 2018.
3. SAKAJI, Hiroki; KOBAYASHI, Akio; KOHANA, Masaki; TAKANO, Yasunao; IZUMI, Kiyoshi. Card Price Prediction of Trading Cards Using Machine Learning Methods. In: International Conference on Network-Based Information Systems. 2019, pp. 705-714.
4. Advance Publications, Inc. Available also from: https://www.reddit.com/.
5. MTGGoldfish, Inc. Available also from: https://www.mtggoldfish.com/.
6. Wizards of the Coast. Available also from: https://company.wizards.com/.
7. ChannelFireball. Available also from: https://shop.channelfireball.com/.
8. The Sammelkartenmarkt GmbH & Co. KG. Available also from: https://www.cardmarket.com/.
9. TCGplayer, Inc. Available also from: https://www.tcgplayer.com/.
10. MITCHELL, Cory. Trend Definition and Trading Tactics. Investopedia, 2021. Available also from: https://www.investopedia.com/terms/t/trend.asp.
11. WARD, Colin D; COWLING, Peter I. Monte Carlo search applied to card selection in Magic: The Gathering. In: 2009 IEEE Symposium on Computational Intelligence and Games. 2009, pp. 9-16.
12. CHURCHILL, Alex; BIDERMAN, Stella; HERRICK, Austin. Magic: The Gathering is Turing complete. arXiv preprint arXiv:1904.09828. 2019.
13. BEERS, Brian. How a Buy-and-Hold Strategy Works. Investopedia, 2021. Available also from: https://www.investopedia.com/terms/b/buyandhold.asp.
14. ROUSIDIS, Dimitrios; KOUKARAS, Paraskevas; TJORTJIS, Christos. Social media prediction: a literature review. Multimedia Tools and Applications. 2020, vol. 79, no. 9, pp. 6279-6311.
15. MCCLELLAN, Chandler; ALI, Mir M; MUTTER, Ryan; KROUTIL, Larry; LANDWEHR, Justin. Using social media to monitor mental health discussions - evidence from Twitter. Journal of the American Medical Informatics Association. 2017, vol. 24, no. 3, pp. 496-502.
16. OIKONOMOU, Lazaros; TJORTJIS, Christos. A method for predicting the winner of the USA presidential elections using data extracted from Twitter. In: 2018 South-Eastern European Design Automation, Computer Engineering, Computer Networks and Society Media Conference (SEEDA_CECNSM). 2018, pp. 1-8.
17. ASUR, S.; HUBERMAN, B. A. Predicting the Future with Social Media. In: 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. 2010, vol. 1, pp. 492-499. Available from doi: 10.1109/WI-IAT.2010.63.
18. KRYVASHEYEU, Yury; CHEN, Haohui; OBRADOVICH, Nick; MORO, Esteban; VAN HENTENRYCK, Pascal; FOWLER, James; CEBRIAN, Manuel. Rapid assessment of disaster damage using social media activity. Science Advances. 2016, vol. 2, no. 3, e1500779.
19. BOLLEN, Johan; MAO, Huina; ZENG, Xiaojun. Twitter mood predicts the stock market. Journal of Computational Science. 2011, vol. 2, no. 1, pp. 1-8.
20. Twitter, Inc. Available also from: https://twitter.com/.
21. GANTI, Akhilesh. Dow Jones Industrial Average. Investopedia, 2021. Available also from: https://www.investopedia.com/terms/d/djia.asp.
22. MITTAL, Anshul; GOEL, Arpit. Stock prediction using twitter sentiment analysis. Stanford University, CS229. 2012, vol. 15.
23. PAGOLU, Venkata Sasank; REDDY, Kamal Nayan; PANDA, Ganapati; MAJHI, Babita. Sentiment analysis of Twitter data for predicting stock market movements. In: 2016 International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES). 2016, pp. 1345-1350.
24. ELLIS, Ben. Available also from: https://brnrd.me/social-sentiment-sentiment-analysis/.
25. GLENSKI, Maria; WENINGER, Tim; VOLKOVA, Svitlana. Improved forecasting of cryptocurrency price using social signals. arXiv preprint arXiv:1907.00558. 2019.
26. GUI JR, Heng. Stock Prediction Based on Social Media Data via Sentiment Analysis: a Study on Reddit. 2019. MA thesis.
27. KOWSARI, Kamran; JAFARI MEIMANDI, Kiana; HEIDARYSAFA, Mojtaba; MENDU, Sanjana; BARNES, Laura; BROWN, Donald. Text classification algorithms: A survey. Information. 2019, vol. 10, no. 4, p. 150.
28. API. Wikimedia Foundation, 2021. Available also from: https://en.wikipedia.org/wiki/API.
29. Magic: The Gathering. Available also from: https://mtg.fandom.com/wiki/Magic:_The_Gathering.
30. Scryfall, LLC. Available also from: https://www.scryfall.com/.
31. MTGTOP8. Available also from: https://www.mtgtop8.com/.
32. HTTP for Humans. Available also from: https://docs.python-requests.org/.
33. BAUMGARTNER, Jason Michael; SEILER, Alexander. Pushshift [https://github.com/pushshift/api]. GitHub, 2021.
34. SCHÜTZE, Hinrich; MANNING, Christopher D; RAGHAVAN, Prabhakar. Introduction to Information Retrieval [https://nlp.stanford.edu/IR-book/html/htmledition/the-vector-space-model-for-scoring-1.html]. Cambridge University Press, 2008.
35. LILLEBERG, Joseph; ZHU, Yun; ZHANG, Yanqing. Support vector machines and word2vec for text classification with semantic features. In: 2015 IEEE 14th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC). 2015, pp. 136-140.
36. PEDREGOSA, F.; VAROQUAUX, G.; GRAMFORT, A.; MICHEL, V.; THIRION, B.; GRISEL, O.; BLONDEL, M.; PRETTENHOFER, P.; WEISS, R.; DUBOURG, V.; VANDERPLAS, J.; PASSOS, A.; COURNAPEAU, D.; BRUCHER, M.; PERROT, M.; DUCHESNAY, E. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011, vol. 12, pp. 2825-2830.
37. SCHUMAKER, Robert P; CHEN, Hsinchun. Textual analysis of stock market prediction using breaking financial news: The AZFin text system. ACM Transactions on Information Systems (TOIS). 2009, vol. 27, no. 2, pp. 1-19.
38. ANTWEILER, Werner; FRANK, Murray Z. Is all that talk just noise? The information content of internet stock message boards. The Journal of Finance. 2004, vol. 59, no. 3, pp. 1259-1294.
39. MIKOLOV, Tomas; CHEN, Kai; CORRADO, Greg; DEAN, Jeffrey. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. 2013.
40. MIKOLOV, Tomas; SUTSKEVER, Ilya; CHEN, Kai; CORRADO, Greg; DEAN, Jeffrey. Distributed representations of words and phrases and their compositionality. arXiv preprint arXiv:1310.4546. 2013.
BIBLIOGRAPHY 45. PRANCKEVIČIUS, Tomas; MARCINKEVIČIUS, Virginijus. Com- parison of naive bayes, random forest, decision tree, support vec- tor machines, and logistic regression classifiers for text reviews classification. Baltic Journal of Modern Computing. 2017, vol. 5, no. 2, p. 221. 46. MINGERS, John. An empirical comparison of pruning methods for decision tree induction. Machine learning. 1989, vol. 4, no. 2, pp. 227–243. 47. PATEL, Nikita; UPADHYAY, Saurabh. Study of various decision tree pruning methods with their empirical comparison in WEKA. International journal of computer applications. 2012, vol. 60, no. 12. 48. BREIMAN, Leo. Bagging predictors. Machine learning. 1996, vol. 24, no. 2, pp. 123–140. 49. EFRON, Bradley; TIBSHIRANI, Robert J. An introduction to the bootstrap. CRC press, 1994. 50. TOLLES, Juliana; MEURER, William J. Logistic regression: relat- ing patient characteristics to outcomes. Jama. 2016, vol. 316, no. 5, pp. 533–534. 51. BROWNLEE, Jason. Master Machine Learning Algorithms: discover how they work and implement them from scratch. Machine Learning Mastery, 2016. 52. MONTGOMERY, Douglas C; PECK, Elizabeth A; VINING, G Geoffrey. Introduction to linear regression analysis. John Wiley & Sons, 2021. 53. PENG, Chao-Ying Joanne; LEE, Kuk Lida; INGERSOLL, Gary M. An introduction to logistic regression analysis and reporting. The journal of educational research. 2002, vol. 96, no. 1, pp. 3–14. 54. BÖHNING, Dankmar. Multinomial logistic regression algorithm. Annals of the institute of Statistical Mathematics. 1992, vol. 44, no. 1, pp. 197–200. 55. WEBB, Geoffrey I. Naïve Bayes. Encyclopedia of machine learning. 2010, vol. 15, pp. 713–714. 30
BIBLIOGRAPHY 56. HEARST, M.A.; DUMAIS, S.T.; OSUNA, E.; PLATT, J.; SCHOLKOPF, B. Support vector machines. IEEE Intelligent Systems and their Applications. 1998, vol. 13, no. 4, pp. 18–28. Available from doi: 10.1109/5254.708428. 57. MCKINNEY, Wes. Data Structures for Statistical Computing in Python. In: WALT, Stéfan van der; MILLMAN, Jarrod (eds.). Proceedings of the 9th Python in Science Conference. 2010, pp. 56–61. Available from doi: 10.25080/Majora-92bf1922-00a. 58. Available also from: https://scikit- learn.org/dev/auto_ examples/inspection/plot_permutation_importance_multicollinear. html. 31
A An appendix

Together with the electronic version of our thesis, we include a Jupyter Lab notebook with all the implemented classes.