Predictive Embeddings for Hate Speech Detection on Twitter
Rohan Kshirsagar1, Tyrus Cukuvac1, Kathleen McKeown1, Susan McGregor2
1 Department of Computer Science, Columbia University
2 School of Journalism, Columbia University
rmk2161@columbia.edu, thc2125@columbia.edu, kathy@cs.columbia.edu, sem2196@columbia.edu

arXiv:1809.10644v1 [cs.CL] 27 Sep 2018

Abstract

We present a neural-network based approach to classifying online hate speech in general, as well as racist and sexist speech in particular. Using pre-trained word embeddings and max/mean pooling from simple, fully-connected transformations of these embeddings, we are able to predict the occurrence of hate speech on three commonly used publicly available datasets. Our models match or outperform state of the art F1 performance on all three datasets using significantly fewer parameters and minimal feature preprocessing compared to previous methods.

1 Introduction

The increasing popularity of social media platforms like Twitter for both personal and political communication (Lapowsky, 2017) has seen a well-acknowledged rise in the presence of toxic and abusive speech on these platforms (Hillard, 2018; Drum, 2017). Although the terms of service on these platforms typically forbid hateful and harassing speech, enforcing these rules has proved challenging, as identifying hate speech at scale is still a largely unsolved problem in the NLP community. Waseem and Hovy (2016), for example, identify many ambiguities in classifying abusive communications, and highlight the difficulty of clearly defining the parameters of such speech. This problem is compounded by the fact that identifying abusive or harassing speech is a challenge for humans as well as automated systems.

Despite the lack of consensus around what constitutes abusive speech, some definition of hate speech must be used to build automated systems to address it. We rely on Davidson et al. (2017)'s definition of hate speech, specifically: "language that is used to express hatred towards a targeted group or is intended to be derogatory, to humiliate, or to insult the members of the group."

In this paper, we present a neural classification system that uses minimal preprocessing to take advantage of a modified Simple Word Embeddings-based Model (Shen et al., 2018) to predict the occurrence of hate speech. Our classifier features:

• A simple deep learning approach with few parameters, enabling quick and robust training

• Significantly better performance than two other state of the art methods on publicly available datasets

• An interpretable approach facilitating analysis of results

In the following sections, we discuss related work on hate speech classification, followed by a description of the datasets, methods and results of our study.

2 Related Work

Many efforts have been made to classify hate speech using data scraped from online message forums and popular social media sites such as Twitter and Facebook. Waseem and Hovy (2016) applied a logistic regression model that used one- to four-character n-grams for classification of tweets labeled as racist, sexist, or neither. Davidson et al. (2017) experimented in classification of hateful as well as offensive but not hateful tweets. They applied a logistic regression classifier with L2 regularization using word-level n-grams and various part-of-speech, sentiment, and tweet-level metadata features.

Additional projects have built upon the data sets created by Waseem and/or Davidson. For example, Park and Fung (2017) used a neural network
approach with two binary classifiers: one to predict the presence of abusive speech more generally, and another to discern the form of abusive speech.

Zhang et al. (2018), meanwhile, used pre-trained word2vec embeddings, which were then fed into a convolutional neural network (CNN) with max pooling to produce input vectors for a Gated Recurrent Unit (GRU) neural network. Other researchers have experimented with using metadata features from tweets. Founta et al. (2018) built a classifier composed of two separate neural networks, one for the text and the other for metadata of the Twitter user, that were trained jointly in interleaved fashion. Both networks used in combination, and especially when trained using transfer learning, achieved higher F1 scores than either neural network classifier alone.

In contrast to the methods described above, our approach relies on a simple word embedding (SWEM)-based architecture (Shen et al., 2018), reducing the number of required parameters and the length of training required, while still yielding improved performance and resilience across related classification tasks. Moreover, our network is able to learn flexible vector representations that demonstrate associations among words typically used in hateful communication. Finally, while metadata-based augmentation is intriguing, here we sought to develop an approach that would function well even in cases where such additional data was missing due to the deletion, suspension, or deactivation of accounts.

3 Data

In this paper, we use three data sets from the literature to train and evaluate our own classifier. Although all address the category of hateful speech, they used different strategies of labeling the collected data. Table 1 shows the characteristics of the datasets.

Table 1: Dataset Characteristics

Dataset   Labels and Counts                                Total
SR        Sexist 3,086   Racist 1,924   Neither 10,898     15,908
HATE      Hate Speech 1,430   Not Hate Speech 23,353       24,783
HAR       Harassment 5,285   Non-Harassing 15,075          20,360

Data collected by Waseem and Hovy (2016), which we term the Sexist/Racist (SR) data set [1], was collected using an initial Twitter search followed by analysis and filtering by the authors and their team, who identified 17 common phrases, hashtags, and users that were indicative of abusive speech. Davidson et al. (2017) collected the HATE dataset by searching for tweets using a lexicon provided by Hatebase.org. The final data set we used, which we call HAR, was collected by Golbeck et al. (2017); we removed all retweets, reducing the dataset to 20,000 tweets. Tweets were labeled as "Harassing" or "Non-Harassing"; hate speech was not explicitly labeled, but treated as an unlabeled subset of the broader "Harassing" category (Golbeck et al., 2017).

[1] Some Tweet IDs/users have been deleted since the creation, so the total number may differ from the original.

4 Transformed Word Embedding Model (TWEM)

Our training set consists of N examples {X^i, Y^i}_{i=1}^N, where the input X^i is a sequence of tokens w_1, w_2, ..., w_T and the output Y^i is the numerical class for the hate speech class. Each input instance represents a Twitter post and thus is not limited to a single sentence.

We modify the SWEM-concat (Shen et al., 2018) architecture to allow better handling of infrequent and unknown words and to capture nonlinear word combinations.

4.1 Word Embeddings

Each token in the input is mapped to an embedding. We use 300-dimensional embeddings for all our experiments, so each word w_t is mapped to x_t ∈ R^300. We denote the full embedded sequence as x_{1:T}. We then transform each word embedding by applying a 300-dimensional 1-layer Multi Layer Perceptron (MLP) W_t with a Rectified Linear Unit (ReLU) activation to form an updated embedding space z_{1:T}. We find this better handles unseen or rare tokens in our training data by projecting the pretrained embedding into a space that the encoder can understand.

4.2 Pooling

We make use of two pooling methods on the updated embedding space z_{1:T}. We employ a max pooling operation on z_{1:T} to capture salient word features from our input; this representation is denoted as m. This forces words that are highly indicative of hate speech to higher positive values within the updated embedding space. We also average the embeddings z_{1:T} to capture the overall meaning of the sentence, denoted as a, which provides a strong conditional factor in conjunction with the max pooling output. This also helps regularize gradient updates from the max pooling operation.

4.3 Output

We concatenate a and m to form a document representation d and feed the representation into a 50-node, 2-layer MLP followed by ReLU activation to allow for increased nonlinear representation learning. This representation forms the preterminal layer and is passed to a fully connected softmax layer whose output is the probability distribution over labels.
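To make the architecture concrete, the following is a minimal sketch of the TWEM pipeline in PyTorch, written from the description in Sections 4.1-4.3. It is not the authors' released code: names such as TWEM, hidden, and num_classes are our own, and placing the 0.1 dropout just before the output layer is our reading of "dropout in the final layer."

```python
import torch
import torch.nn as nn

class TWEM(nn.Module):
    """Sketch of the Transformed Word Embedding Model described in Section 4."""
    def __init__(self, pretrained_embeddings, num_classes, hidden=50, dropout=0.1):
        super().__init__()
        # Pretrained 300-d word vectors, fine-tuned during training.
        self.embed = nn.Embedding.from_pretrained(pretrained_embeddings, freeze=False)
        emb_dim = pretrained_embeddings.size(1)  # 300
        # 1-layer MLP with ReLU applied to every token embedding (Section 4.1).
        self.transform = nn.Sequential(nn.Linear(emb_dim, emb_dim), nn.ReLU())
        # 2-layer, 50-node MLP over the concatenated pooled vectors (Section 4.3).
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Dropout(dropout),  # assumed placement of the "final layer" dropout
        )
        self.out = nn.Linear(hidden, num_classes)  # softmax applied in the loss

    def forward(self, token_ids):
        z = self.transform(self.embed(token_ids))  # (batch, T, 300)
        m, _ = z.max(dim=1)                        # max pooling  (Section 4.2)
        a = z.mean(dim=1)                          # mean pooling (Section 4.2)
        d = torch.cat([m, a], dim=-1)              # document representation
        return self.out(self.mlp(d))               # logits; pair with CrossEntropyLoss
```

Training this module with a cross-entropy loss, RMSprop at learning rate 0.001, a batch size of 512, and inputs padded to 50 tokens would mirror the setup described in Section 5.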
5 Experimental Setup

We tokenize the data using Spacy (Honnibal and Johnson, 2015). We use 300-dimensional GloVe Common Crawl embeddings (840B tokens) (Pennington et al., 2014) and fine-tune them for the task. We experimented extensively with preprocessing variants, and our results showed better performance without lemmatization and lower-casing (see supplement for details). We pad each input to 50 words. We train using RMSprop with a learning rate of 0.001 and a batch size of 512. We add dropout with a drop rate of 0.1 in the final layer to reduce overfitting (Srivastava et al., 2014); the drop rate, batch size, and input length were chosen empirically through random hyperparameter search.

All of our results are produced from 10-fold cross validation to allow comparison with previous results. We trained a logistic regression baseline model (line 1 in Table 2) using character n-grams and word unigrams with TF*IDF weighting (Salton and Buckley, 1987) to provide a baseline, since HAR has no reported results. For the SR and HATE datasets, the authors reported their best trained logistic regression model's results [2] on their respective datasets.

[2] Features described in the Related Work section.
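For illustration, a character n-gram plus word unigram TF*IDF logistic regression of the kind used as our baseline (line 1 of Table 2) can be assembled with scikit-learn as below. This is a sketch, not the reported configuration: the n-gram ranges and regularization strength are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

# Character n-grams plus word unigrams, both TF*IDF weighted (ranges are assumed).
features = FeatureUnion([
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 4))),
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 1))),
])

baseline = Pipeline([
    ("tfidf", features),
    ("clf", LogisticRegression(max_iter=1000)),
])

# texts: list of tweet strings; labels: list of class ids
# baseline.fit(texts, labels); predictions = baseline.predict(test_texts)
```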
6 Results and Discussion

The approach we have developed establishes a new state of the art for classifying hate speech, outperforming previous results by as much as 12 F1 points. Table 2 illustrates the robustness of our method, which often outperforms previous results, measured by weighted F1.

Table 2: F1 Results [3]

Method                            SR     HATE    HAR
LR (Char-gram + Unigram)          0.79   0.85    0.68
LR (Waseem and Hovy, 2016)        0.74   -       -
LR (Davidson et al., 2017)        -      0.90    -
CNN (Park and Fung, 2017)         0.83   -       -
GRU Text (Founta et al., 2018)    0.83   0.89    -
GRU Text + Metadata               0.87   0.89    -
TWEM (Ours)                       0.86   0.924   0.71

[3] SR: Sexist/Racist (Waseem and Hovy, 2016); HATE: Hate (Davidson et al., 2017); HAR: Harassment (Golbeck et al., 2017).

Using the Approximate Randomization (AR) Test (Riezler and Maxwell, 2005), we perform significance testing using a 75/25 train and test split [4] to compare against Waseem and Hovy (2016) and Davidson et al. (2017), whose models we re-implemented. We found our improvements significant at the 0.001 level compared to both methods. We also include in-depth precision and recall results for all three datasets in the supplement.

[4] This split was used in previous work, as confirmed by checking with the authors.
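As a reference point, the AR test amounts to repeatedly exchanging the two systems' predictions per example and measuring how often the shuffled difference in weighted F1 is at least as large as the observed one. The sketch below illustrates this standard procedure; it is our own illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.metrics import f1_score

def approximate_randomization(y_true, preds_a, preds_b, trials=10000, seed=0):
    """Two-sided AR test on the weighted-F1 difference between systems A and B."""
    rng = np.random.default_rng(seed)
    y_true, preds_a, preds_b = map(np.asarray, (y_true, preds_a, preds_b))
    observed = abs(f1_score(y_true, preds_a, average="weighted")
                   - f1_score(y_true, preds_b, average="weighted"))
    count = 0
    for _ in range(trials):
        swap = rng.random(len(y_true)) < 0.5      # randomly exchange predictions
        a = np.where(swap, preds_b, preds_a)
        b = np.where(swap, preds_a, preds_b)
        diff = abs(f1_score(y_true, a, average="weighted")
                   - f1_score(y_true, b, average="weighted"))
        count += diff >= observed
    return (count + 1) / (trials + 1)             # p-value
```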
Our results indicate better performance than several more complex approaches, including Davidson et al. (2017)'s best model (which used word and part-of-speech n-grams, sentiment, readability, text, and Twitter-specific features), Park and Fung (2017) (which used two-fold classification and a hybrid of word and character CNNs, using approximately twice the parameters we use excluding the word embeddings), and even recent work by Founta et al. (2018) (whose best model relies on GRUs and metadata including popularity, network reciprocity, and subscribed lists).

On the SR dataset, we outperform Founta et al. (2018)'s text-based model by 3 F1 points, while just falling short of the Text + Metadata Interleaved Training model. While we appreciate the potential added value of metadata, we believe a tweet-only classifier has merits because retrieving features from the social graph is not always tractable in production settings. Excluding the embedding weights, our model requires 100k parameters, while Founta et al. (2018) requires 250k parameters.

6.1 Error Analysis

False negatives [5]

Many of the false negatives we see are specific references to characters in the TV show "My Kitchen Rules", rather than something about women in general. Such examples may be innocuous in isolation but could potentially be sexist or racist in context. While this may be a limitation of considering only the content of the tweet, it could also be a mislabel.

    Debra are now my most hated team on #mkr after least night's ep. Snakes in the grass those two.

Along these lines, we also see correct predictions of innocuous speech, but find data mislabeled as hate speech:

    @LoveAndLonging ...how is that example "sexism"?

    @amberhasalamb ...in what way?

Another case our classifier misses is problematic speech within a hashtag:

    :D @nkrause11 Dudes who go to culinary school: #why #findawife #notsexist :)

This limitation could potentially be improved through the use of character convolutions or subword tokenization.

[5] Focused on the SR Dataset (Waseem and Hovy, 2016).

False Positives

In certain cases, our model seems to be learning user names instead of semantic content:

    RT @GrantLeeStone: @MT8 9 I don't even know what that is, or where it's from. Was that supposed to be funny? It wasn't.

Since the bulk of our model's weights are in the embedding and embedding-transformation matrices, we cluster the SR vocabulary using these transformed embeddings to clarify our intuitions about the model (see Table 3). We elaborate on our clustering approach in the supplement. We see that the model learned general semantic groupings of words associated with hate speech as well as specific idiosyncrasies related to the dataset itself (e.g. katieandnikki).

Table 3: Projected Embedding Cluster Analysis from SR Dataset

Geopolitical and religious references around Islam and the Middle East:
    bomb, mobs, jewish, kidnapped, airstrikes, secularization, ghettoes, islamic, burnt, murderous, penal, traitor, intelligence, molesting, cannibalism
Strong epithets and adjectives associated with harassment and hate speech:
    liberals, argumentative, dehumanize, gendered, stereotype, sociopath, bigot, repressed, judgmental, heinous, misandry, shameless, depravity, scumbag
Miscellaneous:
    turnt, pedophelia, fricken, exfoliated, sociolinguistic, proph, cissexism, guna, lyked, mobbing, capsicums, orajel, bitchslapped, venturebeat, hairflip, mongodb, intersectional, agender
Sexist-related epithets and hashtags:
    malnourished, katieandnikki, chevapi, dumbslut, mansplainers, crazybitch, horrendousness, justhonest, bile, womenaretoohardtoanimate
Sexist, sexual, and pornographic terms:
    actress, feminism, skank, breasts, redhead, anime, bra, twat, chick, sluts, trollop, teenage, pantyhose, pussies, dyke, blonds

7 Conclusion

Despite minimal tuning of hyper-parameters, fewer weight parameters, minimal text preprocessing, and no additional metadata, the model performs remarkably well on standard hate speech datasets. Our clustering analysis adds interpretability, enabling inspection of results.

Our results indicate that the majority of recent deep learning models for hate speech may rely on word embeddings for the bulk of their predictive power, and that the addition of sequence-based parameters provides minimal utility. Sequence-based approaches are typically important when phenomena such as negation, co-reference, and context-dependent phrases are salient in the text, and we suspect these cases are in the minority for publicly available datasets. We think it would be valuable to study the occurrence of such linguistic phenomena in existing datasets and construct new datasets that have a better representation of subtle forms of hate speech. In the future, we plan to investigate character-based representations, using character CNNs and highway layers (Kim et al., 2016) along with word embeddings to allow robust representations for sparse words such as hashtags.
References

Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. arXiv preprint arXiv:1703.04009.

Kevin Drum. 2017. Twitter Is a Cesspool, But It's Our Cesspool. Mother Jones.

Antigoni-Maria Founta, Despoina Chatzakou, Nicolas Kourtellis, Jeremy Blackburn, Athena Vakali, and Ilias Leontiadis. 2018. A unified deep learning architecture for abuse detection. arXiv preprint arXiv:1802.00385.

Jennifer Golbeck, Zahra Ashktorab, Rashad O Banjo, Alexandra Berlinger, Siddharth Bhagwan, Cody Buntain, Paul Cheakalos, Alicia A Geller, Quint Gergory, Rajesh Kumar Gnanasekaran, et al. 2017. A large labeled corpus for online harassment research. In Proceedings of the 2017 ACM on Web Science Conference, pages 229–233. ACM.

Graham Hillard. 2018. Stop Complaining about Twitter — Just Leave It. National Review.

Matthew Honnibal and Mark Johnson. 2015. An improved non-monotonic transition system for dependency parsing. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1373–1378, Lisbon, Portugal. Association for Computational Linguistics.

Ian T Jolliffe. 1986. Principal component analysis and factor analysis. In Principal Component Analysis, pages 115–128. Springer.

Yoon Kim, Yacine Jernite, David Sontag, and Alexander M Rush. 2016. Character-aware neural language models. In AAAI, pages 2741–2749.

Issie Lapowsky. 2017. Trump faces lawsuit over his Twitter blocking habits. Wired.

Ji Ho Park and Pascale Fung. 2017. One-step and two-step classification for abusive language detection on Twitter. CoRR, abs/1706.01206.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Stefan Riezler and John T Maxwell. 2005. On some pitfalls in automatic evaluation and significance testing for MT. In Proceedings of the ACL Workshop on
Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 57–64.

Gerard Salton and Chris Buckley. 1987. Term weighting approaches in automatic text retrieval. Technical report, Cornell University.

Dinghan Shen, Guoyin Wang, Wenlin Wang, Martin Renqiang Min, Qinliang Su, Yizhe Zhang, Ricardo Henao, and Lawrence Carin. 2018. On the use of word embeddings alone to represent natural language sequences.

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.

Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of the NAACL Student Research Workshop, pages 88–93.

Z. Zhang, D. Robinson, and J. Tepper. 2018. Detecting hate speech on Twitter using a convolution-GRU based deep neural network. Springer.

A Supplemental Material

We experimented with several different preprocessing variants and were surprised to find that reducing preprocessing improved performance on all of our tasks. We go through each preprocessing variant with an example and then describe our analysis to compare and evaluate each of them.

A.1 Preprocessing

Original text:

    RT @AGuyNamed Nick Now, I'm not sexist in any way shape or form but I think women are better at gift wrapping. It's the XX chromosome thing

Tokenize (basic tokenize: keeps case and words intact with limited sanitizing):

    RT @AGuyNamed Nick Now , I 'm not sexist in any way shape or form but I think women are better at gift wrapping . It 's the XX chromosome thing

Tokenize Lowercase (lowercase the basic tokenize scheme):

    rt @aguynamed nick now , i 'm not sexist in any way shape or form but i think women are better at gift wrapping . it 's the xx chromosome thing

Token Replace (replaces entities and user names with placeholders):

    ENT USER now , I 'm not sexist in any way shape or form but I think women are better at gift wrapping . It 's the xx chromosome thing

Token Replace Lowercase (lowercase the Token Replace scheme):

    ENT USER now , i 'm not sexist in any way shape or form but i think women are better at gift wrapping . it 's the xx chromosome thing

We did analysis on a validation set across multiple datasets and found that the "Tokenize" scheme was by far the best (see Table 4). We believe that keeping the case intact provides useful information about the user. For example, saying something in all CAPS is a useful signal that the model can take advantage of.

Table 4: Average Validation Loss for each Preprocessing Scheme

Preprocessing Scheme       Avg. Validation Loss
Token Replace Lowercase    0.47
Token Replace              0.46
Tokenize                   0.32
Tokenize Lowercase         0.40
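For illustration, the four schemes can be approximated with spaCy's tokenizer as below. The exact sanitizing and entity/user replacement rules are not specified above, so the regular expressions here are assumptions.

```python
import re
import spacy

nlp = spacy.blank("en")  # tokenizer only; no tagging or parsing needed

def preprocess(text, lowercase=False, replace=False):
    """Approximate the Tokenize / Token Replace schemes (rules are assumed)."""
    if replace:
        # Replace user mentions and non-word "entities" (URLs, retweet markers)
        # with placeholders; the authors' exact rules may differ.
        text = re.sub(r"@\w+", " USER ", text)
        text = re.sub(r"https?://\S+|\bRT\b", " ENT ", text)
    tokens = [t.text for t in nlp(text) if not t.is_space]
    if lowercase:
        tokens = [t.lower() for t in tokens]
    return tokens

# preprocess(tweet)                  -> Tokenize
# preprocess(tweet, lowercase=True)  -> Tokenize Lowercase
# preprocess(tweet, replace=True)    -> Token Replace
# preprocess(tweet, True, True)      -> Token Replace Lowercase
```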
A.2 In-Depth Results

Table 5: SR Results

              Waseem and Hovy (2016)        Ours
              P      R      F1              P      R      F1
none          0.76   0.98   0.86            0.88   0.93   0.90
sexism        0.95   0.38   0.54            0.79   0.74   0.76
racism        0.85   0.30   0.44            0.86   0.72   0.78
Weighted F1                 0.74                          0.86

Table 6: HATE Results

              Davidson et al. (2017)        Ours
              P      R      F1              P      R      F1
none          0.82   0.95   0.88            0.89   0.94   0.91
offensive     0.96   0.91   0.93            0.95   0.96   0.96
hate          0.44   0.61   0.51            0.61   0.41   0.49
Weighted F1                 0.90                          0.924

Table 7: HAR Results

Method        Prec    Rec     F1      Avg F1
Ours          0.713   0.206   0.319   0.711
LR Baseline   0.820   0.095   0.170   0.669
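The per-class precision, recall, and F1 values above, and the weighted F1 used as the summary metric in Table 2, can be computed from fold predictions with scikit-learn. The brief sketch below uses toy labels as placeholders; it shows the metric calls we assume correspond to the reported numbers.

```python
from sklearn.metrics import classification_report, f1_score

# Toy labels standing in for one held-out fold (0 = none, 1 = sexism, 2 = racism).
y_true = [0, 0, 1, 2, 1, 0, 2, 0]
y_pred = [0, 0, 1, 2, 0, 0, 2, 1]

# Per-class precision, recall, and F1 (the layout of Tables 5-7).
print(classification_report(y_true, y_pred, digits=3))

# Weighted F1, the summary metric reported in Table 2.
print(f1_score(y_true, y_pred, average="weighted"))
```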
A.3 Embedding Analysis

Since our method was a simple word embedding based model, we explored the learned embedding space to analyze results. For this analysis, we only use the max pooling part of our architecture to help analyze the learned embedding space, because it encourages salient words to increase their values to be selected. We projected the original pre-trained embeddings to the learned space using the time distributed MLP. We summed the embedding dimensions for each word and sorted by the sum in descending order to find the 1000 most salient word embeddings from our vocabulary. We then ran PCA (Jolliffe, 1986) to reduce the dimensionality of the projected embeddings from 300 dimensions to 75 dimensions. This captured about 60% of the variance. Finally, we ran K-means clustering with k = 5 clusters to organize the most salient embeddings in the projected space.

The learned clusters from the SR vocabulary were very illuminating (see Table 8); they gave insights into how hate speech surfaced in the datasets. One clear grouping we found is the misogynistic and pornographic group, which contained words like breasts, blonds, and skank. Two other clusters had references to geopolitical and religious issues in the Middle East and disparaging and resentful epithets that could be seen as having an intellectual tone. This hints towards the subtle pedagogic forms of hate speech that surface.

We ran silhouette analysis (Pedregosa et al., 2011) on the learned clusters and found that the clusters from the learned representations had a 35% higher silhouette coefficient using the projected embeddings compared to the clusters created from the original pre-trained embeddings. This reinforces the claim that our training process pushed hate-speech related words together, and words from other clusters further away, thus structuring the embedding space effectively for detecting hate speech.

Table 8: Projected Embedding Cluster Analysis from SR Dataset

Geopolitical and religious references around Islam and the Middle East:
    bomb, mobs, jewish, kidnapped, airstrikes, secularization, ghettoes, islamic, burnt, murderous, penal, traitor, intelligence, molesting, cannibalism
Strong epithets and adjectives associated with harassment and hate speech:
    liberals, argumentative, dehumanize, gendered, stereotype, sociopath, bigot, repressed, judgmental, heinous, misandry, shameless, depravity, scumbag
Miscellaneous:
    turnt, pedophelia, fricken, exfoliated, sociolinguistic, proph, cissexism, guna, lyked, mobbing, capsicums, orajel, bitchslapped, venturebeat, hairflip, mongodb, intersectional, agender
Sexist-related epithets and hashtags:
    malnourished, katieandnikki, chevapi, dumbslut, mansplainers, crazybitch, horrendousness, justhonest, bile, womenaretoohardtoanimate
Sexist, sexual, and pornographic terms:
    actress, feminism, skank, breasts, redhead, anime, bra, twat, chick, sluts, trollop, teenage, pantyhose, pussies, dyke, blonds
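The clustering and silhouette comparison described above can be reproduced approximately with scikit-learn as follows. The variable names are ours, the salience ranking by summed projected dimensions follows the description above, and whether the original embeddings were also PCA-reduced before clustering is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def salient_clusters(embeddings, vocab, top_ids, n_clusters=5, n_components=75):
    """Cluster the selected words in a given embedding space and score the clusters."""
    reduced = PCA(n_components=n_components).fit_transform(embeddings[top_ids])
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(reduced)
    clusters = {c: [vocab[i] for i in top_ids[labels == c]] for c in range(n_clusters)}
    return clusters, silhouette_score(reduced, labels)

# projected: (V, 300) embeddings after the learned MLP transform
# original:  (V, 300) pre-trained GloVe embeddings, rows aligned with `vocab`
def compare_spaces(projected, original, vocab, top_k=1000):
    # Salience proxy: sum of projected embedding dimensions, take the top 1000 words.
    top_ids = np.argsort(-projected.sum(axis=1))[:top_k]
    proj_clusters, proj_sil = salient_clusters(projected, vocab, top_ids)
    _, orig_sil = salient_clusters(original, vocab, top_ids)
    return proj_clusters, proj_sil, orig_sil
```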