Semantic Oppositeness Assisted Deep Contextual Modeling for Automatic Rumor Detection in Social Networks
Nisansa de Silva, Dejing Dou
Department of Computer and Information Science, University of Oregon, Eugene, USA.
{nisansa,dou}@cs.uoregon.edu

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, pages 405-415, April 19-23, 2021. ©2021 Association for Computational Linguistics

Abstract

Social networks face a major challenge in the form of rumors and fake news, due to their intrinsic nature of connecting users to millions of others, and of giving any individual the power to post anything. Given the rapid, widespread dissemination of information in social networks, manually detecting suspicious news is sub-optimal. Thus, research on automatic rumor detection has become a necessity. Previous works in the domain have utilized the reply relations between posts, as well as the semantic similarity between the main post and its context, consisting of replies, in order to obtain state-of-the-art performance. In this work, we demonstrate that semantic oppositeness can improve performance on the task of rumor detection. We show that semantic oppositeness captures elements of discord which are not properly covered by previous efforts, which utilize only semantic similarity or reply structure. Our proposed model learns both explicit and implicit relations between the main tweet and its replies, by utilizing both semantic similarity and semantic oppositeness. Both of these employ the self-attention mechanism in neural text modeling, with semantic oppositeness utilizing word-level self-attention, and semantic similarity utilizing post-level self-attention. We show, with extensive experiments on recent data sets for this problem, that our proposed model achieves state-of-the-art performance. Further, we show that our model is more resistant to the variances in performance introduced by randomness.

1 Introduction

Social media changed the ecosystem of the World Wide Web by making it possible for any individual, regardless of their level of knowledge of web technologies, to create and maintain profiles online. At the same time, various social media provided these individuals with means to tap into the information disseminated by others (e.g., Facebook by adding friends, Twitter by following). By virtue of other mechanisms, such as Facebook pages and Twitter lists, the reach of each individual was then extended to the range of thousands-to-millions of users. New content, in the form of posts, is created on social media sites each passing second.

The rapidity of this post creation is such that it is possible to claim that social media reflect a near real-time view of the events in the real world (Veyseh et al., 2019). While it was, indeed, beneficial in terms of volume of data to have private individuals be content creators and propagators of information, this created significant issues from the perspective of the veracity of the data. This gave rise to the challenge of detecting fake news and rumors (which, in this study, we refer to as the task of rumor detection). The need for rumor detection has come to the forefront in light of its momentous impacts on political events (Jin et al., 2017) and social (Jin et al., 2014) or economic (Domm, 2013) trends.

Manual intervention on this task would require extensive analysis of and reasoning about various sources of information, resulting in long response times, which are intolerable given the impact of these rumors and the rate at which they spread. Thus, automatic rumor detection, toward which we contribute in this paper, has become an important area of contemporary research. Cao et al. (2018) define any piece of information whose veracity status was questionable at the time of posting as a rumor. They further claim that a rumor may later be verified to be true or false by other authorized sources. We follow their definition in this work; thus, we define the task of rumor detection as: given a piece of information from a social network, predict whether the piece of information is a rumor or not, using the conversations which were induced by the said piece of information. The initial piece of information could be a tweet or a user post, and the induced conversation would be the replies from other users (which we use as contextual information).
Following the conventions in the literature, in this work we refer to a main post and its replies as a thread.

In this paper, we utilize the semantic oppositeness proposed by de Silva and Dou (2019) to improve the rumor detection task, which has so far been restricted to considering only semantic similarity. We further prove that semantic oppositeness is well-suited to be applied to this domain, under the observation that rumor threads are more discordant than those of non-rumors. We further observe that, within rumor threads, false rumor threads continue to be clamorous, while true rumor threads settle into inevitable acquiescence. We claim that semantic oppositeness can help in distinguishing this behavior as well.

We propose a word-level self-attention mechanism for the semantic oppositeness to augment the tweet-level self-attention mechanism for the semantic similarity. We model the explicit and implicit connections within a thread using a relevancy matrix. Unlike a regular adjacency matrix, our relevancy matrix recognizes the coherence of each sub-tree of conversation rooted at the main post, while acknowledging that, by definition, for this task, the main tweet must be directly related to all the rest of the tweets, regardless of the degrees of separation that may exist between them. We conduct extensive experiments to compare our proposed model with the state-of-the-art studies conducted on the same topic. To the best of our knowledge, this work is the first to utilize semantic oppositeness in rumor detection. In summary, our contributions in this paper include:

• We introduce a novel method for rumor detection, based on both semantic similarity and semantic oppositeness, utilizing the main post and the contextual replies.

• We model the explicit and implicit connections within a thread using a relevancy matrix, which is then used to balance the impact semantic similarity and semantic oppositeness have on the overall prediction.

• We conduct experiments on recent rumor detection data sets and compare with numerous state-of-the-art baseline models to show that we achieve superior performance.

The remainder of this paper is organized as follows: Section 2 presents related work, and then Section 3 provides a formal definition of the problem, along with our proposed solution. It is followed by Section 4, discussing experiments and results. Finally, Section 5 concludes the paper.

2 Related Work

Semantic oppositeness is the mathematical counterpart of semantic similarity (de Silva and Dou, 2019). While implementations of semantic similarity (Jiang and Conrath, 1997; Wu and Palmer, 1994) are more widely used than those of semantic oppositeness, there are a number of studies which work on deriving or using semantic oppositeness (de Silva et al., 2017; Paradis et al., 1982; Mettinger, 1994; Schimmack, 2001; Rothman and Parker, 2009; de Silva, 2020). However, it is noted that almost all of these studies reduce oppositeness from a scale to either bipolar scales (Schimmack, 2001) or simple antonymy (Paradis et al., 1982; Jones et al., 2012). The study by de Silva et al. (2017) proves that this reduction is incorrect and proposes an alternative oppositeness function. Their follow-up study, de Silva and Dou (2019), creates a word embedding model for this function. In this study, we use the oppositeness embeddings created by them.

The rumor detection task has been approached on three fronts, according to Cao et al. (2018): feature engineering, propagation-based, and deep learning. In the feature engineering approach, posts are transformed into feature representations by hand-designed features and sent to a statistical model to be classified. In addition to textual information, structural evidence (Castillo et al., 2011; Yang et al., 2012) and media content (Gupta et al., 2012) are also utilized. Given that this approach depends heavily on the quality of the hand-designed feature sets, it is neither scalable nor transferable to other domains. The propagation-based approach is built on the assumption that the propagation pattern of a rumor is significantly different from that of a non-rumor. It has been deployed to detect rumors in social networks (Ma et al., 2017). However, this method does not pay any heed to the information in the post content itself. As expected, the deep learning approach automatically learns effective features (Ma et al., 2016, 2018; Veyseh et al., 2019). Ma et al. (2016) claim that these discovered features capture the underlying representations of the posts and hence improve the generalization performance, while making it easy to adapt to a new domain or a social medium for the purpose of rumor detection.
Yang et al. (2020) propose a slide window-based system for feature extraction. None of these state-of-the-art works attempt to check rumour veracity akin to the attempts by Hamidian and Diab (2019a) and Derczynski et al. (2017). Instead, they attempt to do classification on the already established baseline. Thus, our work also follows the approach of the former rather than the latter. The work by Hamidian and Diab (2019b) does focus on rumor detection and classification. However, they do not use the data sets common to the state-of-the-art work mentioned above to evaluate their approach.

Our work is most related to the rumor detection model on Twitter which uses deep learning to capture contextual information (Veyseh et al., 2019). However, we also derive inspiration from earlier work on the same topic (Ma et al., 2018), which utilized the tree-like structures of the posts, and the work by de Silva and Dou (2019), which introduced the oppositeness embedding model. The early work by Ma et al. (2018) uses Recursive Neural Networks (RvNN) for the construction of the aforementioned tree-like structures of the posts, based on their tf-idf representations.

The following work by Veyseh et al. (2019) acknowledges the usefulness of considering the innate similarities between replies, but further claims that considering only the replies along the tree-like structure exploits only the explicit relations between the main posts and their replies, and thus ignores the implicit relations among the posts from different branches based on their semantics. Under this claim, they disregard the tree-like structure entirely. In our work, we preserve the idea of considering semantic similarities to discover the implicit relationships among posts, as proposed by Veyseh et al. (2019). However, we augment the model and re-introduce the explicit relationships proposed by Ma et al. (2018), balancing the information between implicit and explicit. Further, we note that all these prior works have been solely focused on the similarity between the posts and have ignored the oppositeness metric. To the best of our knowledge, we are the first to utilize oppositeness information in the rumor detection task.

3 Methodology

We use a recent work (Veyseh et al., 2019) on rumor detection as our baseline. Their work, in turn, was heavily influenced by the earlier work on rumor detection in Twitter (Ma et al., 2018). A tweet set I is defined as shown in Equation 1, where R_0 is the initial tweet and R_1, R_2, ..., R_T are replies, such that T is the count of replies. Each tweet R_i is a sequence of words W_1, W_2, ..., W_n, such that n is the count of words. We tokenize the tweets; in this work, tokens and words are used interchangeably. We also define the relevance matrix M, which carries the information of the tree structure of the tweet tree, in Equation 2, where R_i \star R_j denotes that R_i and R_j belong to the same tree in the forest obtained by eliminating the initial tweet. We show the process in Figure 1 as well. Our input is the pair P = (I, M), which differs from Veyseh et al. (2019), where only I was used as the input. The entire data set is represented by D.

  I = (R_0, R_1, R_2, \ldots, R_T)    (1)

  m_{i,j} = \begin{cases} 1 & \text{if } R_i = R_0 \lor R_j = R_0 \\ 1 & \text{if } R_i \star R_j \\ 0 & \text{otherwise} \end{cases}    (2)

Following the convention of Veyseh et al. (2019), which is our baseline, we classify each pair (I, M) into four labels: 1) Not a Rumor (NR); 2) False Rumor (FR); 3) True Rumor (TR); and 4) Unrecognizable (UR). It should be noted that the distinction between "False Rumor" and "True Rumor" is drawn from the truthfulness of R_0.

In simpler terms, we can represent Veyseh et al. (2019) as a trivial relevance matrix where all elements are set to 1. The success of Veyseh et al. (2019) over previous state-of-the-art methods attests to the success of using a relevance matrix over a vanilla adjacency matrix. In this work, what we do with the above-described relevance matrix M is to augment the implicit relationship consideration using the high-level structure of the explicit relationships, hence bringing in the best of both worlds. In summary, the set of edges in the relevancy matrix is a super-set of the set of edges in the adjacency matrix. In addition to the edges that were in the adjacency matrix, the relevancy matrix also has edges that carry implicit connection information. Thus, by definition, the relevancy matrix is more descriptive of the thread than the adjacency matrix.
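To make the construction in Equation 2 and Figure 1 concrete, the following is a minimal sketch (ours, not the authors' released code) of how the relevance matrix M could be built when each reply stores the index of the tweet it replies to; the function and variable names are hypothetical.

```python
from typing import List

def build_relevance_matrix(parents: List[int]) -> List[List[int]]:
    """Build the relevance matrix M of Equation 2 for one thread.

    parents[i] is the index of the tweet that tweet i replies to;
    the main tweet R0 is index 0 and uses parents[0] == -1.
    """
    n = len(parents)

    # For any reply, find the root of its sub-tree in the forest obtained
    # by removing the main tweet (i.e., its ancestor that replies to R0).
    def subtree_root(i: int) -> int:
        while parents[i] != 0:
            i = parents[i]
        return i

    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == 0 or j == 0:
                m[i][j] = 1          # the main tweet is related to every tweet
            elif subtree_root(i) == subtree_root(j):
                m[i][j] = 1          # R_i and R_j belong to the same sub-tree
    return m

# Example: the thread of Figure 1, where R1, R3, and R6 reply to R0,
# R2 replies to R1, and R4 and R5 reply to R3.
print(build_relevance_matrix([-1, 0, 1, 0, 3, 3, 0]))
```

Running the example reproduces the full relevance matrix of Figure 1: row R_0 is all ones, R_1 and R_2 are connected, R_3, R_4, and R_5 are connected, and R_6 is connected only to itself and R_0.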
[Figure 1: Relevance matrix building: 1) Original tweet reply tree; 2) Obtain the forest by temporarily removing the root (main tweet); 3) Consider each tree in the forest to be a fully connected graph, and obtain the relevance matrices; 4) Obtain the full relevance matrix by putting together the matrices from the previous step and considering the main tweet to be connected to all the other tweets.]

[Figure 2: The proposed model: Red vectors and nodes represent the main (root) tweet, and green vectors and nodes represent replies. Pooling operations are shown in boxes with dashed lines. Components shown include the word embeddings and oppositeness embeddings (each of size 300), the tweet-level and word-level self-attention, the mask-builder with the oppositeness and relevancy matrices, the main tweet adjustment layer, the rumour classifier (feed-forward layers of 200 units, producing L_label), and the latent label classifier (feed-forward layers of 100 units, producing L_main and L_thread).]

3.1 Formal Definition of Tweet Representation

Each tweet has a different number of words n; thus, we pad the short tweets with a special token until all the tweets have the same word length N, as defined by Equation 3.

  N = \operatorname{argmax}_{P_i \in D}(n_i)    (3)

We build the representative oppositeness list O using the oppositeness embeddings created by de Silva and Dou (2019) such that, for the i-th tweet R_i, with words W_{i1}, W_{i2}, ..., W_{iN}, the oppositeness embedding O_i is created as o_{i1}, o_{i2}, ..., o_{im}, where o_{ij} is the embedding of W_{ij}. Note that m ≤ N, since all tokens might not have corresponding oppositeness embeddings.

Each word in each tweet is then converted to a representative vector by means of a set of pre-trained word embeddings, such that the i-th tweet R_i, with words W_{i1}, W_{i2}, ..., W_{iN}, is converted to e_{i1}, e_{i2}, ..., e_{iN}. We then apply a max-pooling operation over the word embeddings along each dimension, resulting in a representative vector h_i coupled to R_i, as shown in Equation 4. At this point, note that the tweet set I of each pair P, which used to be I = (R_0, R_1, R_2, ..., R_T), has been replaced by I = (h_0, h_1, h_2, ..., h_T). It is this new representation which is passed to the following steps.

  h_i = \operatorname{ElementwiseMax}(e_{i1}, e_{i2}, \ldots, e_{iN})    (4)
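As an illustration of Equations 3 and 4, the short sketch below (an illustrative sketch under stated assumptions, not the authors' code) pads tokenized tweets to a common length N and max-pools their word embeddings into one vector h_i per tweet. The lookup table `embed`, the padding vector, and the function name are hypothetical stand-ins.

```python
import numpy as np

EMB_DIM = 300  # both GloVe and oppositeness embeddings used here are of size 300

def tweet_representations(tweets, embed, pad_vector=None):
    """tweets: list of token lists; embed: dict token -> np.ndarray(EMB_DIM).
    Returns one max-pooled vector h_i per tweet (Equation 4)."""
    if pad_vector is None:
        # a hypothetical embedding for the special padding token
        pad_vector = np.zeros(EMB_DIM)

    # Equation 3: in the paper N is taken over the whole data set D;
    # for brevity we take it over the tweets passed in.
    N = max(len(t) for t in tweets)

    reps = []
    for tokens in tweets:
        # unknown tokens fall back to the padding vector here (an assumption)
        vectors = [embed.get(w, pad_vector) for w in tokens]
        vectors += [pad_vector] * (N - len(tokens))        # pad to length N
        reps.append(np.max(np.stack(vectors), axis=0))     # element-wise max
    return np.stack(reps)                                  # shape: (T+1, EMB_DIM)
```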
3.2 Similarity-Based Contextualization

As discussed earlier, the Twitter data is organized as a tree rooted at the main tweet R_0 in each instance. The earlier work by Ma et al. (2018) proved that, in rumor detection, it is helpful to capture these relations among the main tweet and the replies. The subsequent work by Veyseh et al. (2019) noted that considering only the explicit reply relations between the main tweet and the other tweets neglects the implicit relations among the tweets arising from their semantic similarities (i.e., by virtue of discussing the same topic, tweets in two separate branches may carry mutually useful information). Following this hypothesis, they exploited such implicit semantic relations for the purpose of improving the performance of the rumor detection task. However, in doing so, they abandoned the information garnered from the tree structure. In this work, we propose to continue to use the implicit information, but to augment it with the information derived from the tree structure.

We follow the self-attention mechanism of Veyseh et al. (2019), which was inspired by the transformer architecture of Vaswani et al. (2017), to learn the pairwise similarities among tweets for capturing the semantic relations between the tweets. The process starts with calculating the key (k_i) and query (q_i) vectors for each tweet, based on its representation h_i, as shown in Equation 5 (W and b follow the traditional notation of weights and bias).

  k_i = W_k * h_i + b_k ,  \quad  q_i = W_q * h_i + b_q    (5)

With the key and query vectors, we calculate the similarity a_{i,j} between the i-th and j-th tweets using the dot product, as shown in Equation 6, where γ is a normalization factor.

  a_{i,j} = k_i \cdot q_j / \gamma    (6)

3.3 Oppositeness-Based Contextualization

Unlike the similarity vectors, which were reduced to a single dimension at this point, the oppositeness representations are still at two dimensions. Thus, the self-attention of oppositeness between tweets is handled at the word level, rather than at the sentence level. We build key (k'_i) and query (q'_i) vectors for each word based on its representation o_i, as shown in Equation 7 (W and b follow the traditional notation of weights and bias).

  k'_i = W_k * o_i + b_k ,  \quad  q'_i = W_q * o_i + b_q    (7)

Since the oppositeness embedding of de Silva and Dou (2019) is based on Euclidean distance, with the key and query vectors we calculate the oppositeness op_{i_x,j_y} between the x-th word of the i-th tweet and the y-th word of the j-th tweet using the Euclidean distance, as shown in Equation 8, where k'_{i_x} is the key vector for the x-th word of the i-th tweet, q'_{j_y} is the query vector for the y-th word of the j-th tweet, and the Euclidean distance d(\cdot,\cdot) is calculated across the size of the oppositeness embedding.

  op_{i_x,j_y} = d(k'_{i_x}, q'_{j_y})    (8)

To obtain the abstract tweet-level oppositeness, we apply element-wise average-pooling on the OP_{i,j} matrix, as shown in Equation 9, to create the oppositeness matrix O'', where EA is the element-wise average operation, δ is the oppositeness embedding count of the i-th tweet, and ϱ is the oppositeness embedding count of the j-th tweet. Note that the dimensions of the oppositeness matrix O'' are the same as those of the relevance matrix M.

  o''_{i,j} = EA\begin{pmatrix} op_{i_0,j_0} & op_{i_1,j_0} & \ldots & op_{i_\delta,j_0} \\ op_{i_0,j_1} & op_{i_1,j_1} & \ldots & op_{i_\delta,j_1} \\ \vdots & \vdots & \ddots & \vdots \\ op_{i_0,j_\varrho} & op_{i_1,j_\varrho} & \ldots & op_{i_\delta,j_\varrho} \end{pmatrix}    (9)

Next, we create the oppositeness mask Ω by average-pooling O'' along rows and columns, as shown in Equation 10, where the definition of EA is the same as in Equation 9 and, similar to Equation 3, n_i and n_j are the natural lengths of the i-th and j-th tweets respectively.

  \omega_{i,j} = 1 - EA(o''_{i,0}, o''_{i,1}, \ldots, o''_{i,n_j}) - EA(o''_{0,j}, o''_{1,j}, \ldots, o''_{n_i,j})    (10)
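The following sketch is our illustrative reading of Equations 7-10 (assumed tensor shapes, not the authors' implementation): it projects the word-level oppositeness embeddings into keys and queries, takes pairwise Euclidean distances, average-pools them into the tweet-level matrix O'', and then pools O'' along its rows and columns to form the mask Ω. Pooling over all tweets in each row and column is our assumption about how the row/column pooling is carried out.

```python
import numpy as np

def oppositeness_mask(opp_embs, W_k, b_k, W_q, b_q):
    """opp_embs: list over tweets; element i is an (m_i, dim) array of
    oppositeness embeddings for the words of tweet i (Section 3.1).
    Returns the mask Omega of Equation 10 as a (T+1, T+1) array."""
    n_tweets = len(opp_embs)

    # Equation 7: word-level key and query projections
    keys = [o @ W_k.T + b_k for o in opp_embs]
    queries = [o @ W_q.T + b_q for o in opp_embs]

    # Equations 8 and 9: pairwise Euclidean distances, element-wise averaged
    o2 = np.zeros((n_tweets, n_tweets))
    for i in range(n_tweets):
        for j in range(n_tweets):
            diff = keys[i][:, None, :] - queries[j][None, :, :]   # (m_i, m_j, dim)
            o2[i, j] = np.mean(np.linalg.norm(diff, axis=-1))     # average op value

    # Equation 10: average-pool O'' along each row and column, subtract from 1
    omega = 1.0 - o2.mean(axis=1, keepdims=True) - o2.mean(axis=0, keepdims=True)
    return omega
```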
3.4 Deriving Overall Thread Representations

Similar to the oppositeness mask Ω, we create the relevance mask Ψ by sum-pooling M along rows and columns, as shown in Equation 11, where ES is the element-wise sum operation and, similar to Equation 3, n_i and n_j are the natural lengths of the i-th and j-th tweets respectively.

  \psi_{i,j} = ES(m_{i,0}, m_{i,1}, \ldots, m_{i,n_j}) + ES(m_{0,j}, m_{1,j}, \ldots, m_{n_i,j})    (11)

At this point we diverge from Veyseh et al. (2019) in two ways, and utilize the relevance mask Ψ (derived from M) as a weighting mechanism, with proportion constant α (where 0 < α < 1), as well as the oppositeness mask Ω, to obtain the augmented attention a'_{i,j}, as shown in Equation 12.

  a'_{i,j} = a_{i,j}\left[\omega_{i,j}(\psi_{i,j} - \alpha) + \alpha\,\psi_{i,j}\right]^2    (12)

We utilize the augmented similarity values a'_{i,j} for each tweet pair in the thread to compute abstract representations for the tweets based on the weighted sums, as shown in Equation 13.

  h'_i = \sum_j a'_{i,j} \, h_j    (13)

Next, we apply the max-pooling operation over the processed tweet representation vectors h'_i to obtain the overall representation vector h' for the input pair P.

  h' = \operatorname{ElementwiseMax}(h'_0, h'_1, h'_2, \ldots, h'_T)    (14)

Finally, the result is sent through a 2-layer feed-forward neural network capped with a softmax layer, with the objective of producing the probability distribution P(y | R_0, R_1, R_2, ..., R_T; θ) over the four possible labels, where θ is the model parameter. On this, we optimize the negative log-likelihood function in order to train the model, as shown in Equation 15, where y* is the expected (correct) label for I.

  L_{label} = -\log P(y^* \mid R_0, R_1, R_2, \ldots, R_T; \theta)    (15)

3.5 Main Tweet Information Preservation

The Veyseh et al. (2019) study noted that the model by Ma et al. (2018) treats all tweets equally. This was deemed undesirable, given that the main tweet of each thread incites the conversation and thus, arguably, carries the most important content in the conversation, which should be emphasized to produce good performance. To achieve this end, it was proposed to bring forward the information in the main tweet independently of, and separately from, that of the collective Twitter thread, in order to provide a check. We, in this work, also provide this sanity check, to enhance the obtained results.

The basic idea is that, by virtue of definition, if a main tweet is a rumor (or not), unique traits and information pertaining to that class should be in the main tweet itself. Thus, the latent label (L_thread) obtained by processing the thread representation h' above should be the same as a potential latent label (L_main) obtained by processing the representation of the main tweet h_0. To calculate L_main, we use a 2-layer feed-forward neural network with a softmax layer at the end, which assigns latent labels drawn from K possible latent labels. Next, we use another 2-layer feed-forward neural network with a softmax layer at the end, assigning the same K number of possible latent labels, as shown in the negative log-likelihood function of Equation 17, to match it with the thread.

  L_{main} = \operatorname{argmax}_L P(L \mid R_0)    (16)

  L_{thread} = -\log P'(L_{main} \mid R_0, \ldots, R_T)    (17)

Finally, the loss function used to train the entire model is defined in Equation 18, where L_{label} is obtained from Equation 15, and β is a hyper-parameter which controls the contribution of the main tweet information preservation loss to the final loss.

  Loss = L_{label} + \beta L_{thread}    (18)
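To tie Equations 5-6 and 11-14 together, here is a compact sketch (a paraphrase under assumed shapes, not the authors' implementation) that combines the similarity attention, the oppositeness mask, and the relevance mask into the augmented attention and then pools a thread representation. The choice of γ as the square root of the hidden size is our assumption; the paper only calls it a normalization factor.

```python
import numpy as np

def thread_representation(h, omega, M, W_k, b_k, W_q, b_q, alpha=0.1, gamma=None):
    """h: (T+1, dim) tweet vectors from Equation 4; omega: mask from Equation 10;
    M: relevance matrix from Equation 2. Returns h' of Equation 14."""
    if gamma is None:
        gamma = np.sqrt(h.shape[1])           # one plausible normalization factor

    # Equations 5 and 6: tweet-level keys, queries, and dot-product similarity
    k = h @ W_k.T + b_k
    q = h @ W_q.T + b_q
    a = (k @ q.T) / gamma                     # a[i, j]

    # Equation 11: relevance mask by sum-pooling M along rows and columns
    psi = M.sum(axis=1, keepdims=True) + M.sum(axis=0, keepdims=True)

    # Equation 12: augmented attention combining similarity, Omega, and Psi
    a_aug = a * (omega * (psi - alpha) + alpha * psi) ** 2

    # Equation 13: weighted sums of the tweet vectors
    h_prime = a_aug @ h

    # Equation 14: element-wise max over all processed tweet vectors
    return h_prime.max(axis=0)
```

The resulting vector would then be fed to the 2-layer feed-forward classifier of Equation 15, with the auxiliary loss of Equations 16-18 added on top.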
4 Experiments

We use the Twitter 15 and Twitter 16 data sets introduced by Ma et al. (2017) for the task of rumor detection. Some statistics of the data sets, as given by Ma et al. (2017), are shown in Table 1. We use GloVe (Pennington et al., 2014) embeddings to initialize the word vectors and the oppositeness embeddings of de Silva and Dou (2019) to initialize the oppositeness embeddings. Both embedding vectors are of size 300. The key and query vectors in Equations 5 and 7 employ 300 hidden units. The rumor classifier feed-forward network has two layers of 200 hidden units. The feed-forward layer in the main tweet information preservation component has two layers, each with 100 hidden units, and it maps to three latent labels. The proportion constant α, which balances the explicit and implicit information, is set at 0.1. The loss function uses a trade-off parameter of β = 1, with an initial learning rate of 0.3 on the Adagrad optimizer. For the purpose of a fair results comparison, we follow the convention of using the 5-fold cross-validation procedure to tune the parameters (such as node and layer counts) set by Ma et al. (2018).

Table 1: Statistics of the Data Sets.

  Statistic                  Twitter15   Twitter16
  Number of NR                     374         205
  Number of FR                     370         205
  Number of TR                     372         205
  Number of UR                     374         203
  Avg. Num. of Posts/Tree          223         251
  Max Num. of Posts/Tree         1,768       2,765
  Min Num. of Posts/Tree            55          81

4.1 Comparison to the State-of-the-Art Models

We compare the proposed model against the state-of-the-art models on the same data sets. The performance is compared by means of overall accuracy and F1 score per class. There are two types of models against which we compare. The first type is the feature-based models, which used feature engineering to extract features for Decision Trees (Zhao et al., 2015; Castillo et al., 2011), Random Forest (Kwon et al., 2013), and SVM (Ma et al., 2015; Wu et al., 2015; Ma et al., 2017). The second type is deep learning models, which used Recurrent Neural Networks or Recursive Neural Networks to learn features for rumor detection. We compare our model to the GRU-RNN proposed by Ma et al. (2016), the BU-RvNN and TD-RvNN proposed by Ma et al. (2018), and the Semantic Graph proposed by Veyseh et al. (2019). Results for Twitter 15 and Twitter 16 are shown in Tables 2 and 3, respectively.

It is evident from these tables that, in the rumor detection task, the deep learning models outperform the feature-based models, proving that automatically learning effective features from data is superior to hand-crafting features. We also note that the Semantic Oppositeness Graph, along with the Semantic Graph and the RvNN models with GRU-RNN, generally does well, which attests to the utility of structural information, be it in the form of reply structure or in the form of semantic relations, in helping to improve performance. We further notice that Veyseh et al. (2019), which uses implicit information, outperforms TD-RvNN (Ma et al., 2018), which uses only explicit information. The Semantic Oppositeness Graph, which uses explicit information, implicit information, and semantic oppositeness, outperforms all the other models in accuracy, while outperforming all the other models in three out of four classes in terms of F1 score. The one class in which the Semantic Oppositeness Graph loses out to Veyseh et al. (2019) is the Unrecognizable (UR) class. We argue that this is not an issue, given that the unrecognizable class consists of tweets which were too ambiguous for human annotators to tag as one of: not a rumor (NR), false rumor (FR), or true rumor (TR). We assert that Tables 2 and 3 clearly demonstrate the effectiveness of the proposed Semantic Oppositeness Graph method in the task of rumor detection.

4.2 Model Stability Analysis

While comparing our system with Veyseh et al. (2019), which we use as our main baseline, we noticed that their system has a high variance in results, depending on the random weight initialization. This was impactful in such a way that, with some random weight initializations, the accuracy of their system could fall as low as 24% from the high 70% results reported in their paper. Given that we use their system as our baseline and the basis for our model, we decided to do a stability analysis between their system and ours. For this purpose, we created 100 random seeds and trained four models with each seed, resulting in a total of 400 models. The models were: 1) Veyseh et al. (2019) on Twitter 15, 2) Veyseh et al. (2019) on Twitter 16, 3) the Semantic Oppositeness Graph on Twitter 15, and 4) the Semantic Oppositeness Graph on Twitter 16. Then we normalized the results of the Veyseh et al. (2019) models to the values reported in their paper (also shown in the relevant rows of Tables 2 and 3). Each result is reported in the format (µ, σ) for the 5-fold cross-validation, to explore how random weight initialization affects the two models.
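A minimal sketch of this evaluation protocol (ours, not the authors' scripts; `train_and_evaluate` is a hypothetical stand-in for training and scoring one model under the 5-fold cross-validation described above) would aggregate the per-seed scores into the (µ, σ) pairs reported in Tables 4 and 5.

```python
import numpy as np

def stability_analysis(train_and_evaluate, dataset, n_seeds=100):
    """train_and_evaluate(dataset, seed) -> score of one model trained with
    the given random seed (hypothetical helper).
    Returns the (mean, std) pair reported for that model and data set."""
    scores = []
    for seed in range(n_seeds):
        np.random.seed(seed)                 # control the random weight initialization
        scores.append(train_and_evaluate(dataset, seed))
    scores = np.asarray(scores)
    return scores.mean(), scores.std()
```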
Table 2: Model Performance on Twitter 15.

  Model                                   Accuracy   F1 NR   F1 FR   F1 TR   F1 UR
  DTR (Zhao et al., 2015)                    0.409   0.501   0.311   0.364   0.473
  DTC (Castillo et al., 2011)                0.454   0.733   0.355   0.317   0.415
  RFC (Kwon et al., 2013)                    0.565   0.810   0.422   0.401   0.543
  SVM-TS (Ma et al., 2015)                   0.544   0.796   0.472   0.404   0.483
  SVM-BOW (Ma et al., 2018)                  0.548   0.564   0.524   0.582   0.512
  SVM-HK (Wu et al., 2015)                   0.493   0.650   0.439   0.342   0.336
  SVM-TK (Ma et al., 2017)                   0.667   0.619   0.669   0.772   0.645
  GRU-RNN (Ma et al., 2016)                  0.641   0.684   0.634   0.688   0.571
  BU-RvNN (Ma et al., 2018)                  0.708   0.695   0.728   0.759   0.653
  TD-RvNN (Ma et al., 2018)                  0.723   0.682   0.758   0.821   0.654
  Semantic Graph (Veyseh et al., 2019)       0.770   0.814   0.764   0.775   0.743
  Semantic Oppositeness Graph (SOG)          0.796   0.825   0.820   0.814   0.742

Table 3: Model Performance on Twitter 16.

  Model                                   Accuracy   F1 NR   F1 FR   F1 TR   F1 UR
  DTR (Zhao et al., 2015)                    0.414   0.394   0.273   0.630   0.344
  DTC (Castillo et al., 2011)                0.465   0.643   0.393   0.419   0.403
  RFC (Kwon et al., 2013)                    0.585   0.752   0.415   0.547   0.563
  SVM-TS (Ma et al., 2015)                   0.574   0.755   0.420   0.571   0.526
  SVM-BOW (Ma et al., 2018)                  0.585   0.553   0.655   0.582   0.578
  SVM-HK (Wu et al., 2015)                   0.511   0.648   0.434   0.473   0.451
  SVM-TK (Ma et al., 2017)                   0.662   0.643   0.623   0.783   0.655
  GRU-RNN (Ma et al., 2016)                  0.633   0.617   0.715   0.577   0.527
  BU-RvNN (Ma et al., 2018)                  0.718   0.723   0.712   0.779   0.659
  TD-RvNN (Ma et al., 2018)                  0.737   0.662   0.743   0.835   0.708
  Semantic Graph (Veyseh et al., 2019)       0.768   0.825   0.751   0.768   0.789
  Semantic Oppositeness Graph (SOG)          0.826   0.843   0.843   0.878   0.774

Table 4: Model Variance Performance on Twitter 15, reported as (µ, σ).

  Model                    Accuracy         F1 NR            F1 FR            F1 TR            F1 UR
  Veyseh et al. (2019)     (0.770, 0.138)   (0.814, 0.133)   (0.764, 0.198)   (0.775, 0.118)   (0.743, 0.129)
  SOG (This work)          (0.796, 0.089)   (0.825, 0.080)   (0.820, 0.109)   (0.814, 0.093)   (0.742, 0.100)

Table 5: Model Variance Performance on Twitter 16, reported as (µ, σ).

  Model                    Accuracy         F1 NR            F1 FR            F1 TR            F1 UR
  Veyseh et al. (2019)     (0.768, 0.103)   (0.825, 0.226)   (0.751, 0.103)   (0.768, 0.096)   (0.789, 0.184)
  SOG (This work)          (0.826, 0.082)   (0.843, 0.153)   (0.843, 0.091)   (0.878, 0.074)   (0.774, 0.114)

From the results in Tables 4 and 5, it is evident that our Semantic Oppositeness Graph has higher mean values for accuracy, not a rumor (NR), false rumor (FR), and true rumor (TR), while having comparably reasonable values for the Unrecognizable (UR) class. But more interesting are the standard deviation values. It is evident that, in all cases, our model has smaller standard deviation values than those of Veyseh et al. (2019). This is proof that our system is comparatively more stable in the face of random weight initialization. We argue that this stability comes from the introduction of the oppositeness component, which augments the decision-making process with the oppositeness information as a counterpart to the already existing similarity information, preventing the predictions from having a swinging bias.
For a demonstration, consider the subset of three words increase, decrease, and expand from the example given by de Silva and Dou (2019). If the main tweet (R_0) were to say "A will increase B", R_1 replied with "A will decrease B", and R_2 replied with "A will expand B", then a purely semantic-similarity-based model will position R_0 and R_1 closer than R_0 and R_2, given that the word contexts in which increase and decrease are found are more similar than the word contexts in which increase and expand are found. This would result in the neural network having to learn the opposite semantics between increase and decrease by itself during the training, making it more vulnerable to issues of bad initial weight selection. This, in turn, will result in greater variance among the trained models, as is the case with Veyseh et al. (2019). However, a system with an oppositeness component will already have the opposite semantics between increase and decrease, as well as between increase and expand, calculated. Thus, such a system would have pre-knowledge that the word pair increase and decrease, despite being used in more common contexts, is more semantically opposite than the word pair increase and expand, which is used in less common contexts. Hence, the neural network does not have to learn that information from scratch during the training, resulting in it being less vulnerable to issues of bad initial weight selection. Analogously, this, in turn, will result in less variance among the trained models, hence explaining the better stability demonstrated by the Semantic Oppositeness Graph in comparison to Veyseh et al. (2019) in Tables 4 and 5.

4.3 Impact of the Oppositeness Component

Finally, to emphasize the effect the oppositeness component has on the model, we draw the t-SNE diagrams for the final representations of the threads.

[Figure 3: t-SNE diagrams for thread representations. (a) Without the oppositeness component; (b) With the oppositeness component.]

Figure 3a shows the data point clustering when the model is trained without the oppositeness component, and Figure 3b shows the data point clustering when the model is trained with the oppositeness component. Note that all other variables, including the seed for the weight initializer, are the same in the two models. These diagrams prove that the oppositeness component helps improve the separability of the classes. Especially note how the False Rumor and True Rumor classes are now more clearly separated. We postulate that this derives from the fact that the oppositeness component helps in distinguishing the continuous discord happening in a False Rumor thread from the subsequent general agreement in a True Rumor thread.

5 Conclusion

Rumors and fake news are a significant problem in social networks, due to their intrinsic nature of connecting users to millions of others and giving any individual the power to post anything. In this paper, we introduced a novel method for rumor detection based on semantic oppositeness. We demonstrated the effectiveness of our method using data sets from Twitter. Compared to previous work, which only used explicit structures in the reply relations or semantic similarity, our model learns both explicit and implicit relations between a main tweet and its replies, by utilizing both semantic similarity and semantic oppositeness. We proved, with extensive experiments, that our proposed model achieves state-of-the-art performance, while being more resistant to the variances in performance introduced by randomness.
References

Juan Cao, Junbo Guo, Xirong Li, Zhiwei Jin, Han Guo, and Jintao Li. 2018. Automatic Rumor Detection on Microblogs: A Survey. arXiv preprint arXiv:1807.03505.

Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. 2011. Information Credibility on Twitter. In WWW, pages 675-684. ACM.

Leon Derczynski, Kalina Bontcheva, Maria Liakata, Rob Procter, Geraldine Wong Sak Hoi, and Arkaitz Zubiaga. 2017. SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for rumours. arXiv preprint arXiv:1704.05972.

Patti Domm. 2013. False Rumor of Explosion at White House Causes Stocks to Briefly Plunge; AP Confirms Its Twitter Feed Was Hacked. https://www.cnbc.com/id/100646197.

Manish Gupta, Peixiang Zhao, and Jiawei Han. 2012. Evaluating event credibility on Twitter. In SDM, pages 153-164.

Sardar Hamidian and Mona Diab. 2019a. GWU NLP at SemEval-2019 Task 7: Hybrid pipeline for rumour veracity and stance classification on social media. In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 1115-1119.

Sardar Hamidian and Mona T. Diab. 2019b. Rumor detection and classification for Twitter data. arXiv preprint arXiv:1912.08926.

Jay J. Jiang and David W. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In COLING, pages 19-33.

Z. Jin, J. Cao, Y. Jiang, and Y. Zhang. 2014. News credibility evaluation on microblog with a hierarchical propagation model. In ICDM, pages 230-239.

Zhiwei Jin, Juan Cao, Han Guo, Yongdong Zhang, Yu Wang, and Jiebo Luo. 2017. Detection and analysis of 2016 US presidential election related rumors on Twitter. In SBP-BRiMS, pages 14-24.

Steven Jones, M. Lynne Murphy, Carita Paradis, and Caroline Willners. 2012. Antonyms in English: Construals, constructions and canonicity. Cambridge University Press.

Sejeong Kwon, Meeyoung Cha, Kyomin Jung, Wei Chen, and Yajun Wang. 2013. Prominent features of rumor propagation in online social media. In ICDM, pages 1103-1108. IEEE.

Jing Ma, Wei Gao, Prasenjit Mitra, Sejeong Kwon, Bernard J. Jansen, Kam-Fai Wong, and Meeyoung Cha. 2016. Detecting rumors from microblogs with recurrent neural networks. In IJCAI, pages 3818-3824.

Jing Ma, Wei Gao, Zhongyu Wei, Yueming Lu, and Kam-Fai Wong. 2015. Detect rumors using time series of social context information on microblogging websites. In CIKM, pages 1751-1754. ACM.

Jing Ma, Wei Gao, and Kam-Fai Wong. 2017. Detect rumors in microblog posts using propagation structure via kernel learning. In ACL, pages 708-717.

Jing Ma, Wei Gao, and Kam-Fai Wong. 2018. Rumor detection on Twitter with tree-structured recursive neural networks. In ACL, pages 1980-1989.

Arthur Mettinger. 1994. Aspects of semantic opposition in English. Oxford University Press.

Michel Paradis, Marie-Claire Goldblum, and Raouf Abidi. 1982. Alternate antagonism with paradoxical translation behavior in two bilingual aphasic patients. Brain and Language, 15(1):55-69.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In EMNLP, pages 1532-1543.

Lori Rothman and M. Parker. 2009. Just-About-Right (JAR) Scales. West Conshohocken, PA: ASTM International.

Ulrich Schimmack. 2001. Pleasure, displeasure, and mixed feelings: Are semantic opposites mutually exclusive? Cognition & Emotion, 15(1):81-97.

Nisansa de Silva. 2020. Semantic Oppositeness for Inconsistency and Disagreement Detection in Natural Language. PhD dissertation, College of Arts and Sciences, University of Oregon. Available at https://www.cs.uoregon.edu/Reports/PHD-202012-deSilva.pdf.

Nisansa de Silva and Dejing Dou. 2019. Semantic oppositeness embedding using an autoencoder-based learning model. In Database and Expert Systems Applications, pages 159-174.

Nisansa de Silva, Dejing Dou, and Jingshan Huang. 2017. Discovering inconsistencies in PubMed abstracts through ontology-based information extraction. In Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pages 362-371. ACM.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS, pages 5998-6008.

Amir Pouran Ben Veyseh, My T. Thai, Thien Huu Nguyen, and Dejing Dou. 2019. Rumor detection in social networks via deep contextual modeling. In ASONAM.

Ke Wu, Song Yang, and Kenny Q. Zhu. 2015. False rumors detection on Sina Weibo by propagation structures. In ICDE, pages 651-662.

Z. Wu and M. Palmer. 1994. Verbs semantics and lexical selection. In ACL, pages 133-138.

Fan Yang, Yang Liu, Xiaohui Yu, and Min Yang. 2012. Automatic detection of rumor on Sina Weibo. In ACM SIGKDD Workshop on Mining Data Semantics, pages 1-7.

Haolin Yang, Zhiwei Xu, Limin Liu, Jie Tian, and Yujun Zhang. 2020. Adaptive slide window-based feature cognition for deceptive information identification. IEEE Access, 8:134311-134323.

Zhe Zhao, Paul Resnick, and Qiaozhu Mei. 2015. Enquiring minds: Early detection of rumors in social media from enquiry posts. In WWW, pages 1395-1405.