Neural Temporal Opinion Modelling for Opinion Prediction on Twitter
Lixing Zhu† Yulan He†∗ Deyu Zhou§
† Department of Computer Science, University of Warwick, UK
§ School of Computer Science and Engineering, Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, China
{Lixing.Zhu,Yulan.He}@warwick.ac.uk d.zhou@seu.edu.cn

Abstract

Opinion prediction on Twitter is challenging due to the transient nature of tweet content and neighbourhood context. In this paper, we model users' tweet posting behaviour as a temporal point process to jointly predict the posting time and the stance label of the next tweet, given a user's historical tweet sequence and the tweets posted by their neighbours. We design a topic-driven attention mechanism to capture dynamic topic shifts in the neighbourhood context. Experimental results show that the proposed model predicts both the posting time and the stance labels of future tweets more accurately than a number of competitive baselines.

1 Introduction

Social media platforms allow users to express their opinions online towards various subject matters. Despite much progress in sentiment analysis on social media, the prediction of opinions remains challenging. Opinion formation is a complex process: an individual's opinion can be influenced by their own prior belief, their social circles, and external factors. Existing studies often assume that socially connected users hold similar opinions. Social network information is integrated with user representations via weighted links and encoded using neural networks with attention or, more recently, Graph Convolutional Networks (GCNs) (Chen et al., 2016; Li and Goldwasser, 2019). This strand of work, including (Chen et al., 2018; Zhu et al., 2020; Del Tredici et al., 2019), leverages both the chronological tweet sequence and social networks to predict users' opinions.

The majority of previous work requires a manual segmentation of a tweet sequence into equally-spaced intervals based on either tweet counts or time duration. Models trained on the current interval are then used to predict users' opinions in the next interval. However, we argue that such a manual segmentation may not be appropriate, since users post tweets at different frequencies. Moreover, the time interval between two consecutively published tweets by a user is important for studying the underlying opinion dynamics and should therefore be treated as a random variable.

Inspired by the multivariate Hawkes process (Aalen et al., 2008; Du et al., 2016), we propose to model a user's posting behaviour as a temporal point process: when user $u$ posts a tweet $d$ at time $t$, they decide whether to post a new topic/opinion, or to post a topic/opinion influenced by past tweets, either posted by other users or by themselves. We thus propose a neural temporal opinion model to jointly predict the time when the new post will be published and its associated stance. Instead of using the fixed formulation of the multivariate Hawkes process, the intensity function of the point process is automatically learned by a gated recurrent neural network. In addition, a user's neighbourhood context and the topics of their previously published tweets are also taken into account for the prediction of both the posting time and the stance of the next tweet.

To the best of our knowledge, this is the first work to exploit the temporal point process for opinion prediction on Twitter. Experimental results on two Twitter datasets relating to Brexit and the US general election show that our proposed model outperforms existing approaches on both stance and posting time prediction.

∗ Corresponding author

2 Methodology

We present in Figure 1 the overall architecture of our proposed Neural Temporal Opinion Model (NTOM).
[Figure 1: Overview of the Neural Temporal Opinion Model. The user's tweets $x_i$ and the neighbours' tweets $d_{i,1}, \ldots, d_{i,L}$ are encoded by Bi-LSTMs; a VAE encodes the bag-of-words $x_i^b$ into a topic vector $z_i$; an attention module over the neighbourhood context produces $c_i$; a GRU consumes the combined representation and its hidden state $g_i$ feeds the intensity function (for the next posting time $\tau_{i+1}$) and a softmax (for the next stance label $y_{i+1}$).]

The input to the model at time step $i$ consists of the user's own tweet $x_i$, its bag-of-words representation $x_i^b$, the time interval $\tau_i$ between the $(i-1)$-th and the $i$-th tweet, the user embedding $u$, and the neighbours' tweet queue $\{d_{i,1}, d_{i,2}, \ldots, d_{i,L}\}$. First, a Bi-LSTM layer is applied to extract features from the input tweets. The neighbourhood tweets are then processed by a stacked Bi-LSTM/LSTM layer to extract the neighbourhood context, which is fed into an attention module queried by the user's own tweet $h_i$ and topic $z_i$. The output of the attention module is concatenated with the tweet representation, time interval $\tau_i$, user representation $u$, and topic representation $z_i$, which is encoded from $x_i^b$ via a Variational Autoencoder (VAE). Finally, the combined representation is sent to a GRU cell, whose hidden state participates in computing the intensity function and the softmax function for the prediction of the posting time interval and the stance label of the next tweet. In the following, we describe the model in more detail.

Tweet representation: Words in tweets are mapped to pre-trained word embeddings (Baziotis et al., 2017)[1], which are specially trained on tweets. A Bi-LSTM is then used to generate the tweet representation.

Topic extraction: The topic representation $z_i$ in Figure 1 captures the topic focus of the $i$-th tweet. It is learned by a VAE (Kingma and Welling, 2014), which approximates the intractable true posterior by optimising the reconstruction error between the generated tweet and the original tweet. Specifically, we convert each tweet to a bag-of-words representation weighted by term frequency, $x_i^b$, and feed it to two inference neural networks, $f_\mu^\phi$ and $f_\Sigma^\phi$. These generate the mean and variance of a Gaussian distribution from which the latent topic vector $z_i$ is sampled. The approximate posterior is then $q_\phi(z_i|x_i^b) = \mathcal{N}(z_i \mid f_\mu^\phi(x_i^b), f_\Sigma^\phi(x_i^b))$. To generate the observation $\tilde{x}_i^b$ conditional on the latent topic vector $z_i$, we define the generative network as $p_\varphi(x_i^b|z_i) = \mathcal{N}(x_i^b \mid f_\mu^\varphi(z_i), f_\Sigma^\varphi(z_i))$. The reconstruction loss for the tweet $x_i^b$ is then:

  $\mathcal{L}_x = \mathbb{E}_{q_\phi(z_i|x_i^b)}[\log p_\varphi(x_i^b|z_i)] - \mathrm{KL}(q_\phi(z_i|x_i^b) \,\|\, p(z_i))$    (1)

Neighbourhood Context Attention: To capture the influence of the neighbourhood context, we first feed the neighbours' most recent $L$ tweets to an LSTM in temporally ascending order. The output of the LSTM is weighted by attention signals queried by the user's $i$-th tweet and topic:

  $c_i = \sum_{l=1}^{L} \alpha_l h_{i,l}^c$    (2)

  $\alpha_l \propto \exp([h_i^T, z_i^T]\,\tanh(W_h h_{i,l}^c + W_z z_{i,l}^c))$    (3)

where $\{h_{i,1}^c, h_{i,2}^c, \ldots, h_{i,L}^c\}$ denotes the hidden state outputs for the tweets $d_{i,l}$ in the neighbourhood context, $z_{i,l}^c$ denotes the associated topic, $h_i$ is the representation of the user's own tweet at time step $i$, and $W_h$ and $W_z$ are weight matrices.

We use this attention mechanism to align the user's tweet to the most relevant part of the neighbourhood context. Our rationale is that a user will attend to their neighbours' tweets that discuss similar topics. The attention output $c_i$ is then concatenated with the user's own tweet $h_i$ and the extracted topic $z_i$. We further enrich the representation with the elapsed time $\tau_i$ between the posting times of the current tweet and the last posted tweet, and add a randomly initialised user vector $u$ to distinguish the user from others. The final representation is passed to a GRU cell for the joint prediction of the posting time and stance label of the next tweet.

[1] https://github.com/cbaziotis/datastories-semeval2017-task4
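For concreteness, below is a minimal PyTorch sketch of the attention in Eqs. (2)-(3). The tensor shapes, the module name, and the use of a softmax to realise the proportionality in Eq. (3) are our own reading of the equations; the paper does not include reference code.

```python
import torch
import torch.nn as nn

class TopicQueriedAttention(nn.Module):
    """Sketch of the neighbourhood context attention, Eqs. (2)-(3).

    Assumed shapes (not specified in the paper):
      h_ctx: (L, d) hidden states of the L neighbour tweets
      z_ctx: (L, k) topic vectors of the L neighbour tweets
      h_i:   (d,)   representation of the user's own tweet
      z_i:   (k,)   topic vector of the user's own tweet
    """
    def __init__(self, d: int, k: int):
        super().__init__()
        self.W_h = nn.Linear(d, d + k, bias=False)  # W_h in Eq. (3)
        self.W_z = nn.Linear(k, d + k, bias=False)  # W_z in Eq. (3)

    def forward(self, h_ctx, z_ctx, h_i, z_i):
        query = torch.cat([h_i, z_i])                          # [h_i^T, z_i^T], (d+k,)
        keys = torch.tanh(self.W_h(h_ctx) + self.W_z(z_ctx))   # (L, d+k)
        scores = keys @ query                                  # one score per neighbour, (L,)
        alpha = torch.softmax(scores, dim=0)                   # normalises Eq. (3)
        c_i = alpha @ h_ctx                                    # Eq. (2): weighted sum, (d,)
        return c_i, alpha

# e.g. three neighbour tweets, tweet dim 128, topic dim 50
attn = TopicQueriedAttention(d=128, k=50)
c_i, alpha = attn(torch.randn(3, 128), torch.randn(3, 50),
                  torch.randn(128), torch.randn(50))
```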
Temporal Point Process: The goal of NTOM is to forecast the time gap until the next post, together with its stance label. Instead of modelling the time interval with a regression model, we use a GRU (Cho et al., 2014) to simulate the temporal point process. At each time step, the combined representation $[c_i, h_i, z_i, \tau_i, u]$ is input to the GRU cell to iteratively update the hidden state, taking into account the influence of previous tweets:

  $g_i = f_{\mathrm{GRU}}(g_{i-1}, c_i, h_i, z_i, \tau_i, u)$    (4)

where $g_i$ is the hidden state of the GRU cell. Given $g_i$, the intensity function is formulated as:

  $\lambda^*(t) = \lambda(t \mid \mathcal{H}_i) = \exp(b_\lambda + v_\lambda^T g_i + w_\lambda t)$    (5)

Here, $\mathcal{H}_i$ summarises the tweet history up to tweet $i$, $b_\lambda$ denotes the base intensity level, the term $v_\lambda^T g_i$ captures the influence of all previous tweets, and $w_\lambda t$ captures the influence of the current interval. The likelihood that the next tweet will be posted after interval $\tau$, given the history, is:

  $f^*(\tau) = \lambda^*(\tau)\exp\!\left(-\int_0^\tau \lambda^*(t)\,dt\right)$    (6)

The expected time of the next tweet can then be estimated as:

  $\hat{\tau}_{i+1} = \int_0^\infty \tau \cdot f^*(\tau)\,d\tau$    (7)
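Since the expectation in Eq. (7) has no closed form under the intensity in Eq. (5), it is typically evaluated numerically. Below is a minimal PyTorch sketch of Eqs. (5)-(7) under assumptions of ours: the integral is truncated at t_max, a trapezoidal rule is used, and all parameter names are illustrative (the paper does not specify its integration scheme).

```python
import torch

def expected_next_interval(g_i, b_lam, v_lam, w_lam, t_max=100.0, n_steps=1000):
    """Numerically estimate tau_hat from Eq. (7).

    g_i: (d,) GRU hidden state from Eq. (4)
    b_lam, w_lam: scalar tensors; v_lam: (d,) tensor -- parameters of Eq. (5)
    The integral over [0, inf) is truncated at t_max (an assumption).
    """
    t = torch.linspace(0.0, float(t_max), n_steps)        # integration grid
    lam = torch.exp(b_lam + v_lam @ g_i + w_lam * t)      # Eq. (5): lambda*(t)
    # cumulative integral of lambda*(t) via the trapezoidal rule
    dt = t[1] - t[0]
    increments = (lam[1:] + lam[:-1]) * dt / 2.0
    cum = torch.cat([torch.zeros(1), torch.cumsum(increments, dim=0)])
    f = lam * torch.exp(-cum)                             # Eq. (6): density f*(t)
    return torch.trapz(t * f, t)                          # Eq. (7): E[tau]
```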
Loss: We expect the predicted interval to be as close as possible to the actual interval, which we encourage by minimising the Gaussian penalty function:

  $\mathcal{L}_{time} = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(\frac{-(\tau_{i+1} - \hat{\tau}_{i+1})^2}{2\sigma^2}\right)$    (8)

For stance prediction we employ the cross-entropy loss, denoted $\mathcal{L}_{stan}$. The final objective function is:

  $\mathcal{L} = \eta\mathcal{L}_x + \beta\mathcal{L}_{time} + \gamma\mathcal{L}_{stan}$    (9)

where $\eta$, $\beta$ and $\gamma$ are coefficients determining the contribution of each loss term.
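The following PyTorch sketch shows one way to combine Eqs. (8)-(9) into a training objective. Two assumptions on our part: σ is treated as a fixed hyperparameter, and we minimise the negative log of the Gaussian density in Eq. (8) (a scaled squared error plus a constant), which matches the stated goal of pulling the predicted interval towards the actual one. All variable names are illustrative.

```python
import math
import torch
import torch.nn.functional as F

def joint_loss(tau_true, tau_pred, stance_logits, stance_true,
               l_x, sigma=1.0, eta=0.2, beta=0.4, gamma=0.4):
    """Combine the losses of Eqs. (1), (8) and (9).

    tau_true, tau_pred: (batch,) posting-time intervals
    stance_logits: (batch, num_classes); stance_true: (batch,) class indices
    l_x: the VAE reconstruction term of Eq. (1), computed elsewhere
    sigma is treated as a fixed hyperparameter -- an assumption of ours.
    """
    # Negative log of the Gaussian density in Eq. (8): a scaled squared
    # error plus a constant, minimised when tau_pred matches tau_true.
    l_time = ((tau_true - tau_pred) ** 2) / (2 * sigma ** 2) \
             + math.log(sigma * math.sqrt(2 * math.pi))
    l_stan = F.cross_entropy(stance_logits, stance_true)  # stance loss
    return eta * l_x + beta * l_time.mean() + gamma * l_stan  # Eq. (9)
```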
3 Experiments

3.1 Setup

We perform experiments on two publicly available Twitter datasets[2] (Zhu et al., 2020) on Brexit and the US election. The Brexit dataset consists of 363k tweets, of which 31.6%/29.3%/39.1% are supporting/opposing/neutral towards Brexit. The Election dataset consists of 452k tweets, of which 74.2%/20.4%/5.4% are supporting/opposing/neutral towards Trump. We filter out users who posted fewer than 3 tweets and are left with 20,914 users in Brexit and 26,965 users in Election.

[2] https://github.com/somethingx01/TopicalAttentionBrexit

We plot in Figure 2 the number of users versus the number of tweets and find that over 81.6% of users have published fewer than 7 tweets; we therefore set the maximum tweet sequence length per user to 7. For users who have published more than 7 tweets, we split their tweet sequence into multiple training sequences of length 7 with an overlapping window size of 1. For each user, we use 90% of their tweets for training and 10% (rounded up) for testing.

[Figure 2: Number of users versus number of tweets, on the Brexit and Election datasets.]

Our settings are η = 0.2, β = 0.4 and γ = 0.4. We set the topic number to 50 and the vocabulary size to 3k for the tweet bag-of-words input to the VAE. The mini-batch size is 16. We use the Adam optimizer with learning rate 0.0005 and learning rate decay 0.9. The evaluation metrics are accuracy for stance prediction and Mean Squared Error (MSE) for posting time prediction.
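To make the data preparation concrete, here is a small Python sketch of the windowing scheme described above; the function name and list-based representation are our own illustrative choices, not the authors' code.

```python
def split_sequences(tweets, max_len=7):
    """Split one user's chronological tweet list into training
    sequences of length max_len with an overlapping window of 1,
    mirroring the setup described above.
    """
    if len(tweets) <= max_len:
        return [tweets]
    return [tweets[i:i + max_len] for i in range(len(tweets) - max_len + 1)]

# e.g. a user with 9 tweets yields 3 overlapping sequences of length 7
assert len(split_sequences(list(range(9)))) == 3
```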
The results are compared against the following baselines:

- CSIM_W (Chen et al., 2018) gauges social influence with an attention mechanism to predict the user's sentiment in the next tweet.
- NOD (Zhu et al., 2020) takes into account the neighbourhood context and pre-extracted topics for tweet stance prediction.
- LING+GAT (Del Tredici et al., 2019) places a GCN variant over linguistic features to extract node representations. Tweets are aggregated by user for user-level prediction.

We also perform an ablation study on our model by removing the topic extraction component (NTOM-VAE) or the neighbourhood context component (NTOM-context). In addition, to validate that NTOM does benefit from point process modelling and can better forecast the time and stance of the next tweet, we remove the intensity function (i.e., Eqs. (5)-(7)) and directly use a vanilla RNN and its variants, LSTM and GRU, to predict the true time interval. Furthermore, to investigate whether it is more beneficial to use a GCN to encode the neighbourhood context, we learn tweet representations using a GCN[3] (Hamilton et al., 2017), which preserves high-order influence in social networks through convolution. As in (Li and Goldwasser, 2019), we use a 2-hop GCN and denote this variant NTOM-GCN. For the Brexit dataset, MSE is measured in hours, while for the Election dataset it is measured in minutes, owing to the high volume of tweets published within two days.

[3] https://github.com/williamleif/GraphSAGE

3.2 Results

             Brexit           Election
Model        Acc.    MSE      Acc.    MSE
CSIM_W       0.653   –        0.656   –
NOD          0.675   –        0.690   –
LING+GAT     0.692   –        0.704   –
RNN          0.636   7.81     0.659   9.62
LSTM         0.677   3.37     0.683   4.51
GRU          0.691   2.80     0.693   3.92
NTOM-VAE     0.697   2.67     0.705   4.01
NTOM-context 0.665   3.34     0.682   4.78
NTOM-GCN     0.680   2.65     0.706   4.29
NTOM         0.713   2.59     0.715   3.70

Table 1: Stance prediction accuracy and Mean Squared Errors of predicted posting time on the Brexit and Election datasets.

We report in Table 1 the stance prediction accuracy and the MSE of the predicted posting times. Compared to the baselines, NTOM consistently achieves better performance on both datasets, showing the benefit of modelling the tweet posting sequence as a temporal point process. In the second set of experiments, we study the effect of temporal process modelling. The results verify the benefit of the intensity function, with at least a 2% increase in accuracy and a 0.2 decrease in MSE compared with the vanilla RNN and its variants. In the ablation study, removing the neighbourhood context component causes the largest performance decline, verifying the importance of social influence in opinion prediction. Removing either the VAE (for topic extraction) or the intensity function (using only the GRU) results in slight drops in stance prediction and more noticeable gaps in time prediction. It can also be observed that using a GCN to model higher-order influence in social networks does not bring any benefit, possibly due to extra noise introduced to the model.

3.3 Visualisation of Topical Attention

To investigate the effectiveness of the context attention queried by topics, we first select some example topics from the topic-word matrix of the VAE. The label of each topic is manually assigned based on its 10 most strongly associated words. We then display a tweet's topic distribution together with the topic distributions of its neighbourhood tweets, and visualise the attention weights assigned to the 3 neighbourhood tweets.

[Figure 3: Distribution over 3 topics and attention signals on 3 neighbourhood tweets, in 2 time steps. Topics are labelled based on their top 10 words. Time step 1: x2 "yes, open borders with no way of planning strains on nhs…", with neighbours d1 "#brexit and we will all still be europeans free to do them all", d2 "then vote leave for the sake of the fishermen. vote leave", d3 "absolutely correct, so many reasons to vote leave". Time step 2: x4 "well played tonight boris! u absolutely smashed it! #brexit", with neighbours d1 "…eu is a closed protectionist market we pay than we ever…", d2 "bbc debate remain team has two aggressive bullies…", d3 "vote leave on thursday! make it our independence day". Topics: T1 "immigration" (immigration, stop, free, work, change, countries, immigrants, migrants, migration, open); T2 "Boris Johnson" (boris, live, johnson, politics, sturgeon, tv, nicola, morning, takebackcontrol, guy); T3 "vote remain" (voteremain, strongerin, cameron, eureferendum, david, inorout, pm, eudebate, osborne, positive).]

Figure 3 illustrates the example topics, topic distributions and attention signals towards context tweets. Here, x2 and x4 denote a user's 2nd and 4th tweets respectively, and the most recent 3 neighbourhood tweets are denoted d1, d2, d3. Blue in the leftmost column denotes the attention weights, and each row on top of T1, T2 and T3 denotes the topic distribution. It can be observed that the user's topic of concern shifts from immigration to Boris Johnson over 2 time steps. The drift also appears in the neighbours' tweets. Higher attention weights are assigned to the neighbours' tweets that share a similar topic distribution with the user. We can thus infer that the topic vector does help select the most relevant neighbourhood tweets.
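The topic labels above are assigned manually from each topic's top 10 words. Below is a small sketch of how such top-word lists can be read off a topic-word matrix; the matrix orientation and names are assumptions of ours.

```python
import torch

def top_words_per_topic(topic_word, vocab, k=10):
    """topic_word: (num_topics, vocab_size) matrix mapping each topic to
    word weights (e.g. from the VAE decoder); vocab: index-to-word list.
    Returns the k highest-weighted words per topic, the basis for the
    manual topic labels described above.
    """
    top = torch.topk(topic_word, k, dim=1).indices  # (num_topics, k)
    return [[vocab[int(j)] for j in row] for row in top]
```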
4 Related Work

The prediction of real-time stances on social media is challenging, partly because of the diversity and fickleness of users (Andrews and Bishop, 2019). A line of work mitigates the problem by exploiting homophily, i.e., that users are similar to their friends (McPherson et al., 2001; Halberstam and Knight, 2016). For example, Chen et al. (2016) gauged a user's opinion as an aggregated stance of their neighbourhood users. Linmei et al. (2019) went a step further by exploiting extracted topics, which discern a user's focus among neighbourhood tweets. Recent advances in this strand also include the application of GCNs, in which social relationships are leveraged to enrich user representations (Li and Goldwasser, 2019; Del Tredici et al., 2019).

On the other hand, several works have utilized the chronological order of tweets. Chen et al. (2018) presented an opinion tracker that predicts a stance every time a user publishes a tweet, and Zhu et al. (2020) extended this work by introducing a topic-dependent attention. Shrestha et al. (2019) considered diverse social behaviours and jointly forecast them with a hierarchical neural network. However, the aforementioned work requires a manual segmentation of the tweet sequence. Furthermore, these models are unable to predict when a user will next publish a tweet and what its associated stance will be. These problems can be addressed with the Hawkes process (Hawkes, 1971), which has been successfully applied to event tracking (Srijith et al., 2017), rumour detection (Lukasik et al., 2016; Zubiaga et al., 2016; Alvari and Shakarian, 2019) and retweet prediction (Kobayashi and Lambiotte, 2016). A combination of the Hawkes process with recurrent neural networks, the Recurrent Marked Temporal Point Process (RMTPP), was proposed to automatically capture the influence of past events on future events, and shows promising results on geolocation prediction (Du et al., 2016). Benefiting from the flexibility and scalability of neural networks, several works have followed this vein, including event sequence prediction (Mei and Eisner, 2017) and failure prediction (Xiao et al., 2017). Our work is partly inspired by RMTPP, but departs from previous work by jointly considering users' social relations and topical attention for stance prediction on social media.

5 Conclusion

In this paper, we propose a novel Neural Temporal Opinion Model (NTOM) to address users' changing interests and dynamic social context. We model users' tweet posting behaviour with a temporal point process for the joint prediction of the posting time and stance label of the next tweet. Experimental results verify the effectiveness of the model. Furthermore, visualisation of the topics and attention signals shows that NTOM captures the dynamics of focused topics and contextual attention.

Acknowledgments

This work was funded in part by EPSRC (grant no. EP/T017112/1). LZ is funded by the Chancellor's International Scholarship of the University of Warwick. DZ was partially funded by the National Key Research and Development Program of China (2017YFB1002801) and the National Natural Science Foundation of China (61772132). The authors would also like to thank Akanni Adewoyin for insightful discussions.

References
Odd Aalen, Ornulf Borgan, and Hakon Gjessing. 2008. Survival and Event History Analysis: A Process Point of View. Springer Science & Business Media.

Hamidreza Alvari and Paulo Shakarian. 2019. Hawkes process for understanding the influence of pathogenic social media accounts. In 2019 2nd International Conference on Data Intelligence and Security (ICDIS), pages 36–42. IEEE.

Nicholas Andrews and Marcus Bishop. 2019. Learning invariant representations of social media users. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1684–1695.

Christos Baziotis, Nikos Pelekis, and Christos Doulkeridis. 2017. DataStories at SemEval-2017 task 4: Deep LSTM with attention for message-level and topic-based sentiment analysis. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 747–754, Vancouver, Canada. Association for Computational Linguistics.

Chengyao Chen, Zhitao Wang, Yu Lei, and Wenjie Li. 2016. Content-based influence modeling for opinion behavior prediction. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2207–2216.

Chengyao Chen, Zhitao Wang, and Wenjie Li. 2018. Tracking dynamics of opinion behaviors with a content-based sequential opinion influence model. IEEE Transactions on Affective Computing.

KyungHyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. CoRR, abs/1409.1259.

Marco Del Tredici, Diego Marcheggiani, Sabine Schulte im Walde, and Raquel Fernández. 2019. You shall know a user by the company it keeps: Dynamic representations for social media users in NLP. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4701–4711.

Nan Du, Hanjun Dai, Rakshit Trivedi, Utkarsh Upadhyay, Manuel Gomez-Rodriguez, and Le Song. 2016. Recurrent marked temporal point processes: Embedding event history to vector. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1555–1564. ACM.

Yosh Halberstam and Brian Knight. 2016. Homophily, group size, and the diffusion of political information in social networks: Evidence from Twitter. Journal of Public Economics, 143:73–88.

Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034.

Alan G. Hawkes. 1971. Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58(1):83–90.

Diederik P. Kingma and Max Welling. 2014. Auto-encoding variational Bayes. stat, 1050:1.

Ryota Kobayashi and Renaud Lambiotte. 2016. TiDeH: Time-dependent Hawkes process for predicting retweet dynamics. In Tenth International AAAI Conference on Web and Social Media.

Chang Li and Dan Goldwasser. 2019. Encoding social information with graph convolutional networks for political perspective detection in news media. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2594–2604.

Hu Linmei, Tianchi Yang, Chuan Shi, Houye Ji, and Xiaoli Li. 2019. Heterogeneous graph attention networks for semi-supervised short text classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4823–4832.

Michal Lukasik, P. K. Srijith, Duy Vu, Kalina Bontcheva, Arkaitz Zubiaga, and Trevor Cohn. 2016. Hawkes processes for continuous time sequence classification: an application to rumour stance classification in Twitter. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 393–398.

Miller McPherson, Lynn Smith-Lovin, and James M. Cook. 2001. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1):415–444.

Hongyuan Mei and Jason M. Eisner. 2017. The neural Hawkes process: A neurally self-modulating multivariate point process. In Advances in Neural Information Processing Systems, pages 6754–6764.

Prasha Shrestha, Suraj Maharjan, Dustin Arendt, and Svitlana Volkova. 2019. Learning from dynamic user interaction graphs to forecast diverse social behavior. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 2033–2042. ACM.

P. K. Srijith, Michal Lukasik, Kalina Bontcheva, and Trevor Cohn. 2017. Longitudinal modeling of social media with Hawkes process based on users and networks. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pages 195–202.

Shuai Xiao, Junchi Yan, Xiaokang Yang, Hongyuan Zha, and Stephen M. Chu. 2017. Modeling the intensity function of point process via recurrent neural networks. In Thirty-First AAAI Conference on Artificial Intelligence.

Lixing Zhu, Yulan He, and Deyu Zhou. 2020. Neural opinion dynamics model for the prediction of user-level stance dynamics. Information Processing & Management, 57(2):102031.

Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, and Michal Lukasik. 2016. Stance classification in rumours as a sequential task exploiting the tree structure of social media conversations. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2438–2448.