Twitter Summarization Based on Social Network and Sparse Reconstruction
The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)

Ruifang He, Xingyi Duan*
School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin 300350, China
{rfhe,xingyiduan}@tju.edu.cn

Abstract

With the rapid growth of microblogging services such as Twitter, a vast number of short and noisy messages are produced by millions of users, which makes it difficult for people to quickly grasp the essential information on topics of interest. In this paper, we study extractive topic-oriented Twitter summarization as a solution to this problem. Traditional summarization methods consider only textual information, which is insufficient in the social media setting. Existing Twitter summarization techniques rarely model relations between tweets explicitly, ignoring the fact that information can spread along the social network. Inspired by the social theories of expression consistency and expression contagion observed in social networks, we propose a novel approach for Twitter summarization over short and noisy messages that integrates the Social Network with Sparse Reconstruction (SNSR). We explore whether social relations can help Twitter summarization by modeling the relations between tweets as a social regularization and integrating it into a group sparse optimization framework. The framework conducts a sparse reconstruction process that selects the tweets best able to reconstruct the original tweets of a specific topic, while accounting for coverage and sparsity. We simultaneously design a diversity regularization to remove redundancy. In particular, we present a mathematical optimization formulation and develop an efficient algorithm to solve it. Due to the lack of a public corpus, we construct gold-standard Twitter summary datasets for 12 different topics. Experimental results on these datasets show the effectiveness of our framework in handling large-scale short and noisy messages in social media.

* This work is supported by NSFC (61472277) and partly supported by the National Program on Key Basic Research Project (973 Program, 2013CB329301) and NSFC (61772361).
Copyright (c) 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

Twitter has become one of the most popular social network platforms, through which large numbers of users freely produce content (called tweets) on their topics of interest. However, the rapid growth of tweets makes it difficult for people to quickly grasp essential information. Twitter summarization aims to generate a succinct summary delivering the core information from a sheer volume of tweets on a given topic. It can help ordinary people quickly acquire information, and aid agencies in monitoring crisis progress so as to assist recovery and provide disaster relief.

Although document summarization has been researched for many years, it remains a knotty problem due to the large-scale, short, noisy and informal nature of messages in social media such as tweets. Existing Twitter summarization approaches usually regard tweets as sentences and adopt traditional summarization methods (Inouye and Kalita 2011), such as SumBasic (Vanderwende et al. 2007), Centroid (Radev, Blair-Goldensohn, and Zhang 2001), LexRank (Erkan and Radev 2004) and TextRank (Mihalcea and Tarau 2004), to validate their performance on microblogging posts. However, it is not clear whether adding methodological complexity improves the performance of Twitter summarization. Other studies (Chang et al. 2016; Liu et al. 2012) exploit static social features beyond textual content, such as the number of replies, number of retweets, number of likes, author popularity (i.e., the number of followers of a given tweet's author) and temporal signals.

All the above methods ignore the fact that Twitter data is networked. Some studies (Chang et al. 2013; Duan et al. 2012) exploit social network information, but mainly consider it from the user-level perspective, assuming that high-authority users are more likely to post salient tweets. However, tweets themselves are also networked through user connections. Unlike traditional methods, which obtain associated tweet information by measuring similarity between tweets purely on content, networked tweets may carry more semantic clues than purely text-based methods can capture. We therefore need a new method for modeling tweet-level networked information.

Social theories indicate the reciprocal influence of networked information. People tend to keep the same sentiment (Hu et al. 2013) and preference (Wang et al. 2015) on a specific topic over a short period, a phenomenon called expression consistency. Moreover, relationships between people are established through a series of interactions and feedback. The influence is subtle and can greatly affect one's preferences, manner of speaking, or even expression content. Thus people gradually come to share viewpoints on a topic with their friends and express them in almost the same tone and words, which is regarded as expression contagion. Inspired by these two
social theories, we explore how to utilize them for Twitter summarization.

Recently, sparse reconstruction based summarization methods have been proposed (He et al. 2012; Liu, Yu, and Deng 2016; Yao, Wan, and Xiao 2015) and show significant performance on the traditional DUC/TAC corpora. Moreover, social information can be seamlessly combined with sparse reconstruction based methods. In this paper, we propose to integrate the social network into a unified optimization framework for Twitter summarization from the perspective of sparse reconstruction, by modeling tweet-level networked information. The framework assumes that a good summary can best reconstruct the original corpus, and it better addresses the coverage, sparsity and diversity of summaries in social media. Our contributions are summarized as follows:

• From a statistical perspective, we verify the existence of two social theories in Twitter data and formally define the problem of Twitter summarization so as to enable the utilization of the social network;
• We model tweet-level networked information as a social regularization by integrating the social network into the sparse reconstruction based method;
• We design a group sparsity regularization for Twitter summarization to keep salient tweets at the corpus level, and a diversity regularization to avoid the more serious redundancy brought by the social network;
• We construct 12 gold-standard topic-oriented tweet datasets by asking 24 volunteers to manually select the most informative tweets, yielding 48 expert summaries in total;
• We empirically evaluate the proposed SNSR framework on these datasets, elaborate the effectiveness of the social network, and validate the newly designed sparsity and diversity schemes.

Related Work

Our proposed method belongs to the extractive and unsupervised style; therefore, we mainly review the relevant research.

Multi-Document Summarization. Many traditional methods extract the summary from the top-scoring sentences, after assigning salience scores to the sentences of the original documents. Salience computation strategies include: (1) feature-based methods, such as Centroid and SumBasic, which consider word frequency and position to measure sentence weight; and (2) graph-based methods, which are PageRank-like algorithms such as LexRank and TextRank, built by random walks on a sentence or word graph. However, these methods face the redundancy problem. Some studies propose cluster-based strategies to keep the summary diverse and avoid redundancy (Cai et al. 2010; Wang et al. 2011; Shen, Li, and Ding 2010; Wang et al. 2009; Gao et al. 2012; Litvak et al. 2015). They mainly use topic modeling, clustering algorithms or matrix factorization to produce summaries with better coverage. Recently, sparse reconstruction based summarization methods, originally proposed by (He et al. 2012), have brought new possibilities for resolving the classical challenges of summarization, including coverage, salience, and diversity. Further improvements are presented in (Yao, Wan, and Xiao 2015; Liu, Yu, and Deng 2016). However, the large-scale short and noisy texts in social media make these methods unsuitable for Twitter.

Twitter Summarization. The prosperity of social media impels people to explore adapting traditional summarization methods (Inouye and Kalita 2011) to Twitter, including the hybrid TF-IDF model and the phrase reinforcement algorithm, which finds the most commonly used phrase as the summary (Sharifi, Hutton, and Kalita 2010; Nichols, Mahmud, and Drews 2012). All these methods consider only textual information. However, social media platforms provide much richer information than text alone. (Duan et al. 2012; Liu et al. 2012) extended the PageRank algorithm by incorporating social properties. (Alsaedi, Burnap, and Rana 2016) proposed three methods for event summarization using temporal and retweet information. (Chang et al. 2013; 2016) regarded Twitter summarization as a supervised classification task by mining rich social features, such as temporal signals and user influence. These approaches mainly use static social information or user-level network information, and do not further explore tweet-level networked relations, which may contain much more potential semantic evidence.

Social Network Propagation. Social theories and social network analysis provide useful insights for combining topology and content in Twitter summarization. Social network propagation, also known as social influence or network influence, has been researched in several domains, such as sentiment analysis (Hu et al. 2013), topic identification (Wang et al. 2015), topic detection (Bi, Tian, and Sismanis 2014), and network inference (He et al. 2015). From these studies we know that sentiment and topics can spread along the network. In this paper, we further explore how expression content, the carrier of sentiment and topic, spreads along the network and influences Twitter summarization. A social regularization accounting for social network propagation can be seamlessly integrated into sparse reconstruction based summarization methods. We therefore further study the coverage, salience and diversity challenges of summarization in social media from the sparse reconstruction perspective.

Problem Statement

Assume the tweet corpus of a specific topic is represented as a weighted term frequency-inverse tweet frequency matrix S = [t_1, t_2, \ldots, t_n] \in R^{m \times n}, where m is the size of the vocabulary and n is the total number of tweets. Each column t_i of S is a single tweet vector. U \in R^{d \times n} denotes the user-tweet matrix, where d is the number of users and U_{ij} = 1 means that the jth tweet is posted by the ith user. We construct the user-user matrix F \in R^{d \times d} from the friendship relation, where F_{ij} = 1 indicates that the ith user is related to the jth user. With this notation, we can formally describe Twitter summarization over short and noisy social media texts.
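To make the notation concrete, a minimal sketch of the three matrices follows. The toy counts, the user assignments, and the exact weighting (tf multiplied by log(n / tweet frequency)) are illustrative assumptions, not the paper's actual preprocessing.

```python
import numpy as np

# Toy corpus: n = 4 tweets over a vocabulary of m = 5 terms.
# counts[i, j] = raw frequency of term i in tweet j.
counts = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 3, 1],
    [1, 0, 0, 2],
], dtype=float)

# Weighted term frequency-inverse tweet frequency matrix S in R^{m x n}:
# tf * log(n / tweet-frequency), analogous to TF-IDF with tweets
# playing the role of documents (one plausible instantiation).
n = counts.shape[1]
tweet_freq = (counts > 0).sum(axis=1, keepdims=True)  # tweets containing term i
S = counts * np.log(n / tweet_freq)

# User-tweet matrix U in R^{d x n}: U[i, j] = 1 iff user i posted tweet j.
U = np.array([
    [1, 0, 1, 0],   # user 0 posted tweets 0 and 2
    [0, 1, 0, 0],
    [0, 0, 0, 1],
], dtype=float)

# User-user matrix F in R^{d x d}: F[i, j] = 1 iff user i is related to user j.
F = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
], dtype=float)

print(S.shape, U.shape, F.shape)
```

With these shapes fixed, the reconstruction coefficient matrix W learned below is n by n, one column per tweet to be reconstructed.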
Given a topic-oriented Twitter corpus C with content S and social context consisting of the user-tweet matrix U and the user-user matrix F, we aim to learn the reconstruction coefficient matrix W that automatically produces a summary.

Data and Observations

Due to the lack of a public Twitter summarization corpus, in this section we first introduce how we collect the data; the construction scheme of the ground-truth corpus is described in the experiment section. We then explore whether the social theories can bring motivating insights for Twitter summarization.

Data

We use the public Twitter data collected by the University of Illinois[1] as the raw data. According to the hashtags, we extract twelve popular topics from May, June and July 2011, covering politics, science and technology, sports, natural disasters, terrorist attacks and entertainment gossip. Each topic can have multiple hashtags, such as "#osama" and "#osamabinladen". We then retrieve the tweets that contain any of these hashtags, or any of the keywords obtained by removing "#" from the hashtags. Observing the trend of tweet counts over time, the topics fall mainly into emergencies and hot events. To study expression consistency and expression contagion within a short time interval, we collect tweets from the five days after an emergency occurred (e.g., the Oslo terrorist attack) and from the five days before and after a hot event occurred (e.g., Harry Potter). After obtaining the topic-oriented data, we filter out tweets beforehand if they satisfy any of the following conditions:

• The tweet appears more than once (only one copy is retained);
• The number of words, excluding hashtags, keywords, mentions, URLs and stop words, is less than 3;
• The author of the tweet is not connected to any other user.

Due to limited space, statistics for a subset of the topics are shown in Table 1.

Table 1: Statistics of the datasets

                              Osama      Joplin    Mavs       Oslo
Date                          0501       0522      0612       0722
# of Tweets                   4780       2896      3859       4571
# of Users                    1309       1082      1780       1026
Max Degree of Users           69         68        76         77
Min Degree of Users           1          1         1          1
Max Tweets Number of Users    42         93        92         56
Min Tweets Number of Users    2          1         1          2
Ave. Tweets per User          3.65       2.68      2.18       4.46
P-value (Consistency)         4.78e-125  2.1e-98   9.08e-211  2.62e-131
P-value (Contagion)           1.82e-33   6.6e-09   8.09e-08   4.98e-19

Observations of Social Theories for Twitter Summarization

Social theories such as consistency (Abelson 1983) and contagion (Shalizi and Thomas 2011; Harrigan, Achananuparp, and Lim 2012) have been proved useful for social media mining (Harrigan, Achananuparp, and Lim 2012). The analyses there indicate that the members of a social network often exhibit correlated behavior: sentiment and topics can diffuse through the network. Consistency means that the social behaviors of the same person remain consistent over a short period of time; contagion means that friends can influence each other. In this subsection, we investigate expression consistency and expression contagion under a given topic for Twitter summarization. In our work, we redefine and explore consistency and contagion through two questions:

• Expression consistency: Are two tweets posted by the same user more consistent than two randomly selected tweets?
• Expression contagion: Are two tweets posted by friends more similar than two randomly selected tweets?

To answer these questions, we measure the distance between two tweets as D_{ij} = \|t_i - t_j\|_2, where t_i denotes the vector of the ith tweet. The more similar the two tweets, the closer D_{ij} is to 0.

For the first question, we construct two vectors cons_c and cons_r with an equal number of elements. Each element of the first vector is the distance between two tweets posted by the same user, and each element of the second vector is the distance between two randomly selected tweets. We then conduct a two-sample t-test on cons_c and cons_r. The null hypothesis is that there is no difference between the two vectors, H_0: cons_c = cons_r. The alternative hypothesis is that the distance between two tweets posted by the same user is smaller than that between randomly selected tweets, H_1: cons_c < cons_r.

Similarly, for the second question, we construct two vectors cont_c and cont_r with an equal number of elements. Each element of the first vector is the distance between two tweets posted by friends, and each element of the second is the distance between two randomly selected tweets. We likewise conduct a two-sample t-test on cont_c and cont_r, with null hypothesis H_0: cont_c = cont_r (no difference between tweets posted by friends and randomly selected tweets) and alternative hypothesis H_1: cont_c < cont_r (the distance between two tweets posted by friends is smaller than that between randomly selected tweets).

For all topics, the consistency and contagion null hypotheses are both rejected at significance level α = 0.01, with the p-values presented in the last two rows of Table 1. This observation provides strong evidence for the existence of expression consistency and expression contagion. In the next section, we elaborate how to exploit these social theories for Twitter summarization.

Our Approach

The large-scale short and noisy texts in social media bring more serious data sparseness, as well as content redundancy due to social network propagation. We investigate these new challenges for Twitter summarization through the perspective of sparse reconstruction.

[1] https://wiki.illinois.edu/wiki/display/forward/Dataset-UDI-TwitterCrawl-Aug2012
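Looking back at the observation procedure above, the one-sided two-sample t-test can be sketched with standard tooling. The distance vectors here are synthetic stand-ins for cons_c and cons_r (real values would come from the D_{ij} tweet distances), so only the testing procedure mirrors the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two distance vectors: same-user tweet pairs
# (cons_c) are assumed slightly closer on average than random pairs (cons_r).
cons_c = rng.normal(loc=0.9, scale=0.1, size=2000)
cons_r = rng.normal(loc=1.0, scale=0.1, size=2000)

# H0: cons_c = cons_r versus H1: cons_c < cons_r (one-sided test).
t_stat, p_value = stats.ttest_ind(cons_c, cons_r, alternative='less')

# Reject H0 at significance level alpha = 0.01 when p_value < 0.01.
print(f"t = {t_stat:.2f}, p = {p_value:.3g}, reject H0: {p_value < 0.01}")
```

Applying the same call to cont_c and cont_r reproduces the contagion test; the paper reports the resulting p-values in the last two rows of Table 1.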
SR: Sparse Reconstruction

Coverage. Treating Twitter summarization as a sparse reconstruction problem, the original corpus should be reconstructable from the summary tweets. Given the original corpus C, we can formally describe the reconstruction process as:

    \min_W \frac{1}{2} \|S - SW\|_F^2                                    (1)

where W = [W_{*1}, W_{*2}, \ldots, W_{*n}] \in R^{n \times n} is the reconstruction coefficient matrix, each column W_{*j} = [W_{1j}, W_{2j}, \ldots, W_{nj}]^T is the vector of coefficients used to represent tweet t_j, and each element W_{ij} of W_{*j} denotes the proportion of tweet t_i used in reconstructing tweet t_j. Since the original tweets should be non-negative linear combinations of the summary tweets, we add the constraint W \geq 0. To prevent each tweet from being reconstructed by itself, i.e., the trivial solution W \approx I, we add the constraint diag(W) = 0. Eq. (1) then becomes:

    \min_W \frac{1}{2} \|S - SW\|_F^2
    s.t. \; W \geq 0, \; diag(W) = 0                                     (2)

Sparsity: Group Lasso. Because only a few tweets are selected as the summary to reconstruct the original corpus, not all tweets should have an impact on reconstructing any given tweet: most coefficients of each column should tend to zero. Inspired by sparse coding (Ye and Liu 2012), we regard each tweet (each row of W) as a group, and the problem turns into selecting a subset of these groups that minimizes the reconstruction loss. We add an \ell_{2,1}-norm penalty on W, so the objective becomes:

    \min_W \frac{1}{2} \|S - SW\|_F^2 + \lambda \|W\|_{2,1}
    s.t. \; W \geq 0, \; diag(W) = 0                                     (3)

where

    \|W\|_{2,1} = \sum_{i=1}^{n} \|W(i,:)\|_2                            (4)

and

    \|W(i,:)\|_2 = \sqrt{\sum_{j=1}^{n} W_{ij}^2}                        (5)

SN: Modeling Networked Tweet-Level Information

To reduce the reconstruction error and rectify the reconstruction process, we exploit the social theories to model the networked tweet-level information as a social regularization: two correlated tweets that are originally close should remain close during reconstruction. Essentially, we need to build a graph Lasso (Ye and Liu 2012).

To utilize social relations for Twitter summarization, we use the two social theories to construct a tweet-tweet correlation graph by transforming the user-tweet and user-user relations into tweet-tweet correlations. Given the user-tweet matrix U and the user-user matrix F, the tweet-tweet correlation matrix for expression consistency is defined as T_{cons} = U^T U, where (T_{cons})_{ij} = 1 denotes that two tweets are posted by the same user. For expression contagion, T_{cont} = U^T F U, where (T_{cont})_{ij} = 1 denotes that two tweets are posted by friends. The tweet-tweet correlation matrix can then be T_{cons}, T_{cont}, or the combination T = T_{cons} + d \cdot T_{cont}, where d is a balance parameter between the two relations; in this paper we simply set d = 1. T_{ij} = 1 denotes that two tweets have a correlated connection, otherwise T_{ij} = 0. Defining the reconstruction of S as \hat{S} = SW, the graph Lasso penalty term, namely the social regularization, is formulated as:

    \Omega_{graph} = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} T_{ij} \|\hat{S}_{*i} - \hat{S}_{*j}\|_2^2
                   = \sum_{i=1}^{m} \hat{S}_{i*} (D - T) \hat{S}_{i*}^T
                   = tr(SWLW^T S^T)                                      (6)

where tr(\cdot) denotes the trace of a matrix, L = D - T is the Laplacian matrix, and D \in R^{n \times n} is a diagonal matrix with D_{ii} = \sum_{j=1}^{n} T_{ij}, each diagonal element being the degree of a tweet in T.

Finally, we incorporate the social regularization Eq. (6) into Eq. (3):

    \min_W \frac{1}{2} \|S - SW\|_F^2 + \alpha \, tr(SWLW^T S^T) + \lambda \|W\|_{2,1}
    s.t. \; W \geq 0, \; diag(W) = 0                                     (7)

where \alpha is the weight of the social regularization.

Diversity: Modeling Redundant Information

Redundancy removal has always been a focus of summarization research. Social studies (Harrigan, Achananuparp, and Lim 2012) show that reciprocal ties and certain triadic structures substantially increase social contagion, which also brings more inherent redundancy and a lack of novelty to the messages in a social network. This kind of redundancy is therefore more serious than in traditional summarization. Sparse reconstruction based methods tend to select tweets that cover the whole corpus, yet they have no explicit tendency to select tweets covering different aspects of a topic. (Liu, Yu, and Deng 2016) introduced a correlation term to control diversity, but their optimization process is very complex. (Yao, Wan, and Xiao 2015) introduced a dissimilarity matrix, which greatly reduces the optimization complexity. However, the way this matrix is computed is unsuitable for the large-scale short and noisy texts in social media, since it measures the encoding cost of each word using sentence length or vocabulary size. This makes every dissimilarity value rather large and drives each element of W toward zero, so the salience of a tweet becomes hard to identify.
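Before turning to the diversity design, the social regularization of Eq. (6) admits a quick numerical sanity check: its pairwise form and its trace form must coincide whenever T is symmetric and D holds the row sums of T. The matrices below are random placeholders, not real tweet data.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, d = 6, 5, 4                       # terms, tweets, users (toy sizes)

S = rng.random((m, n))                  # stand-in term-tweet matrix
W = rng.random((n, n))                  # stand-in reconstruction coefficients
U = (rng.random((d, n)) < 0.4).astype(float)   # user-tweet incidence
F = (rng.random((d, d)) < 0.5).astype(float)
F = np.maximum(F, F.T)                  # symmetric friendship relation
np.fill_diagonal(F, 0)

# Tweet-tweet correlation T = T_cons + d*T_cont with d = 1, clipped to {0,1}.
T = np.minimum(U.T @ U + U.T @ F @ U, 1.0)
np.fill_diagonal(T, 0)                  # no self-correlation

D = np.diag(T.sum(axis=1))              # tweet degrees in T
L = D - T                               # graph Laplacian
S_hat = S @ W                           # reconstruction of S

# Pairwise form of Eq. (6).
pairwise = 0.5 * sum(T[i, j] * np.sum((S_hat[:, i] - S_hat[:, j]) ** 2)
                     for i in range(n) for j in range(n))

# Trace form of Eq. (6): tr(S W L W^T S^T).
trace_form = np.trace(S @ W @ L @ W.T @ S.T)

print(np.isclose(pairwise, trace_form))
```

The same Laplacian identity underlies graph-regularized factorization methods generally; here it lets the solver work with the compact trace form instead of the double sum.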
Inspired by the dissimilarity matrix, we instead introduce a relatively simple but effective cosine similarity matrix \nabla, in which each element \nabla_{ij} \in [0, 1] is the cosine similarity between tweet t_i and tweet t_j. In the sparse reconstruction process, the constraint diag(W) = 0 already prevents tweets from reconstructing themselves; by the same reasoning, we should prevent tweets from being reconstructed by tweets that are nearly identical to them. Consider the example below:

• Tweet1: the mood was solemn at the garden of reflection in lower makefield following the death of osama bin laden. video: http://fb.me/tof3pqok
• Tweet2: the mood was solemn at the garden of reflection in lower makefield following osama bin laden's death. video: http://bit.ly/l9tvdw

The two tweets are obviously similar to each other, which drives both reconstruction coefficients W_{12} and W_{21} toward 1. This raises the salience of both tweets throughout the corpus and brings more redundancy. In preliminary experiments, we indeed found many such similar pairs in the final summary when diversity was not handled. To avoid this "similar" reconstruction phenomenon, we binarize \nabla as:

    \nabla_{ij} = 1 \text{ if } \nabla_{ij} \geq \theta, \; 0 \text{ otherwise}          (8)

where \theta is the threshold used to distinguish similar pairs from normal pairs. We then introduce the diversity regularization term

    tr(\nabla^T W) = \sum_{i=1}^{n} \sum_{j=1}^{n} \nabla_{ij} W_{ij}

into Eq. (7), so the objective function becomes:

    \min_W \frac{1}{2} \|S - SW\|_F^2 + \alpha \, tr(SWLW^T S^T) + \gamma \, tr(\nabla^T W) + \lambda \|W\|_{2,1}
    s.t. \; W \geq 0, \; diag(W) = 0                                     (9)

where \gamma is the weight of the diversity regularization term. After solving Eq. (9), the ranking score of tweet t_i is calculated as Score(t_i) = \|W(i,:)\|_2, and we select tweets according to this score to form the final summary.

Optimization Algorithm for SNSR

Inspired by (Ji and Ye 2009; Nesterov 2004; Hu et al. 2013), we derive an efficient algorithm to solve the optimization problem in Eq. (9). The objective can be equivalently expressed as:

    \min_W f(W) = \frac{1}{2} \|S - SW\|_F^2 + \alpha \, tr(SWLW^T S^T) + \gamma \, tr(\nabla^T W)
    s.t. \; W \geq 0, \; diag(W) = 0, \; \|W\|_{2,1} \leq z              (10)

where z \geq 0 is the radius of the \ell_{2,1}-ball, and there is a one-to-one correspondence between \lambda and z.

We omit the details of our mathematical derivations due to limited space; interested readers may consult (Hu et al. 2013) and the SLEP package[2] (Sparse Learning with Efficient Projections). The entire procedure is described in Algorithm 1, where

    G_{lr,V}(W) = f(V) + tr\Big(\big(\frac{\partial f(V)}{\partial V}\big)^T (W - V)\Big) + \frac{lr}{2} \|W - V\|_F^2          (11)

and the shrinkage operator in Line 9 is defined as:

    S_{\lambda/lr}(U_{i*}) = \max\Big(1 - \frac{\lambda}{lr \, \|U_{i*}\|_2}, \; 0\Big) U_{i*}                                  (12)

[2] http://www.yelab.net/software/SLEP/

Algorithm 1: An Efficient Optimization Algorithm for SNSR

Input: S, U, F, \nabla, W_0, \alpha, \gamma, \lambda, \theta, \epsilon
Output: W
 1: Initialize \mu_0 = 0, \mu_1 = 1, W_1 = W_0, lr = 0.1
 2: A = U^T U + U^T F U, \; L = D - A, \; \nabla = (\nabla \geq \theta)
 3: for iter = 0, 1, 2, \ldots do
 4:   V = W_1 + \frac{\mu_0 - 1}{\mu_1}(W_1 - W_0)
 5:   \frac{\partial f(W)}{\partial W} = S^T S W_1 - S^T S + \gamma \nabla + \alpha S^T S W_1 L
 6:   loop
 7:     U = V - \frac{1}{lr} \frac{\partial f(W)}{\partial W}
 8:     for each row U_{i*} of U do
 9:       W_{i*} = S_{\lambda/lr}(U_{i*})
10:     end for
11:     W = W - diag(W), \; W = \max(W, 0)
12:     if f(W) \leq G_{lr,V}(W) then
13:       break
14:     end if
15:     lr = 2 \cdot lr
16:   end loop
17:   funVal(iter) = f(W) + \lambda \|W\|_{2,1}
18:   if |funVal(iter) - funVal(iter - 1)| \leq \epsilon then
19:     break
20:   end if
21:   W_0 = W_1
22:   W_1 = W
23:   \mu_0 = \mu_1
24:   \mu_1 = \frac{1 + \sqrt{1 + 4\mu_1^2}}{2}
25: end for

Experiments

Ground Truth and Evaluation Metric

To evaluate our approach, we construct a ground-truth (expert summaries) Corpus for Twitter Summarization (CTS). For each of the twelve topics, we ask four volunteers
(a) Social influence. From left to right are SNSR and SNSR-social, SNSR-sparse and SNSR-social-sparse, SNSR-div and SNSR-social-div, SNSR-sparse-div and SNSR-social-sparse-div (b) Sparse influence. From left to right are SNSR and SNSR-sparse, SNSR-social and SNSR-social-sparse, SNSR-div and SNSR-sparse-div, SNSR-social-div and SNSR-context-sparse-div (c) Diversity influence. From left to right are SNSR and SNSR-div, SNSR-social and SNSR-social-div, SNSR-sparse and SNSR-sparse-div, SNSR-social-sparse and SNSR-social-sparse-div Figure 1: The influences of social regularization, sparse regularization and diversity regularization to selects 25 tweets as a summary respectively, altogether tweet; LexRank (Erkan and Radev 2004) ranks tweets by 48 expert summaries. Then we ask three other volunteers to the PageRank like algorithm; LSA (Gong and Liu. 2001) score all summaries based on coverage, diversity and writ- exploits SVD (singular value decomposition) to decompose ing quality of tweets in range [1, 5]. If only 0-6 tweets are the TF-IDF matrix, and then selects the highest ranked satisfactory, then this summary is scored as 1, 12 as 2, 18 as tweets from each right singular vector; NNMF (Park et al. 3, 24 as 4, and if all the tweets are good, we score it 5. The 2007) performs non-negative matrix factorization on the TF- higher score, the more possible it is a better summary. We IDF matrix, and chooses tweet with the maximum prob- remain the summaries whose scores are greater or equal to ability in each cluster; And two other methods based on 3, and require those low-quality summaries to be modified sparse reconstruction are namely DSDR (He et al. 2012), until they are eligible. and MDS-Sparse (Liu, Yu, and Deng 2016). 
The SNSR- We use ROUGE as our evaluation metric (Lin 2004), div, SNSR-sparse, SNSR-social are the degradation mod- which measures the overlapping N-grams between expert els of SNSR, “-” denotes deleting the corresponding diver- summaries and the model summary. In our experiment, we sity, group sparse and social regularizations from our model report the F-measures of ROUGE-1, 2, and ROUGE-SU4. SNSR. The “-” setting rules of models in Figure 1 are simi- lar. Performance Evaluation Through the overall comparisons seen in Table 2, we have the following observations: Since our model belongs to the extractive and unsupervised style, we only compare with the relevant systems, includ- • Our model outperforms all the baselines, and is below the ing text and sparse reconstruction based methods. The upper upper bound. Yet it is worthy to be noted that all the meth- bound of human summary and the baselines shown in Table ods make a high performance, especially for ROUGE-1. 2 are as follows: This phenomenon can be explained that our task is topic- Expert denotes the average mutual assessment of expert oriented and we collect tweets according to the hashtags, summaries. Random selects tweets randomly; Centroid thus tweets tend to have the coherent content. It also may (Radev, Blair-Goldensohn, and Zhang 2001) ranks tweets by be due to that we conduct an effective preprocessing; calculating the similarity between tweets and pseudo-center • Among all the comparison experiments, the methods us- 5792
Table 2: Performance on the Twitter data

System       ROUGE-1   ROUGE-2   ROUGE-SU4
Expert       0.47814   0.16337   0.20389
Random       0.41701   0.09439   0.14231
Centroid     0.38190   0.12384   0.15668
LexRank      0.42046   0.13273   0.17366
LSA          0.43474   0.13023   0.16625
NNMF         0.43784   0.13321   0.17433
DSDR         0.43236   0.12946   0.16521
MDS-Sparse   0.42240   0.10060   0.14666
SNSR-div     0.40191   0.12940   0.15894
SNSR-sparse  0.43327   0.13692   0.17749
SNSR-social  0.43236   0.10271   0.15379
SNSR         0.44887   0.13882   0.18147

Table 3: Average improvement of the different regularization terms in SNSR over the degradation models

Regularization  ROUGE-1  ROUGE-2  ROUGE-SU4
social           4.84%   46.01%    23.69%
sparse           9.98%   21.03%    14.87%
diversity       10.27%    6.34%    12.51%

...using matrix factorization, especially NNMF, shows comparable performance. The probable reasons come from two aspects: (1) NNMF can also be regarded as a reconstruction method; furthermore, it is similar to (Li et al. 2017), which exploits aspect term vectors to reconstruct the original term space; (2) it addresses the coverage and diversity challenges to some degree by mining sub-topics.

• In comparison with the three degradation models, both social regularization and diversity regularization are useful for SNSR, and group sparsity regularization is also effective in obtaining salient tweet patterns at the corpus level rather than the cluster level.

• Observing the last four rows in Table 2, we can see that social regularization has an obvious effect on ROUGE-2 and ROUGE-SU4, while diversity regularization does well on ROUGE-1.

In summary, our SNSR method achieves the best overall performance. This suggests that integrating social network information into the proposed sparse reconstruction framework helps improve Twitter summarization. Mining the group sparsity patterns of salient tweets and designing the diversity regularization against the redundancy brought by the social network are also effective.

Effect of Different Regularization

To further evaluate the effectiveness of (1) social regularization, we conduct four groups of comparison experiments, shown in Figure 1(a); in each group, two models are compared. In addition, we conduct similar evaluations for (2) sparse regularization and (3) diversity regularization, shown in Figure 1(b) and Figure 1(c) respectively. By analyzing the experimental results, we make the following observations:

• Performance drops when any of the three terms is removed, which demonstrates the effectiveness of adding social regularization, group sparse regularization and diversity regularization.

• For each regularization, we compute the average growth percentage under ROUGE-1, ROUGE-2 and ROUGE-SU4 in comparison with all of the corresponding degradation models. As seen from Table 3, social regularization has the greatest influence on the entire model, followed by the sparse regularization and then the diversity regularization term. Especially for ROUGE-2 and ROUGE-SU4, adding social regularization outperforms the degradation models by 46.01% and 23.69% respectively, which demonstrates that social regularization tends to select tweets close to the expert summaries.

• It is noted that diversity regularization achieves a larger improvement on ROUGE-1 than the other two terms, which demonstrates that redundancy removal indeed decreases duplicated words and increases word coverage with respect to the expert summaries.

For social regularization, we simply set d = 1 and do not discuss the different influences of the consistency relation and the contagion relation due to limited space; they will be explored in future work.

Parameters Settings and Tunings

In our experiments, there are mainly four parameters to be analyzed: the social regularization weight α, the diversity regularization weight γ, the group sparse regularization weight λ, and the diversity threshold θ. We tune the four parameters greedily with a fixed step size; for example, α is searched over the range [0, 1] with a step size of 0.01. Through preliminary experiments, we set α = 0.03, λ = 1, and γ = 1. We set the similarity threshold θ = 0.1, which is consistent with the observation that the similarity between sentences is mostly distributed in the interval [0, 0.1]. In this way, we try to avoid the "similar" reconstruction phenomenon.

Conclusion

The large scale of short and noisy texts in social media brings new challenges for summarization and makes traditional document summarization methods unsuitable for Twitter. In this paper, we study Twitter summarization from the sparse reconstruction perspective and propose the SNSR framework. A social network correlates tweets through user relations and also brings more serious coherent redundancy. Therefore, we model the networked tweet-level information given by social relations as a social regularization and integrate it into the sparse optimization framework. Meanwhile, we design a diversity regularization to avoid redundancy, especially the inherent redundancy brought by the social network and the consistent property of a topic-oriented corpus. We use group sparse regularization to extract better tweet patterns at the corpus level to form the summary, mining the salient tweets while keeping them diverse. In addition, we construct the CTS corpus. Experimental results on this corpus show that our model achieves the best performance and that the proposed framework is effective.
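The Table 3 figures are average growth percentages of the full SNSR model over its degradation models. A minimal sketch of that computation, with function names of our own choosing; as an illustration it uses the single SNSR vs. SNSR-social ROUGE-2 pair from Table 2, whereas the paper averages over all corresponding degradation models from Figure 1, so the exact 46.01% figure is not reproduced here:

```python
def avg_growth(full_score, degraded_scores):
    """Average relative growth of the full model's ROUGE score
    over a set of degradation-model scores (the Table 3 quantity)."""
    return sum((full_score - d) / d for d in degraded_scores) / len(degraded_scores)

# Illustrative single pair from Table 2 (ROUGE-2): full SNSR vs. the
# degradation model without social regularization.
growth = avg_growth(0.13882, [0.10271])
print(f"{growth:.2%}")  # → 35.16%
```

Even on this single pair, removing social regularization costs roughly a third of the ROUGE-2 score, consistent with the paper's claim that social regularization has the greatest influence on bigram-level quality.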
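The greedy tuning procedure described in "Parameters Settings and Tunings" (sweep one parameter at a time over a fixed-step grid, holding the others at their current values) can be sketched as follows. Only the α grid (range [0, 1], step 0.01) is stated in the paper; the grids for λ, γ, θ and the `evaluate` callback, which stands in for a full summarization run plus ROUGE scoring, are assumptions:

```python
import numpy as np

def greedy_tune(evaluate, grids, init):
    """Greedy coordinate search: for each parameter in turn, sweep its
    grid while the other parameters stay fixed, and keep the best value."""
    params = dict(init)
    for name, grid in grids.items():
        best_val, best_score = params[name], evaluate(**params)
        for v in grid:
            params[name] = v
            score = evaluate(**params)
            if score > best_score:
                best_val, best_score = v, score
        params[name] = best_val  # commit the winner before the next sweep
    return params

# alpha grid matches the paper ([0, 1], step 0.01); the other grids
# are illustrative assumptions, not the authors' settings.
grids = {
    "alpha": np.arange(0.0, 1.01, 0.01),
    "lam":   [0.1, 0.5, 1.0, 2.0],
    "gamma": [0.1, 0.5, 1.0, 2.0],
    "theta": np.arange(0.05, 0.55, 0.05),
}
```

Note that a greedy sweep of this kind finds the reported setting (α = 0.03, λ = 1, γ = 1, θ = 0.1) only up to interactions between parameters; a full grid search over all four would be far more expensive.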
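The role of the similarity threshold θ = 0.1 can be illustrated with a small sketch: flag tweet pairs whose cosine similarity exceeds θ so that near-duplicates are not used to reconstruct each other. The paper's exact diversity regularization formula is not given in this excerpt, so how the flagged pairs enter the objective is an assumption; the function name is ours:

```python
import numpy as np

def similar_pairs(X, theta=0.1):
    """Pairwise cosine similarity between tweet vectors (rows of X).
    Pairs with similarity above theta are flagged as near-duplicates,
    which a diversity penalty would discourage from reconstructing
    one another."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, 0.0)  # ignore self-similarity
    return sim, sim > theta
```

With most sentence-pair similarities falling in [0, 0.1] as the paper observes, this threshold flags only the genuinely redundant pairs rather than penalizing all reconstruction.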
References

Abelson, R. P. 1983. Whatever became of consistency theory? Personality and Social Psychology Bulletin 9(1):37-54.
Alsaedi, N.; Burnap, P.; and Rana, O. 2016. Automatic summarization of real world events using Twitter. The Tenth International AAAI Conference on Web and Social Media.
Bi, B.; Tian, Y.; and Sismanis, Y. 2014. Scalable topic-specific influence analysis on microblogs. ACM International Conference on Web Search and Data Mining.
Cai, X.; Li, W.; Ouyang, Y.; and Yan, H. 2010. Simultaneous ranking and clustering of sentences: A reinforcement approach to multi-document summarization. COLING.
Chang, Y.; Wang, X.; Mei, Q.; and Liu, Y. 2013. Towards Twitter context summarization with user influence models. WSDM.
Chang, Y.; Tang, J.; Yin, D.; Yamada, M.; and Liu, Y. 2016. Timeline summarization from social media with life cycle models. IJCAI.
Duan, Y.; Chen, Z.; Wei, F.; Zhou, M.; and Shum, H.-Y. 2012. Twitter topic summarization by ranking tweets using social influence and content quality. COLING.
Erkan, G., and Radev, D. R. 2004. LexRank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research 22:457-479.
Gao, D.; Li, W.; Ouyang, Y.; and Zhang, R. 2012. LDA-based topic formation and topic-sentence reinforcement for graph-based multi-document summarization. AIRS.
Gong, Y., and Liu, X. 2001. Generic text summarization using relevance measure and latent semantic analysis. In Proc. of the 24th ACM SIGIR, 19-25. ACM.
Harrigan, N.; Achananuparp, P.; and Lim, E.-P. 2012. Influentials, novelty, and social contagion: The viral power of average friends, close communities, and old news. Social Networks 34:470-480.
He, Z.; Chen, C.; Bu, J.; Wang, C.; Zhang, L.; Cai, D.; and He, X. 2012. Document summarization based on data reconstruction. AAAI.
He, X.; Rekatsinas, T.; Foulds, J.; Getoor, L.; and Liu, Y. 2015. HawkesTopic: A joint model for network inference and topic modeling from text-based cascades. ICML.
Hu, X.; Tang, L.; Tang, J.; and Liu, H. 2013. Exploiting social relations for sentiment analysis in microblogging. WSDM.
Inouye, D., and Kalita, J. 2011. Comparing Twitter summarization algorithms for multiple post summaries. SocialCom.
Ji, S., and Ye, J. 2009. An accelerated gradient method for trace norm minimization. ICML.
Li, P.; Wang, Z.; Lam, W.; Ren, Z.; and Bing, L. 2017. Salience estimation via variational auto-encoders for multi-document summarization. AAAI.
Lin, C.-Y. 2004. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the Workshop on Text Summarization Branches Out, Post-Conference Workshop of ACL 2004.
Litvak, M.; Vanetik, N.; Liu, C.; Xiao, L.; and Savas, O. 2015. Improving summarization quality with topic modeling. CIKM.
Liu, X.; Li, Y.; Wei, F.; and Zhou, M. 2012. Graph-based multi-tweet summarization using social signals. COLING.
Liu, H.; Yu, H.; and Deng, Z. 2016. Multi-document summarization based on two-level sparse representation model. AAAI.
Mihalcea, R., and Tarau, P. 2004. TextRank: Bringing order into texts. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.
Nesterov, Y. 2004. Introductory Lectures on Convex Optimization: A Basic Course.
Nichols, J.; Mahmud, J.; and Drews, C. 2012. Summarizing sporting events using Twitter. IUI.
Park, S.; Lee, J.-H.; Kim, D.-H.; and Ahn, C.-M. 2007. Multi-document summarization based on cluster using non-negative matrix factorization. In Proceedings of the Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM).
Radev, D. R.; Blair-Goldensohn, S.; and Zhang, Z. 2001. Experiments in single and multidocument summarization using MEAD. First Document Understanding Conference.
Shalizi, C. R., and Thomas, A. C. 2011. Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research 40(2):211.
Sharifi, B.; Hutton, M.; and Kalita, J. 2010. Summarizing microblogs automatically. HLT-NAACL.
Shen, C.; Li, T.; and Ding, C. H. Q. 2010. Integrating clustering and multi-document summarization by bi-mixture probabilistic latent semantic analysis (PLSA) with sentence bases. AAAI.
Vanderwende, L.; Suzuki, H.; Brockett, C.; and Nenkova, A. 2007. Beyond SumBasic: Task-focused summarization with sentence simplification and lexical expansion. Information Processing and Management 43:1606-1618.
Wang, D.; Zhu, S.; Li, T.; and Gong, Y. 2009. Multi-document summarization using sentence-based topic models. ACL-IJCNLP 2009 Conference Short Papers, 297-300.
Wang, D.; Zhu, S.; Li, T.; Chi, Y.; and Gong, Y. 2011. Integrating document clustering and multidocument summarization. KDD.
Wang, X.; Wang, Y.; Zuo, W.; and Cai, G. 2015. Exploring social context for topic identification in short and noisy texts. AAAI.
Yao, J.; Wan, X.; and Xiao, J. 2015. Compressive document summarization via sparse optimization. IJCAI.
Ye, J., and Liu, J. 2012. Sparse methods for biomedical data. SIGKDD Explorations 14(1):4-13.