Topic-Aware Evaluation and Transformer Methods for Topic-Controllable Summarization
Tatiana Passali, Grigorios Tsoumakas
School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
scpassali@csd.auth.gr, greg@csd.auth.gr

arXiv:2206.04317v2 [cs.CL] 13 Jun 2022

Abstract

Topic-controllable summarization is an emerging research area with a wide range of potential applications. However, existing approaches suffer from significant limitations. First, there is currently no established evaluation metric for this task. Furthermore, existing methods are built upon recurrent architectures, which can significantly limit their performance compared to more recent Transformer-based architectures, while they also require modifications to the model's architecture for controlling the topic. In this work, we propose a new topic-oriented evaluation measure to automatically evaluate the generated summaries based on the topic affinity between the generated summary and the desired topic. We also conducted a user study that validates the reliability of this measure. Finally, we propose simple, yet powerful methods for topic-controllable summarization that either incorporate topic embeddings into the model's architecture or employ control tokens to guide the summary generation. Experimental results show that control tokens can achieve better performance compared to more complicated embedding-based approaches while being at the same time significantly faster.

1 Introduction

Neural abstractive summarization models have matured enough to consistently produce high-quality summaries (See et al., 2017; Song et al., 2019; Dong et al., 2019; Zhang et al., 2020; Lewis et al., 2020). Building on top of this significant progress, an interesting challenge is to go beyond delivering a generic summary of a document, and instead produce a summary that focuses on specific aspects that pertain to the user's interests. For example, a financial journalist may need a summary of a news article to be focused on specific financial terms, like cryptocurrencies and inflation.

This challenge has been recently addressed by topic-controllable summarization techniques (Fan et al., 2018a; Krishna and Srinivasan, 2018; Frermann and Klementiev, 2019). However, the automatic evaluation of such techniques remains an open problem. Existing methods use the typical ROUGE score (Lin, 2004) for measuring summarization accuracy and then employ user studies to qualitatively evaluate whether the topic of the generated summaries matches the users' needs (Krishna and Srinivasan, 2018; Bahrainian et al., 2021). However, ROUGE can only be used for measuring the quality of the summarization output and cannot be readily used to capture the topical focus of the text. Thus, there is no clear way to automatically evaluate such approaches, since there is no evaluation measure designed specifically for topic-controllable summarization.

At the same time, existing models for topic-controllable summarization either incorporate topic embeddings into the model's architecture (Krishna and Srinivasan, 2018; Frermann and Klementiev, 2019) or modify the attention mechanism (Bahrainian et al., 2021). Since these approaches are restricted to very specific neural architectures, it is not straightforward to use them with any summarization model. Even though control tokens have been shown to be effective and efficient for controlling the output of a model for entity-based summarization (Fan et al., 2018a; He et al., 2020; Chan et al., 2021) without any modification to its codebase, they have not been explored yet for topic-controllable summarization.

Motivated by these observations, we propose a topic-aware evaluation measure for quantitatively evaluating topic-controllable summarization methods in an objective way, without involving expensive and time-consuming user studies. We conducted a user study to validate the reliability of this measure, as well as to interpret its score ranges.
In addition, we extend prior work on topic-controllable summarization in the following ways: a) we adapt topic embeddings from the traditional RNN architectures to the more recent and powerful Transformer architecture (Vaswani et al., 2017), and b) we employ control tokens via three different approaches: i) prepending the thematic category, ii) tagging the most representative tokens of the desired topic, and iii) using both prepending and tagging. For the tagging-based method, given a topic-labeled collection, we first extract keywords that are semantically related to the topic that the user requested and then employ special tokens to tag them before feeding the document to the summarization model. We also demonstrate that control tokens can successfully be applied to zero-shot topic-controllable summarization, while at the same time being significantly faster than the embedding-based formulation and effortlessly combinable with any neural architecture.

Our contributions can be summarized as follows:

• We propose a topic-aware measure to quantitatively evaluate topic-oriented summaries, validated by a user study.

• We propose topic-controllable methods for Transformers.

• We provide an extensive empirical evaluation of the proposed methods, including an investigation of the zero-shot setting.

The rest of this paper is organized as follows. In Section 2 we review existing related literature on controllable summarization. In Section 3 we introduce the proposed topic-aware measure, while in Section 4 we present the proposed methods for topic-controllable summarization. In Section 5 we discuss the experimental results. Finally, conclusions are drawn and interesting future research directions are given in Section 6.

2 Related Work

2.1 Controllable text-to-text generation

Controllable summarization belongs to the broader field of controllable text-to-text generation (Liu and Yang, 2021; Pascual et al., 2021; Liu et al., 2021). Several approaches for controlling the model's output exist, either using embedding-based approaches (Krishna and Srinivasan, 2018; Frermann and Klementiev, 2019), prepending information using special tokens (Fan et al., 2018a; He et al., 2020; Liu and Chen, 2021), or using decoder-only architectures (Liu et al., 2021). Controllable summarization methods can influence several aspects of a summary, including its topic (Krishna and Srinivasan, 2018; Frermann and Klementiev, 2019; Bahrainian et al., 2021), length (Takase and Okazaki, 2019; Liu et al., 2020; Chan et al., 2021), style (Fan et al., 2018a), and the inclusion of named entities (Fan et al., 2018a; He et al., 2020; Liu and Chen, 2021; Dou et al., 2021; Chan et al., 2021). Despite the importance of controllable summarization, there are still limited datasets for this task, including ASPECTNEWS (Ahuja et al., 2022) for extractive aspect-based summarization and EntSUM (Maddela et al., 2022) for entity-based summarization. In this paper, we focus on topic-controllable abstractive summarization.

2.2 Improving summarization using topical information

The integration of topic modeling into summarization models has initially been used in the literature to improve the quality of existing state-of-the-art models. Ailem et al. (2019) enhance the decoder of a pointer-generator network using the information of the latent topics that are derived from LDA (Blei et al., 2003). Similar methods have been applied by Wang et al. (2020) using Poisson Factor Analysis (PFA) with a plug-and-play architecture that uses topic embeddings as an additional decoder input based on the most important topics from the input document. Liu and Yang (2021) propose to enhance summarization models using an Extreme Multi-Label Text Classification model to improve the consistency between the underlying topics of the input document and the summary, leading to summaries of higher quality. Zhu et al. (2021) use a topic-guided abstractive summarization model for Wikipedia articles, leveraging the topical information of Wikipedia categories. Even though Wang et al. (2020) refer to the potential of controlling the output conditioned on a specific topic using GPT-2 (Radford et al., 2019), all the aforementioned approaches focus on improving the accuracy of existing summarization models instead of influencing the summary generation towards a particular topic.
2.3 Topic-control in neural abstractive summarization

Some steps towards controlling the output of a summarization model conditioned on a thematic category have been made by (Krishna and Srinivasan, 2018; Frermann and Klementiev, 2019), who proposed embedding-based controllable summarization models on top of the pointer generator network (See et al., 2017). Krishna and Srinivasan (2018) integrate the topical information into the model as a topic vector, which is then concatenated with each of the word embeddings of the input text. Bahrainian et al. (2021) propose to incorporate the topical information into the attention mechanism of the pointer generator network, using an LDA model (Blei et al., 2003). All these methods are based on recurrent neural networks and require modification to the architecture of a model. Our work adapts the embedding-based paradigm to Transformers (Vaswani et al., 2017), and employs control tokens, which can be applied effortlessly and efficiently to any model architecture as well as to the zero-shot setup.

3 Topic-aware Evaluation Measure

We propose a new topic-aware measure, called Summarization Topic Affinity Score (STAS), to evaluate the generated summaries according to their semantic similarity with the desired topic. STAS assumes the existence of a predefined set of topics T, and that each topic, t ∈ T, is defined via a set, D_t, of relevant documents.

STAS is computed on top of vector representations of topics and summaries. Several options exist for extracting such representations, ranging from simple bag-of-words models to sophisticated language models. For simplicity, this work employs the tf-idf model, where idf is computed across all documents \bigcup_{t \in T} D_t. For each topic t, we compute a topic representation y_t by averaging the tf-idf representations, x_d, of each document d ∈ D_t (see Fig. 1):

    y_t = \frac{1}{|D_t|} \sum_{d \in D_t} x_d    (1)

[Figure 1: Obtaining representative words, given a topic-assigned document collection. Document tf-idf vectors are grouped per topic, e.g., Sports, Politics, Business & Finance, Health Care.]

The representation of the predicted summary, y_s, is computed using tf-idf too, so as to lie in the same vector space with the topic vectors.

To calculate STAS, we compute the cosine similarity between the representations of the summary, y_s, and the desired topic, y_t, divided by the maximum cosine similarity between the representations of the summary and all topics:

    STAS(y_s, y_t) = \frac{s(y_s, y_t)}{\max_{z \in T} s(y_s, y_z)}    (2)

where s(y_s, y_t) indicates the cosine similarity between the two vectors y_s and y_t, computed as follows:

    s(y_t, y_s) = \frac{y_t \cdot y_s}{\|y_t\| \, \|y_s\|}    (3)

Summaries that are similar to the requested topic receive high STAS values, while those that are dissimilar receive low STAS values. It is worth noting that cosine similarity values can differ significantly across topics, due to the varying overlap between the common words that appear in the summaries and the different topics. Dividing by the maximum value takes this phenomenon into account, leading to comparable STAS values across topics.
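To make the computation concrete, the sketch below implements Eqs. (1)-(3) with scikit-learn. It is a minimal illustration rather than the released evaluation code: the `topic_docs` mapping, the function names, and the toy usage example are our own assumptions.

```python
# Minimal STAS sketch (Eqs. 1-3). `topic_docs` maps each topic name to a list of
# raw documents; the data layout and names are illustrative, not the paper's code.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_topic_vectors(topic_docs):
    """Fit tf-idf on the union of all topics' documents and average per topic (Eq. 1)."""
    all_docs = [doc for docs in topic_docs.values() for doc in docs]
    vectorizer = TfidfVectorizer().fit(all_docs)      # idf computed across all documents
    topic_vectors = {
        topic: np.asarray(vectorizer.transform(docs).mean(axis=0))
        for topic, docs in topic_docs.items()
    }
    return vectorizer, topic_vectors

def stas(summary, target_topic, vectorizer, topic_vectors):
    """Similarity to the target topic, normalized by the maximum over all topics (Eq. 2)."""
    y_s = vectorizer.transform([summary])
    sims = {t: cosine_similarity(y_s, y_t)[0, 0] for t, y_t in topic_vectors.items()}
    return sims[target_topic] / max(sims.values())

# Toy usage with two topic collections:
topic_docs = {"Sports": ["the team won the football game"],
              "Politics": ["the president signed the new law"]}
vectorizer, topic_vectors = build_topic_vectors(topic_docs)
print(stas("the player scored in the final game", "Sports", vectorizer, topic_vectors))
```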
4 Topic Control with Transformers

In this section, we present the proposed topic-controllable methods, which fall into two different categories: a) incorporating topic embeddings into the Transformer architecture and b) employing control tokens before feeding the input to the model. Note that, similar to the STAS measure, for all the proposed methods we assume the existence of a predefined set of topics, where each topic is represented by a set of relevant documents.
4.1 Topic Embeddings

As described in Section 2, Krishna and Srinivasan (2018) generate topic-oriented summaries by concatenating topic embeddings to the token embeddings of a pointer generator network (See et al., 2017). The topic embeddings are represented as one-hot encoding vectors with a size equal to the total number of topics. During training, the model takes as input the corresponding topic embedding along with the input document.

However, this method cannot be directly applied to pre-trained Transformer-based models due to the different shapes of the initialized weights of the word and position embeddings. Unlike RNNs, Transformer-based models are typically trained for general tasks and then fine-tuned with less data for more specific tasks like summarization. Thus, the architecture of a pre-trained model is already defined. Concatenating the topic embeddings with the contextual word embeddings of a Transformer-based model would require retraining the whole summarization model from scratch with the appropriate dimension. However, this would be very computationally demanding, as it would require a large amount of data and time.

Instead of concatenation, we propose to sum the topic embeddings, following the rationale of positional encoding, where token embeddings are summed with positional encoding representations to create an input representation that contains the position information. Instead of one-hot encoding embeddings, we use trainable embeddings, allowing the model to optimize them during training. The topic embeddings have the same dimensionality as the token embeddings.

During training, we sum the trainable topic embeddings with the token and positional embeddings, and we modify the input representation as follows:

    z_i = WE(x_i) + PE(i) + TE    (4)

where WE, PE and TE are the word embeddings, positional encoding and topic embeddings respectively, for token x_i in position i. During inference, the model generates the summary based on the trained topic embeddings, according to the desired topic.
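A minimal PyTorch sketch of how Eq. (4) can be realized is shown below. The module and argument names are ours for illustration, and details such as BART's position-id offset and how the summed representation is wired back into the pre-trained encoder are omitted.

```python
# Sketch of Eq. (4): z_i = WE(x_i) + PE(i) + TE, with trainable topic embeddings that
# share the dimensionality of the token embeddings. Names are illustrative.
import torch
import torch.nn as nn

class TopicAwareEmbedding(nn.Module):
    def __init__(self, word_emb: nn.Embedding, pos_emb: nn.Embedding, num_topics: int):
        super().__init__()
        self.word_emb = word_emb                              # WE: pre-trained token embeddings
        self.pos_emb = pos_emb                                # PE: pre-trained positional embeddings
        d_model = word_emb.embedding_dim
        self.topic_emb = nn.Embedding(num_topics, d_model)   # TE: trainable topic embeddings

    def forward(self, input_ids: torch.Tensor, topic_ids: torch.Tensor) -> torch.Tensor:
        # input_ids: (batch, seq_len); topic_ids: (batch,)
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        z = (self.word_emb(input_ids)                         # WE(x_i)
             + self.pos_emb(positions)                        # PE(i), broadcast over the batch
             + self.topic_emb(topic_ids).unsqueeze(1))        # TE, broadcast over positions
        return z

# Toy usage with randomly initialized embeddings standing in for a pre-trained model:
emb = TopicAwareEmbedding(nn.Embedding(50265, 16), nn.Embedding(1024, 16), num_topics=70)
print(emb(torch.randint(0, 50265, (2, 8)), torch.tensor([3, 12])).shape)  # (2, 8, 16)
```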
4.2 Control Tokens

We propose three different approaches to control the generation of the output summaries using control tokens: a) prepending the thematic category as a special token to the document, b) tagging the representative terms of each topic with special tokens, and c) a combination of both control codes.

4.2.1 Prepending

As described in Section 1, there are several controllable approaches that prepend information to the input source using special tokens to influence different aspects of the text, such as the style (Fan et al., 2018a) or the presence of a particular entity (He et al., 2020; Chan et al., 2021; Dou et al., 2021). Even though this technique can be readily combined with topic-controllable summarization, this direction has not been explored yet. To this end, we adapt these approaches to work with topic information by simply placing the desired thematic category at the beginning of the document. For example, prepending "Sports" to the beginning of a document indicates that we want to generate a summary based on the topic "Sports".

During training, we prepend to the input the topic of the target summary, according to the training dataset. During inference, we also prepend the document with a special token according to the user's requested topic.

4.2.2 Tagging

Going one step further, we propose another method for controlling the output of a model based on tagging the most representative terms for each thematic category using a special control token. The proposed mechanism assumes the existence of a set of the most representative terms for each topic. Then, given a document and a requested topic, we employ a special token to tag the most representative terms before feeding the document to the summarization model, aiming to guide the summary generation towards this topic.

We extract N representative terms for each topic by considering the terms with the top-N tf-idf scores in the corresponding topical vector, as extracted in Section 3. An example of some indicative representative words for a number of topics from the Vox dataset (Vox Media, 2017) is shown in Table 1.

Table 1: Representative terms for topics from the 2017 KDD Data Science+Journalism Workshop (Vox Media, 2017).

    Topic         Terms
    Politics      policy, president, state, political, vote, law, country, election
    Sports        game, sport, team, football, fifa, nfl, player, play, soccer, league
    Health Care   patient, uninsured, insurer, plan, coverage, care, insurance
    Education     student, college, school, education, test, score, loan, teacher

Before training with a neural abstractive model, we pre-process each input document by surrounding the representative terms of its summary's topic with the special token [TAG]. This way, during training the model learns to pay attention to tagged words, as they are aligned with the topic of the summary. To influence the summary generation towards a desired topic during inference, the terms of this topic that appear inside the input document are again tagged with the special token. As mentioned in Section 4.2, the tagging-based method can be readily combined with the prepending mechanism, as sketched below.
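The following is a minimal sketch of this control-token pre-processing, reusing the tf-idf vectorizer and topic vectors from Section 3. The function names are ours, and the matching is simplified to lower-cased string matching, whereas the paper matches lemmatized tokens (see Section 5.1).

```python
# Control-token pre-processing sketch: extract the top-N representative terms per topic
# and add [TAG] / topic-prepending control tokens. Names and the simple lower-cased
# matching are illustrative; the paper's pipeline matches lemmatized tokens.
import numpy as np

def top_terms(topic_vector, vectorizer, n=100):
    """Top-N representative terms of a topic: the highest tf-idf weights in its topic vector."""
    vocab = np.array(vectorizer.get_feature_names_out())
    weights = np.asarray(topic_vector).ravel()
    return set(vocab[np.argsort(weights)[::-1][:n]])

def add_control_tokens(document, topic, representative_terms, prepend=True, tag=True):
    """Surround representative terms with [TAG] and/or prepend the thematic category."""
    words = document.split()
    if tag:
        words = [f"[TAG] {w} [TAG]" if w.lower() in representative_terms else w
                 for w in words]
    text = " ".join(words)
    if prepend:
        text = f"{topic} {text}"   # e.g. "Sports <document text ...>"
    return text
```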
4.2.3 Topical Training Dataset

All the aforementioned methods assume the existence of a training dataset where each summary is associated with a particular topic. However, there are no existing datasets for summarization that contain summaries according to the different topical aspects of the text. Thus, we adopt the approach of Krishna and Srinivasan (2018) to compile the same topic-oriented dataset that builds upon CNN/DailyMail (Hermann et al., 2015). To create this dataset, Krishna and Srinivasan (2018) synthesize new super-articles by combining two different articles of the original dataset and keeping the summary of only one of them. The final topic-oriented dataset consists of super-articles that discuss two distinct topics, but are assigned each time to one of the corresponding summaries. Therefore, the model learns to distinguish the most important sentences for the corresponding topic during training. We use this dataset to fine-tune all the aforementioned methods.
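Below is a minimal sketch of this super-article construction, following the description above. The pairing strategy, the field names, and the simple concatenation of the two articles are assumptions made for illustration rather than the exact procedure of Krishna and Srinivasan (2018).

```python
# Super-article construction sketch: combine pairs of articles with different topics
# and keep the summary (and topic label) of only one of them. Details are assumptions.
import random

def make_super_articles(articles, num_pairs, seed=0):
    """`articles` is a list of dicts with "text", "summary" and "topic" fields (assumed)."""
    rng = random.Random(seed)
    super_articles = []
    for _ in range(num_pairs):
        a, b = rng.sample(articles, 2)
        if a["topic"] == b["topic"]:
            continue                                   # the two articles must discuss distinct topics
        super_articles.append({
            "document": a["text"] + " " + b["text"],   # super-article covering two topics
            "summary": a["summary"],                   # target summary of only one of them
            "topic": a["topic"],                       # the topic the summary is assigned to
        })
    return super_articles
```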
5 Empirical Evaluation

In this section, we present and discuss the experimental evaluation results.

5.1 Experimental Setup

Datasets. We use the following two datasets to evaluate the proposed models:

• CNN/DailyMail is an abstractive summarization dataset with articles from CNN and DailyMail accompanied by human-generated bullet summaries (Hermann et al., 2015). We use the anonymized version of the dataset, similar to See et al. (2017).

• Topic-Oriented CNN/DailyMail is a synthetic version of CNN/DailyMail which contains super-articles of two different topics accompanied by the summary for one of the topics.

To compile the topic-oriented CNN/DailyMail, any dataset that contains topic annotations can be used. Following Krishna and Srinivasan (2018), we also use the Vox dataset (Vox Media, 2017), which consists of 23,024 news articles from 185 different topical categories. We discarded topics with relatively low frequency, i.e. lower than 20 articles, as well as articles assigned to general categories that do not explicitly discuss a topic, i.e. "The Latest", "Vox Articles", "On Instagram" and "On Snapchat". The tf-idf topic vector representations for each document in the corpus were extracted using the tf-idf vectorizer provided by the Scikit-learn library (Pedregosa et al., 2011). After pre-processing, we end up with 14,312 articles from 70 categories out of the 185 initial topical categories. The final topic-oriented dataset consists of 132,766, 5,248, and 6,242 articles for training, validation, and test, respectively, while the original CNN/DailyMail consists of 287,113, 13,368 and 11,490 articles. Some dataset statistics are shown in Table 2.

Table 2: Dataset statistics. The size is measured in articles and the length in tokens.

                                    Size                         Length
    Dataset                Train       Val.      Test        Doc.     Sum.
    CNN/DM (original)      287,113     13,368    11,490      781      56
    CNN/DM (synthetic)     132,766     5,248     6,242       1,544    56

Preprocessing. For the tagging-based method, all the words of the input document are lemmatized to their roots using NLTK (Bird, 2006). Then, we tag the matches between the lemmatized tokens and the representative words of the desired topic, based on the top-N = 100 most representative terms for each topic. For the prepend method, the corresponding topic is prepended to the document.
Evaluation Metrics. All methods were evaluated using both the well-known ROUGE score (Lin, 2004), to measure the quality of the generated summary, and the proposed STAS measure.

Models and training. For all the conducted experiments we employed a BART-large architecture (Lewis et al., 2020), which is a Transformer-based model with a bidirectional encoder and an auto-regressive decoder. BART-large consists of 12 layers for both encoder and decoder and 406M parameters. We used the implementation provided by Hugging Face (Wolf et al., 2020). All the models are fine-tuned for 100,000 steps with a learning rate of 3×10^-5 and batch size 4, with early stopping on the validation set. For all the experiments, we use the established parameters for the BART-large architecture, using label-smoothed cross-entropy loss (Pereyra et al., 2017) with the label smoothing factor set to 0.1. For all the experiments, we use PyTorch version 1.10 and Hugging Face version 4.11.0. The code and the compiled dataset are publicly available.[1]

[1] https://github.com/tatianapassali/topic-controllable-summarization
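A sketch of this fine-tuning configuration using the Hugging Face Seq2SeqTrainer is shown below. It is not the released training script: the toy dataset stands in for the compiled topic-oriented data, and details such as the evaluation frequency and the early-stopping patience are assumptions.

```python
# Fine-tuning sketch with the hyper-parameters reported above (100k steps, lr 3e-5,
# batch size 4, label smoothing 0.1, early stopping on the validation set).
from datasets import Dataset
from transformers import (BartForConditionalGeneration, BartTokenizer,
                          DataCollatorForSeq2Seq, EarlyStoppingCallback,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

model_name = "facebook/bart-large"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

def encode(batch):
    features = tokenizer(batch["document"], truncation=True, max_length=1024)
    features["labels"] = tokenizer(batch["summary"], truncation=True, max_length=128)["input_ids"]
    return features

# Placeholder splits; in practice these are the compiled topic-oriented train/validation sets.
toy = {"document": ["Sports the team [TAG] won [TAG] the game ..."], "summary": ["The team won."]}
train_set = Dataset.from_dict(toy).map(encode, batched=True)
val_set = Dataset.from_dict(toy).map(encode, batched=True)

args = Seq2SeqTrainingArguments(
    output_dir="bart-topic-controllable",
    max_steps=100_000,
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    label_smoothing_factor=0.1,            # label-smoothed cross-entropy
    evaluation_strategy="steps",
    save_strategy="steps",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_set,
    eval_dataset=val_set,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```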
5.2 Results

The evaluation results on the compiled topic-oriented dataset are shown in Table 3. Our results include the following models:

• PG (See et al., 2017) is the generic pointer generator network, which is based on an RNN architecture.

• Topic-Oriented PG (Krishna and Srinivasan, 2018) is the topic-oriented pointer generator network, also based on an RNN architecture.

• BART (Lewis et al., 2020) is the generic BART model, which is based on the Transformer architecture.

• BARTemb is the proposed topic-oriented embedding-based extension of the BART model.

• BARTtag is the proposed topic-oriented tagging-based extension of the BART model.

• BARTprepend is the proposed topic-oriented prepend extension of the BART model.

• BARTtag+prepend is the combination of the tagging-based extension and the prepend mechanism.

Table 3: Experimental results on the compiled topic-oriented dataset based on the CNN/DailyMail dataset. We report F1 scores for ROUGE-1 (R-1), ROUGE-2 (R-2) and ROUGE-L (R-L).

                               R-1      R-2      R-L
    PG                         26.8     9.2      24.5
    BART                       30.46    11.92    20.57
    Topic-Oriented PG          34.1     13.6     31.2
    BARTtag (Ours)             39.30    18.06    36.67
    BARTemb (Ours)             40.15    18.53    37.41
    BARTprepend (Ours)         41.58    19.55    38.74
    BARTprepend+tag (Ours)     41.66    19.57    38.83

The experimental results reported in Table 3 indicate that topic-oriented methods indeed perform significantly better compared to the baseline methods that do not take into account the topic requested by the user. Furthermore, the proposed BART-based formulation significantly outperforms the generic PG approach, regardless of the applied topic mechanism (BARTemb, BARTtag or BARTprepend). However, when BARTtag is combined with the prepending mechanism, we obtain the best results of all the evaluated methods. Also, as we further demonstrate later, all the approaches that use control tokens are significantly faster than the embedding-based approach, leading to the overall best trade-off between accuracy and speed.

In Table 4, we also provide an experimental evaluation using the proposed Summarization Topic Affinity Score (STAS) measure. The effectiveness of using topic-oriented approaches is further highlighted, since the improvements obtained when applying all the proposed methods are much higher compared to the ROUGE score. Also, both the embedding and the tagging method lead to similar results (∼68.5%) on the STAS measure, while the prepending method achieves even better results (∼72%).
Finally, when we combine the tagging method with the prepending mechanism, we observe additional gains, outperforming all the evaluated methods with a 72.36% STAS score.

Table 4: Evaluation based on the proposed Summarization Topic Affinity Score (STAS).

                               STAS (%)
    BART                       51.86
    BARTtag (Ours)             68.42
    BARTemb (Ours)             68.50
    BARTprepend (Ours)         71.90
    BARTprepend+tag (Ours)     72.36

The results of the inference time for all the methods are shown in Table 5. The inference time of all the methods that use control tokens is significantly smaller, improving the performance of the model by almost one order of magnitude. Indeed, all the control-token approaches can perform inference on 100 articles in less than 40 seconds, while the embedding-based formulation requires more than 300 seconds for the same task.

Table 5: Inference time for 100 articles. All numbers are reported in seconds.

                       Control Tokens    Inference    Total time
    BARTemb            -                 303.0        303.0
    BARTtag            7.1               32.0         39.1
    BARTprepend

We do not report results for BARTemb since this method cannot be directly applied to a zero-shot setting.

Table 6: Results on the test set with unseen topics.

                       R-1      R-2      R-L      STAS (%)
    BARTtag            37.52    16.99    35.58    74.80
    BARTprepend        38.13    17.84    35.69    74.67
    BARTprepend+tag    39.22    18.81    36.81    77.94

Even though the models have not seen the zero-shot topics during training, they can successfully generate topic-oriented summaries for these topics, achieving similar results in terms of both the ROUGE-1 score and the STAS metric, with the BARTprepend+tag method outperforming all the other methods with more than 39 ROUGE-1 score and ∼78% STAS score. This finding further confirms the capability of methods that use control tokens to generalize successfully to unseen topics.

5.4 Experimental Results on Original CNN/DailyMail

We evaluate the proposed methods on the original CNN/DailyMail using both an oracle setup, where the topic information is extracted from the target summary according to the assigned topical dataset, and a non-oracle setup, where the topic information is extracted directly from the input document.
Table 7: Experimental results on the test set of original CNN/DailyMail, with oracle and non-oracle guidance.

                                                  Oracle        Non-oracle
                       R-1      R-2      R-L      STAS (%)      STAS (%)
    BARTemb            42.93    20.27    40.14    71.14         66.76
    BARTtag            42.54    20.11    39.80    71.51         67.09
    BARTprepend        42.75    20.20    39.94    74.20         69.67
    BARTprepend+tag    43.35    20.66    40.53    74.23         70.09

Participants (i.e., graduate and undergraduate students) were asked to evaluate how relevant the generated summary is with respect to the given topic on a scale of 1 to 10. Each time, we randomly picked a relevant or an irrelevant topic for the given summary. Each summary was annotated by an average of 3.5 annotators, and annotations differing by more than 6 points were manually discarded. The inter-annotator agreement across the raters for each sample was 0.90, measured using Krippendorff's alpha coefficient (Krippendorff, 2004) with ordinal weights (Hayes and Krippendorff, 2007).

To measure the correlation between human evaluation and the proposed metric, we obtained the STAS scores for the same summary-topic pairs. We use two statistical correlation measures: Spearman's and Pearson's correlation coefficients. As shown in Table 8, the correlation for both metrics is positive and very close to 1 (∼0.8 for both metrics). Also, as indicated by the very low p-values, we can conclude with large confidence that STAS is strongly correlated with the results of human evaluation. This observation is further confirmed by the almost linear relation between the average STAS scores and the human annotators' scores, as shown in Fig. 2.

Table 8: Correlation between human evaluation and the STAS measure.

    Metric      Correlation    p-value
    Pearson     0.83           6.8e-16
    Spearman    0.82           1.5e-16

[Figure 2: Average STAS scores (%) versus human annotators' scores (1-10), highlighting the high correlation between human annotators and the proposed metric.]
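The correlation analysis above can be reproduced with a few lines of SciPy, as sketched below; the score lists are placeholders standing in for the collected summary-topic pairs, not the actual study data.

```python
# Correlation between human relevance ratings (1-10) and STAS scores for the same
# summary-topic pairs. The values below are placeholders, not the study's data.
from scipy.stats import pearsonr, spearmanr

human_scores = [9, 2, 7, 10, 3, 8]                    # placeholder annotator ratings
stas_scores = [0.91, 0.35, 0.72, 0.95, 0.41, 0.83]    # placeholder STAS values

pearson_r, pearson_p = pearsonr(human_scores, stas_scores)
spearman_r, spearman_p = spearmanr(human_scores, stas_scores)
print(f"Pearson r={pearson_r:.2f} (p={pearson_p:.2g}), "
      f"Spearman rho={spearman_r:.2f} (p={spearman_p:.2g})")
```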
6 Conclusions and Future Work

We proposed STAS, a structured way to evaluate the generated summaries. In addition, we conducted a user study to validate and interpret the STAS score ranges. We also proposed topic-controllable methods that employ either topic embeddings or control tokens, demonstrating that the latter can successfully influence the summary generation towards the desired topic, even in a zero-shot setup. Our empirical evaluation further highlighted the effectiveness of control tokens, achieving better performance than embedding-based methods while being significantly faster and easier to apply.

Future research could examine other controllable aspects, such as style (Fan et al., 2018b), entities (He et al., 2020) or length (Takase and Okazaki, 2019). In addition, the tagging-based method could be further extended to work with any arbitrary topic, bypassing the requirement of having a labeled document collection of a topic to guide the summary towards this topic.

References

Ojas Ahuja, Jiacheng Xu, Akshay Gupta, Kevin Horecka, and Greg Durrett. 2022. ASPECTNEWS: Aspect-oriented summarization of news documents. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6494-6506, Dublin, Ireland. Association for Computational Linguistics.

Melissa Ailem, Bowen Zhang, and Fei Sha. 2019. Topic augmented generator for abstractive summarization. arXiv preprint arXiv:1908.07026.

Seyed Ali Bahrainian, George Zerveas, Fabio Crestani, and Carsten Eickhoff. 2021. CATS: Customizable abstractive topic-based summarization. ACM Transactions on Information Systems, 40(1).

Steven Bird. 2006. NLTK: The Natural Language Toolkit. In Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions, pages 69-72, Sydney, Australia. Association for Computational Linguistics.
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022.

Hou Pong Chan, Lu Wang, and Irwin King. 2021. Controllable summarization with constrained Markov decision process. Transactions of the Association for Computational Linguistics, 9:1213-1232.

Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified language model pre-training for natural language understanding and generation. In Advances in Neural Information Processing Systems, pages 13042-13054.

Zi-Yi Dou, Pengfei Liu, Hiroaki Hayashi, Zhengbao Jiang, and Graham Neubig. 2021. GSum: A general framework for guided neural abstractive summarization. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4830-4842, Online. Association for Computational Linguistics.

Angela Fan, David Grangier, and Michael Auli. 2018a. Controllable abstractive summarization. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, pages 45-54, Melbourne, Australia. Association for Computational Linguistics.

Angela Fan, David Grangier, and Michael Auli. 2018b. Controllable abstractive summarization. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, pages 45-54, Melbourne, Australia. Association for Computational Linguistics.

Lea Frermann and Alexandre Klementiev. 2019. Inducing document structure for aspect-based summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6263-6273.

Andrew F. Hayes and Klaus Krippendorff. 2007. Answering the call for a standard reliability measure for coding data. Communication Methods and Measures, 1(1):77-89.

Junxian He, Wojciech Kryscinski, Bryan McCann, Nazneen Rajani, and Caiming Xiong. 2020. CTRLsum: Towards generic controllable text summarization. arXiv preprint arXiv:2012.04281.

Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems, pages 1693-1701.

Klaus Krippendorff. 2004. Content Analysis: An Introduction to Its Methodology. Sage.

Kundan Krishna and Balaji Vasan Srinivasan. 2018. Generating topic-oriented summaries using neural attention. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1697-1705.

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871-7880, Online. Association for Computational Linguistics.

Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004).

Alisa Liu, Maarten Sap, Ximing Lu, Swabha Swayamdipta, Chandra Bhagavatula, Noah A. Smith, and Yejin Choi. 2021. DExperts: Decoding-time controlled text generation with experts and anti-experts. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 6691-6706, Online. Association for Computational Linguistics.

Jingzhou Liu and Yiming Yang. 2021. Enhancing summarization with text classification via topic consistency. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 661-676. Springer.

Yizhu Liu, Zhiyi Luo, and Kenny Q. Zhu. 2020. Controlling length in abstractive summarization using a convolutional neural network. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018.

Zhengyuan Liu and Nancy Chen. 2021. Controllable neural dialogue summarization with personal named entity planning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 92-106, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Mounica Maddela, Mayank Kulkarni, and Daniel Preotiuc-Pietro. 2022. EntSUM: A data set for entity-centric extractive summarization. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3355-3366, Dublin, Ireland. Association for Computational Linguistics.
Damian Pascual, Beni Egressy, Clara Meister, Ryan Cotterell, and Roger Wattenhofer. 2021. A plug-and-play method for controlled text generation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3973-3997.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830.

Gabriel Pereyra, George Tucker, Jan Chorowski, Łukasz Kaiser, and Geoffrey Hinton. 2017. Regularizing neural networks by penalizing confident output distributions. arXiv preprint arXiv:1701.06548.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.

Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get To The Point: Summarization with Pointer-Generator Networks. In Proceedings of the 2017 Annual Meeting of the Association for Computational Linguistics, pages 1073-1083.

Shengli Song, Haitao Huang, and Tongxiao Ruan. 2019. Abstractive text summarization using LSTM-CNN based deep learning. Multimedia Tools and Applications, 78.

Sho Takase and Naoaki Okazaki. 2019. Positional encoding to control output sequence length. In NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference, volume 1.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems.

Vox Media. 2017. Vox Dataset (DS+J Workshop).

Zhengjue Wang, Zhibin Duan, Hao Zhang, Chaojie Wang, Long Tian, Bo Chen, and Mingyuan Zhou. 2020. Friendly topic assistant for transformer based abstractive summarization. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 485-497, Online. Association for Computational Linguistics.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38-45, Online. Association for Computational Linguistics.

Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter Liu. 2020. PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 11328-11339. PMLR.

Fangwei Zhu, Shangqing Tu, Jiaxin Shi, Juanzi Li, Lei Hou, and Tong Cui. 2021. TWAG: A topic-guided Wikipedia abstract generator. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4623-4635, Online. Association for Computational Linguistics.

A Examples of Generated Summaries

We present some examples of generated summaries using the tagging-based model on the created dataset for different topics, along with the generic summary generated by the generic BART model, as shown in Tables 9 and 10. The proposed model can shift the generation towards the desired topic of the super-article, which contains different topics, in contrast with the generic BART model, which discusses only one topic of the article.
Table 9: Generated summaries according to the two different topics of the super-article containing articles of these topics, along with the generic summary.

Generic summary: Sir Bradley Wiggins has been included in Team Sky's line-up for Sunday's Gent-Wevelgem sprint classic. Wiggins, the 2012 Tour de France champion, will be joined by fellow Great Britain Olympian Geraint Thomas for the 239-kilometre one-day event in Belgium. Bernhard Eisel, who won the event in 2010, is also included in an eight-man team along with Christian Knees, Ian Stannard, Andy Fenn, Luke Rowe and Elia Viviani.

Gender-based Violence: Staff at Venice High School in Los Angeles informed police of a suspected attack on a female student on Tuesday. However the LAPD found the sexual misconduct involved another student and had occurred frequently for more than a year. An investigation has since identified 14 suspects between the ages of 14 and 17 who have allegedly been involved in sexual activity at the school. Eight youngsters from the school were arrested on Friday while another was arrested at another location.

Sports: Sir Bradley Wiggins has been included in Team Sky's line-up for Sunday's Gent-Wevelgem sprint classic. Wiggins, the 2012 Tour de France champion, will be joined by fellow Great Britain Olympian Geraint Thomas for the 239-kilometre one-day event in Belgium. Giant-Alpecin's John Degenkolb is the defending champion in Flanders.

Table 10: Generated summaries similar to Table 9.

Generic summary: Ford unveiled two prototype electric bikes at Mobile World Congress in Barcelona. MoDe:Me and MoDe:Pro are powered by 200-watt motors and fold to fit on a train or in the boot of a car. With pedal assist they help riders reach speeds of up to 15mph (25km/h) The bikes are part of an experiment by Ford called Handle on Mobility.

Transportation: Ford unveiled two prototype electric bikes at Mobile World Congress in Barcelona. The MoDe:Me and MoDe:Pro are powered by 200-watt motors. They fold to fit on a train or in the boot of a car. With pedal assist, riders reach speeds of up to 15mph (25km/h)

Neuroscience: Researchers from Bristol University measured biosonar bat calls to calculate what members of group perceived as they foraged for food. Pair of Daubenton's bats foraged low over water for stranded insects at a site near the village of Barrow Gurney, in Somerset. It found the bats interact by swapping between leading and following, and they swap these roles by copying the route a nearby individual was using up to 500 milliseconds earlier.