Reactive Supervision: A New Method for Collecting Sarcasm Data
Boaz Shmueli (1,2,3,*), Lun-Wei Ku (2) and Soumya Ray (3)
(1) Social Networks and Human-Centered Computing, Taiwan International Graduate Program
(2) Institute of Information Science, Academia Sinica
(3) Institute of Service Science, National Tsing Hua University
(*) Corresponding author: shmueli@iis.sinica.edu.tw

Abstract

Sarcasm detection is an important task in affective computing, requiring large amounts of labeled data. We introduce reactive supervision, a novel data collection method that utilizes the dynamics of online conversations to overcome the limitations of existing data collection techniques. We use the new method to create and release a first-of-its-kind large dataset of tweets with sarcasm perspective labels and new contextual features. The dataset is expected to advance sarcasm detection research. Our method can be adapted to other affective computing domains, thus opening up new research opportunities.

1 Introduction

Sarcasm is ubiquitous in human conversations. As a form of insincere speech, the intent behind a sarcastic utterance is integral to its meaning. Perceiving a sarcastic utterance as genuine will often result in a complete reversal of the intended meaning, and vice versa (Gibbs, 1986). It is therefore crucial for affective computing systems and tasks, such as sentiment analysis and dialogue systems, to automatically detect sarcasm from the perspective of the author as well as the reader in order to avoid misunderstandings. Oprea and Magdy (2019) recently pioneered the study of intended sarcasm (by the author) vs. perceived sarcasm (by the reader) in the context of sarcasm detection tasks. The training of models for these tasks requires large amounts of labeled sarcasm data, with Twitter becoming a major source due to its popularity as a social network as well as the huge amounts of conversational text its users generate.

Previous works describe three methods for collecting sarcasm data: distant supervision, manual annotation, and manual collection. Distant supervision automatically collects "in-the-wild" sarcastic tweets by leveraging author-generated labels such as the #sarcasm hashtag (Davidov et al., 2010; Ptáček et al., 2014). This method generates large amounts of data at low cost, but labels are often noisy and biased (Bamman and Smith, 2015).

To improve quality, manual annotation asks humans to label given tweets as sarcastic or not. Since finding sarcasm in a large corpus is "a needle-in-a-haystack problem" (Liebrecht et al., 2013), manual annotation can be combined with distant supervision (Riloff et al., 2013). Still, low inter-annotator reliability is often reported (Swanson et al., 2014), resulting not only from the subjective nature of sarcasm but also the lack of cultural context (Joshi et al., 2016). Moreover, neither method collects both sarcasm perspectives: distant supervision collects intended sarcasm, while manual annotation can only collect perceived sarcasm.

Lastly, in manual collection, humans are asked to gather and report sarcastic texts, either their own (Oprea and Magdy, 2020) or by others (Filatova, 2012). However, both manual methods are slower and more expensive than distant supervision, resulting in smaller datasets.

To overcome the above limitations, we propose reactive supervision, a novel conversation-based method that offers automated, high-volume, "in-the-wild" collection of high-quality intended and perceived sarcasm data. We use our method to create and release the SPIRS sarcasm dataset (github.com/bshmueli/SPIRS).
2 Reactive Supervision

Reactive supervision exploits the frequent use in online conversations of a cue tweet — a reply that highlights sarcasm in a prior tweet. Figure 1 (left panel) shows a typical exchange on Twitter: C posts a sarcastic tweet. Unaware of C's sarcastic intent, B replies with an oblivious tweet. Lastly, A alerts B by replying with a cue tweet (She was just being sarcastic!). Since A replies to B but refers to the sarcastic author in the 3rd person (She), C is necessarily the author of the perceived sarcastic tweet. Similarly, Figure 1 (right panel) shows how a 1st-person cue (I was just being sarcastic!) can be used to unequivocally label intended sarcasm.

Figure 1: Conversation threads. Left panel, a 3rd-person cue with author sequence ABC: User_C: "The app we use for work emails is not working. I feel terrible about this!"; User_B: "Not your fault. Do not feel guilty!"; User_A (replying to @User_B): "She was just being sarcastic!". Right panel, a 1st-person cue with author sequence ABAC: User_C: "Just watched Forrest Gump. Great film!"; User_A: "So Tom Hanks can act! Who knew???"; User_B: "Literally everyone!!!"; User_A (replying to @User_B): "I was being sarcastic lol".

To capture sarcastic tweets, we thus first search for cue tweets (using the query phrase "being sarcastic", often used in responses to sarcastic tweets), then carefully examine each cue tweet to identify the corresponding sarcastic tweet. The following formalizes our method.

2.1 Method

Definitions. We define a thread to be a sequence of tweets {t_n, t_n-1, ..., t_1}, where t_i+1 is a reply to t_i, i = 1, ..., n-1. Tweets are listed in reverse chronological order, with t_1 being the root tweet. The corresponding author sequence is a_n a_n-1 ... a_1, where we replace the original author names with consecutive capital letters (A, B, C, ...), starting with a_n = A. For example, Figure 1 (right panel) depicts a thread of length n = 4 with author sequence ABAC. Here a_4 = a_2 = A, a_3 = B, and a_1 = C is the author of the root tweet.

Algorithm. Given a thread {t_n, t_n-1, ..., t_1} with cue tweet t_n by a_n = A, our aim is to identify the sarcastic tweet among {t_n-1, ..., t_1}. We first examine the personal subject pronoun used in the cue (I, you, s/he) and map it to a grammatical person class (1st, 2nd, 3rd). This informs us whether the sarcastic author is also the author of the cue (1st), its addressee (2nd), or another party (3rd). For each person class we then apply a heuristic to identify the sarcastic tweet. For example, for a 1st-person cue tweet (e.g., I was just being sarcastic!), the sarcastic tweet must also be authored by A. If the earlier tweets in the thread contain exactly one tweet from A, it is unambiguously the sarcastic tweet. Otherwise, if there are two or more earlier tweets from A (or none), the sarcastic tweet cannot be unambiguously pinpointed and the entire thread is discarded. We formalize this rule by requiring the author sequence to match the regular expression /^A[^A]*(A)[^A]*$/, where the capturing group (A) corresponds to the sarcastic tweet (we use Perl-Compatible Regular Expressions, PCRE). We are able to use regular expressions because we use a string of letters to represent the author sequence. 2nd- and 3rd-person cues produce corresponding rules and patterns. Table 1 lists the three person classes, corresponding regular expressions, and example author sequences.

Table 1: The three grammatical person classes, with example cue tweets, corresponding regular expressions, and examples of matching author sequences. The parenthesized letter marks the position of the sarcastic tweet.

Person | Example Cue | Regular Expression | Example Author Sequences
1st | I was only being sarcastic lol | ^A[^A]*(A)[^A]*$ | AB(A), AB(A)C, AB(A)B
2nd | Why are you being sarcastic? | ^AA*(B)A*$ | A(B), A(B)A, A(B)AA
3rd | She was just being sarcastic! | ^AA*B[AB]*(C)[AB]*$ | AB(C), AB(C)B, ABA(C)
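To make the matching step concrete, the following is a minimal Python sketch of the author-sequence heuristic, using the regular expressions from Table 1; Python's re module suffices here even though the paper mentions PCRE. The helper names and the example thread are illustrative, not the authors' released code.

    import re

    # Regular expressions from Table 1; the capturing group marks the sarcastic tweet.
    PERSON_PATTERNS = {
        "1st": re.compile(r"^A[^A]*(A)[^A]*$"),
        "2nd": re.compile(r"^AA*(B)A*$"),
        "3rd": re.compile(r"^AA*B[AB]*(C)[AB]*$"),
    }

    def author_sequence(authors):
        """Map author names (cue author first, root author last) to letters A, B, C, ..."""
        letters = {}
        seq = []
        for name in authors:
            if name not in letters:
                letters[name] = chr(ord("A") + len(letters))
            seq.append(letters[name])
        return "".join(seq)

    def find_sarcastic_index(authors, person):
        """Return the index of the sarcastic tweet (0 = cue tweet), or None if the thread is ambiguous."""
        match = PERSON_PATTERNS[person].match(author_sequence(authors))
        return match.start(1) if match else None

    # Example: the right panel of Figure 1, listed from the cue tweet down to the root.
    authors = ["User_A", "User_B", "User_A", "User_C"]   # author sequence ABAC
    print(find_sarcastic_index(authors, "1st"))          # -> 2, i.e. the sarcastic tweet t_2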
2.2 Advantages

Additional Tweet Types. Along with each sarcastic tweet, we collect the oblivious tweet (the unsuspecting reply to the sarcastic tweet) when available. As far as we know, this is the first work that identifies and collects oblivious texts, a new type of data that can improve research on the (mis)understanding of sarcasm, with applications such as automated assistive systems for people with emotional or cognitive disabilities. If the sarcastic tweet is a reply, we also capture the eliciting tweet, which is the tweet that evoked the sarcastic reply. We provide more details in Appendix A.
Extraction of Semantic Relations. Being able to identify the various tweet types (cue, oblivious, sarcastic, eliciting), reactive supervision can be understood more abstractly as capturing semantic dependency relations between utterances (it is worth noting that Hearst (1992) uses patterns to automatically extract lexical relations between words). Reactive supervision can thus be useful in the context of discourse analysis.

Context-Aware Annotation. Our method uses cues from thread participants, who therefore serve as de facto annotators. As participants are familiar with the conversation's context, we overcome some quality issues of using external annotators, who are often unfamiliar with the conversation context due to cultural and social gaps (Joshi et al., 2016).

Sarcasm Perspective. Previous datasets contain either intended or perceived sarcasm, but not both (Oprea and Magdy, 2019). Our method identifies and labels both intended and perceived sarcasm within the same data context: by their essence, 1st-person cue tweets capture intended sarcasm, while 2nd- and 3rd-person cues capture perceived sarcasm. We label a tweet as perceived sarcasm when at least one reader perceives the tweet as sarcastic and posts a cue tweet. Detecting perceived sarcasm is useful, for example, for training algorithms that flag sensitive texts which might be (mis)perceived as sarcastic (even by a single reader).

Faster Data Collection. We tested González-Ibáñez et al. (2011)'s distant supervision method of collecting tweets ending with #sarcasm and related hashtags, fetching 171 tweets/day on average. During the same period, our method collected 312 tweets/day on average, an 82% rate improvement.

Summary of Advantages. Table 2 summarizes the advantages of our best-of-all-worlds method over other approaches. Reactive supervision offers automated, in-the-wild, and context-aware detection of intended and perceived sarcasm data.

Table 2: Comparison of data collection methods.

Feature | Distant Supervision | Manual Annotation | Manual Collection | Reactive Supervision
Automatic | ✓ | ✗ | ✗ | ✓
In-the-wild | ✓ | ✗ | ✗ | ✓
Oblivious Tweet | ✗ | ✗ | ✗ | ✓
Context-Aware | ✓ | Maybe | Maybe | ✓
Perspective | Intended | Perceived | Either | Both
Samples/Day | 171 | Manual | Manual | 312

3 SPIRS Dataset

We implemented reactive supervision using a 4-step pipeline (see Algorithm 1; a code sketch of the complete loop follows the step descriptions below):

Algorithm 1: Data collection pipeline.
    Result: set S of sarcastic tweets
    S ← {}
    candidates ← Fetch("being sarcastic")
    for cue in candidates do
        switch Classify(cue) do
            case 1st person: regexp ← ^A[^A]*(A)[^A]*$
            case 2nd person: regexp ← ^AA*(B)A*$
            case 3rd person: regexp ← ^AA*B[AB]*(C)[AB]*$
            case unknown: continue
        {t_n (= cue), t_n-1, ..., t_1} ← Traverse(cue)
        a_n a_n-1 ... a_1 ← Authors({t_n, t_n-1, ..., t_1})
        if i ← Match(regexp, a_n a_n-1 ... a_1) then
            S ← S ∪ {t_i}

1. Fetch calls the Twitter Search API to collect cue tweets, using "being sarcastic" as the query.
2. Classify is a rule-based, precision-oriented classifier that classifies cues as 1st-, 2nd-, or 3rd-person according to the referred pronoun (I, you, s/he). If the cue cannot be accurately classified (e.g., a pronoun cannot be found, the cue contains multiple pronouns, negation words are present), the cue is classified as unknown and discarded.
3. Traverse calls the Twitter Lookup API to retrieve the thread by starting from the cue tweet and repeatedly fetching the parent tweet up to the root tweet.
4. Finally, Match matches the thread's author sequence with the corresponding regular expression. Unmatched sequences are discarded. Otherwise, the sarcastic tweet is identified and saved along with the cue tweet, as well as the eliciting and oblivious tweets when available.
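As a rough end-to-end illustration of Algorithm 1, here is a Python skeleton of the four steps. The callables fetch_cues, classify_person, and lookup_parent are hypothetical stand-ins for the Twitter Search API query, the rule-based cue classifier, and the Twitter Lookup API traversal; they are not part of the released code.

    import re

    PERSON_PATTERNS = {  # from Table 1; group 1 marks the sarcastic tweet
        "1st": re.compile(r"^A[^A]*(A)[^A]*$"),
        "2nd": re.compile(r"^AA*(B)A*$"),
        "3rd": re.compile(r"^AA*B[AB]*(C)[AB]*$"),
    }

    def collect_sarcastic_tweets(fetch_cues, classify_person, lookup_parent):
        """Sketch of the 4-step pipeline; tweets are assumed to be dicts with an 'author' field."""
        sarcastic = []
        for cue in fetch_cues():                      # 1. Fetch: candidate cue tweets
            person = classify_person(cue)             # 2. Classify: 1st/2nd/3rd/unknown
            if person == "unknown":
                continue
            thread = [cue]                            # 3. Traverse: follow replies up to the root
            while (parent := lookup_parent(thread[-1])) is not None:
                thread.append(parent)
            letters = {}
            seq = "".join(                            # map authors to letters A, B, C, ...
                letters.setdefault(t["author"], chr(ord("A") + len(letters)))
                for t in thread
            )
            match = PERSON_PATTERNS[person].match(seq)  # 4. Match the author sequence
            if match:
                sarcastic.append(thread[match.start(1)])
        return sarcastic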
The pipeline collected 65K cue tweets containing the phrase "being sarcastic" and corresponding threads during 48 days in October and November 2019. 77% of the cues were classified as unknown and discarded, ending with 15,000 English sarcastic tweets. In addition, 10,648 oblivious and 9,156 eliciting tweets were automatically captured. Table 3 summarizes the SPIRS dataset. We added 15,000 negative instances by sampling random English tweets captured during the same period, discarding tweets with sarcasm-related words or hashtags.
Table 3: SPIRS data breakdown by person class (number of tweets).

Person | Perspective | Sarcastic | Oblivious | Eliciting
1st | Intended | 10,300 | 9,065 | 8,075
2nd | Perceived | 3,000 | — | 842
3rd | Perceived | 1,700 | 1,583 | 239
Total | | 15,000 | 10,648 | 9,156

Sarcastic tweets can be either root tweets or replies. We found that the majority of intended sarcasm tweets are replies (78.4%), while the majority of perceived sarcasm tweets are root tweets (77.0%). Further dataset statistics on author sequence and tweet position distributions are available in Appendices B and C.

Reliability. To assess our method's reliability in capturing sarcastic tweets, we manually inspected 200 random sarcastic tweets, along with their cue tweets, from each person class. The accuracy of sarcastic tweet labeling was high: 98.5%, 98%, and 97% for 1st-, 2nd-, and 3rd-person cue tweets, respectively. Table 4 shows samples of correct and incorrect cue tweet classifications.

Table 4: Correctly and incorrectly classified cue tweets.

Cue Tweet | Person | Correct?
Shudda been more clear...I was being sarcastic | 1st | ✓
I'm almost always being sarcastic, but this was real | 1st | ✗
Take it you are being sarcastic | 2nd | ✓
You do realize @user was being sarcastic right? | 2nd | ✗
She was being sarcastic. You missed the joke | 3rd | ✓
Mind blown. Had no idea he was being sarcastic | 3rd | ✗

4 Experiments and Analysis

We present dataset baselines for three tasks: sarcasm detection, sarcasm detection with conversation context, and sarcasm perspective classification, a new task enabled by our dataset.

4.1 Sarcasm Detection

The first experiment is sarcasm detection. We trained a total of three models: a CNN (100 filters with a kernel size of 3) and a BiLSTM (100 units), both max-pooled and Adam-optimized with a learning rate of 0.0005; data was preprocessed as described in Tay et al. (2018), and the embedding layer was preloaded with GloVe embeddings (Twitter data, 100 dimensions) (Pennington et al., 2014). We also fine-tuned a pre-trained base uncased BERT model (Devlin et al., 2019). For all three models, we used 5-fold cross-validation for training, holding out 20% of the data for testing.
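A minimal sketch of the BiLSTM baseline, assuming TensorFlow/Keras and a precomputed GloVe embedding matrix; the layer stack follows the description above (100 units, max pooling over time, Adam with learning rate 0.0005), but the exact preprocessing and training setup of the reported experiments may differ.

    import numpy as np
    import tensorflow as tf

    VOCAB_SIZE, EMBED_DIM, MAX_LEN = 20_000, 100, 60      # illustrative sizes
    glove_matrix = np.zeros((VOCAB_SIZE, EMBED_DIM))      # placeholder for pretrained GloVe vectors

    def build_bilstm_baseline():
        inputs = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")
        x = tf.keras.layers.Embedding(
            VOCAB_SIZE, EMBED_DIM, weights=[glove_matrix], trainable=False
        )(inputs)
        x = tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(100, return_sequences=True)
        )(x)
        x = tf.keras.layers.GlobalMaxPooling1D()(x)                   # max pooling over time steps
        outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)   # sarcastic vs. non-sarcastic
        model = tf.keras.Model(inputs, outputs)
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4),
            loss="binary_crossentropy",
            metrics=["accuracy"],
        )
        return model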
Results are shown in Table 5 (top panel). BERT is the best performing model, with 70.3% accuracy. We compared SPIRS's classification results to the Ptáček et al. (2014) dataset, commonly used in sarcasm benchmarks. We found that accuracy on the Ptáček dataset is significantly higher (86.6%). We posit that this is because sarcasm is confounded with locale in the Ptáček dataset (sarcastic tweets are from worldwide users; non-sarcastic tweets are from users near Prague), and thus classifiers learn features correlated with locale. We tested our hypothesis by replacing our negative samples with Ptáček's, which indeed resulted in boosting the accuracy by 19.1%.

Table 5: Baselines. We report precision, recall, macro-F1, accuracy, and MCC (Matthews correlation coefficient). Mean and standard deviation were calculated using 5-fold cross-validation. N is the number of instances after preprocessing. * Dataset classes were balanced using majority class downsampling.

Task | Dataset | Model | P | R | F1 | Acc | MCC
Sarcasm Detection | SPIRS (our dataset), N=19,384 | CNN | 67.2 (1.8) | 73.6 (5.1) | 65.0 (1.2) | 65.8 (0.5) | 0.308 (0.011)
 | | BiLSTM | 68.9 (2.1) | 75.4 (5.5) | 67.1 (0.9) | 67.9 (0.3) | 0.350 (0.008)
 | | BERT | 70.1 (1.1) | 77.4 (1.2) | 69.9 (0.5) | 70.3 (0.5) | 0.402 (0.008)
 | Ptáček, N=49,766 | CNN | 79.1 (0.8) | 87.5 (1.3) | 77.9 (0.6) | 79.2 (0.6) | 0.566 (0.012)
 | | BiLSTM | 82.4 (1.6) | 87.6 (2.9) | 80.9 (0.1) | 81.7 (0.2) | 0.622 (0.002)
 | | BERT | 87.0 (0.6) | 90.9 (0.6) | 86.0 (0.2) | 86.6 (0.2) | 0.721 (0.004)
 | Ptáček (−) / SPIRS (+), N=21,138* | CNN | 84.3 (1.6) | 82.6 (2.5) | 83.6 (0.8) | 83.6 (0.8) | 0.673 (0.017)
 | | BiLSTM | 86.2 (2.8) | 86.7 (2.8) | 86.4 (0.7) | 86.4 (0.7) | 0.729 (0.012)
 | | BERT | 89.8 (0.7) | 89.1 (0.7) | 89.4 (0.2) | 89.4 (0.2) | 0.788 (0.004)
Sarcasm Detection w/ Conversation Context | SPIRS (our dataset), N=7,810* | 3× BiLSTM | 77.7 (1.1) | 87.9 (3.5) | 68.9 (0.7) | 74.8 (0.6) | 0.398 (0.007)
 | | w/o eliciting | 75.6 (1.1) | 91.4 (2.8) | 66.3 (1.4) | 74.3 (0.3) | 0.372 (0.005)
 | | w/o oblivious | 72.4 (2.4) | 93.3 (4.5) | 58.8 (6.2) | 71.4 (1.4) | 0.275 (0.053)
 | | w/o both | 73.2 (2.7) | 90.8 (6.6) | 60.3 (4.6) | 71.2 (0.4) | 0.282 (0.033)
Sarcasm Perspective Classification | SPIRS (our dataset), N=6,324* | CNN | 65.5 (1.2) | 61.7 (3.3) | 64.4 (0.5) | 64.5 (0.5) | 0.291 (0.009)
 | | BiLSTM | 66.8 (2.3) | 63.1 (5.8) | 65.5 (0.7) | 65.6 (0.7) | 0.315 (0.015)
 | | BERT | 70.0 (2.9) | 63.8 (5.7) | 68.0 (1.7) | 68.2 (1.6) | 0.366 (0.032)

4.2 Detection with Conversation Context

Our second sarcasm classification experiment uses conversation context by adding eliciting and oblivious tweets to the model. As far as we know, this is the first sarcasm-related task that uses oblivious texts. Our model concatenated the outputs of three identical 100-unit BiLSTMs (one per tweet: sarcastic, oblivious, eliciting) before feeding them into dense layers for classification. Tweets without surrounding context were not used in this task. Results are shown in Table 5 (middle panel). Accuracy for the full-context model was 74.8% (MCC 0.398).
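The context model can be sketched in the same framework: one BiLSTM branch per tweet type, with the three pooled representations concatenated before the dense classification layers. The branch size (100 units) follows the description above; the embedding setup and the width of the dense layer are assumptions.

    import tensorflow as tf

    VOCAB_SIZE, EMBED_DIM, MAX_LEN = 20_000, 100, 60   # illustrative sizes

    def tweet_encoder():
        """One 100-unit BiLSTM branch; the model uses three identical branches."""
        return tf.keras.Sequential([
            tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
            tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(100, return_sequences=True)),
            tf.keras.layers.GlobalMaxPooling1D(),
        ])

    def build_context_model():
        sarcastic = tf.keras.Input(shape=(MAX_LEN,), name="sarcastic")
        oblivious = tf.keras.Input(shape=(MAX_LEN,), name="oblivious")
        eliciting = tf.keras.Input(shape=(MAX_LEN,), name="eliciting")
        encoded = [tweet_encoder()(t) for t in (sarcastic, oblivious, eliciting)]
        x = tf.keras.layers.Concatenate()(encoded)            # fuse the three tweet representations
        x = tf.keras.layers.Dense(100, activation="relu")(x)  # dense layer width is an assumption
        outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
        return tf.keras.Model([sarcastic, oblivious, eliciting], outputs)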
Ablation Study. We conducted context ablation experiments to identify the contribution of each tweet type. We found that removing the eliciting tweets reduces accuracy by 0.5% and MCC by 0.026. Removing the oblivious tweets, however, lowered accuracy by 3.4% to 71.4%, and the MCC dropped significantly by 31%, from 0.398 to 0.275. This illustrates the importance of the new oblivious text data provided in the dataset and suggests its usefulness in sarcasm-related tasks.

4.3 Perspective Classification

Taking advantage of the new labels in our dataset, we propose a new task to classify a sarcastic text's perspective: intended vs. perceived. Our results are displayed in Table 5 (bottom panel), demonstrating the superiority of BERT over the other models, with an accuracy of 68.2% and MCC of 0.366.

Error Analysis. We carefully examined the errors to analyze the causes of perspective misclassification. We observed that misclassified-as-intended tweets (e.g., "You're lost!", "Omg that was so funny") had, on average, almost half the word count of misclassified-as-perceived tweets (17.2 vs. 27.8). We posit that longer, more informative texts make sarcasm easier to perceive; hence, short perceived sarcasm or long intended sarcasm might introduce errors. Analysis of the dataset's word count distribution supports our hypothesis (see Figure 2).

Figure 2: Word count distribution in SPIRS for intended vs. perceived sarcasm.

Looking for further error sources, we inspected short intended tweets that were misclassified, for example "great friends i have!" and "My mom is so beautiful". These tweets can be read as root tweets and not as replies, yet most intended sarcasm tweets are replies while most perceived sarcasm tweets are root tweets (see Section 3). We hypothesize that the classifier learns discourse-related features (original tweet vs. reply tweet), which can lead to these errors. Further analysis of sarcasm perspective and its interplay with sarcasm pragmatics is a promising avenue for future research.

5 Conclusion

We present an innovative method for collecting sarcasm data that exploits the natural dynamics of online conversations. Our approach has multiple advantages over all existing methods. We used it to create and release SPIRS, a large sarcasm dataset with multiple novel features. These new features, including labels for sarcasm perspective and unique context (e.g., oblivious texts), offer opportunities for advances in sarcasm detection.

Reactive supervision is generalizable. By modifying the cue tweet selection criteria, our method can be adapted to related domains such as sentiment analysis and emotion detection, thereby advancing the quality and quantity of data collection and offering new research directions in affective computing.

Acknowledgements

This research was partially supported by the Ministry of Science and Technology of Taiwan under contracts MOST 108-2221-E-001-012-MY3 and MOST 108-2321-B-009-006-MY2.
References

David Bamman and Noah A. Smith. 2015. Contextualized Sarcasm Detection on Twitter. In Ninth International AAAI Conference on Web and Social Media.

Dmitry Davidov, Oren Tsur, and Ari Rappoport. 2010. Semi-supervised Recognition of Sarcastic Sentences in Twitter and Amazon. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning (CoNLL '10), pages 107–116, Uppsala, Sweden. Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Elena Filatova. 2012. Irony and Sarcasm: Corpus Generation and Analysis Using Crowdsourcing. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 392–398, Istanbul, Turkey. European Language Resources Association (ELRA).

Raymond W. Gibbs. 1986. On the psycholinguistics of sarcasm. Journal of Experimental Psychology: General, 115(1):3.

Roberto González-Ibáñez, Smaranda Muresan, and Nina Wacholder. 2011. Identifying Sarcasm in Twitter: A Closer Look. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2 (HLT '11), pages 581–586, Portland, Oregon. Association for Computational Linguistics.

Marti A. Hearst. 1992. Automatic Acquisition of Hyponyms from Large Text Corpora. In COLING 1992 Volume 2: The 15th International Conference on Computational Linguistics.

Aditya Joshi, Pushpak Bhattacharyya, Mark Carman, Jaya Saraswati, and Rajita Shukla. 2016. How Do Cultural Differences Impact the Quality of Sarcasm Annotation?: A Case Study of Indian Annotators and American Text. In Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 95–99, Berlin, Germany. Association for Computational Linguistics.

Christine Liebrecht, Florian Kunneman, and Antal van den Bosch. 2013. The perfect solution for detecting sarcasm in tweets #not. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 29–37, Atlanta, Georgia. Association for Computational Linguistics.

Silviu Oprea and Walid Magdy. 2019. Exploring Author Context for Detecting Intended vs Perceived Sarcasm. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2854–2859, Florence, Italy. Association for Computational Linguistics.

Silviu Oprea and Walid Magdy. 2020. iSarcasm: A Dataset of Intended Sarcasm. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.

Tomáš Ptáček, Ivan Habernal, and Jun Hong. 2014. Sarcasm Detection on Czech and English Twitter. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 213–223, Dublin, Ireland. Dublin City University and Association for Computational Linguistics.

Ellen Riloff, Ashequl Qadir, Prafulla Surve, Lalindra De Silva, Nathan Gilbert, and Ruihong Huang. 2013. Sarcasm as Contrast between a Positive Sentiment and Negative Situation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 704–714, Seattle, Washington, USA. Association for Computational Linguistics.

Reid Swanson, Stephanie Lukin, Luke Eisenberg, Thomas Corcoran, and Marilyn Walker. 2014. Getting Reliable Annotations for Sarcasm in Online Dialogues. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 4250–4257, Reykjavik, Iceland. European Language Resources Association (ELRA).

Yi Tay, Anh Tuan Luu, Siu Cheung Hui, and Jian Su. 2018. Reasoning with Sarcasm by Reading In-Between. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1010–1020, Melbourne, Australia. Association for Computational Linguistics.
A Search Pattern Production

We construct the regular expression for capturing all tweet types — sarcastic, oblivious, and eliciting — given a 3rd-person cue tweet. Similar logic produces the patterns for 1st- and 2nd-person cues.

The cue tweet author (A) refers to the sarcastic tweet author in the 3rd person (e.g., She was being sarcastic!); we thus assume that A's tweet is a response to a second author B, but refers to a third author C (the sarcastic author). To unambiguously pinpoint the sarcastic tweet, C can only appear once in the author sequence. Moreover, only A, B, and C can participate in the thread. Finally, C's tweet can either be a root tweet or a reply to another tweet. The combination of these constraints leads to the regular expression /^(A)(A*B[AB]*)(C)([AB]*)$/. (A) is the cue tweet. (A*B[AB]*) forces at least one tweet from B (to which A responded). (C) is the sarcastic tweet. Finally, ([AB]*) represents optional tweets from A or B. If the author sequence matches the regular expression, we can unambiguously identify the sarcastic author and the corresponding sarcastic tweet.

We also use the search pattern to find the oblivious and eliciting tweets. We assume that the cue tweet (A) is triggered by an oblivious tweet from B. Thus, if (A*B[AB]*) contains exactly one B, we designate the corresponding tweet as oblivious. Likewise, ([AB]*) contains the eliciting tweet.

Table 6 lists the search patterns for the three person classes. Note that the 2nd-person pattern does not include an oblivious tweet because A's cue tweet is a response to a sarcastic tweet from B, i.e., it is not triggered by an oblivious tweet.

Table 6: Person classes and their search patterns. The capturing groups locate the cue, oblivious, sarcastic, and eliciting tweets.

Person | Regular Expression
1st | ^(A)([^A]*)(A)([^A]*)$
2nd | ^(A)A*(B)(A*)$
3rd | ^(A)(A*B[AB]*)(C)([AB]*)$
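To illustrate how a single match yields all four tweet types, here is a small Python sketch using the 3rd-person pattern from Table 6. The function name and the example sequence are illustrative; the oblivious and eliciting rules follow the description above.

    import re

    # 3rd-person pattern from Table 6: groups are cue, middle span, sarcastic, trailing span.
    THIRD_PERSON = re.compile(r"^(A)(A*B[AB]*)(C)([AB]*)$")

    def extract_third_person(seq):
        """Return the indices of the cue, oblivious, sarcastic, and eliciting tweets, if identifiable."""
        m = THIRD_PERSON.match(seq)
        if not m:
            return None
        cue, sarcastic = m.start(1), m.start(3)
        middle = m.group(2)
        # Oblivious tweet: the single B in the middle span, if there is exactly one.
        oblivious = m.start(2) + middle.index("B") if middle.count("B") == 1 else None
        # Eliciting tweet: the parent of the sarcastic tweet, i.e. the next position,
        # available only when the trailing span is non-empty.
        eliciting = sarcastic + 1 if m.group(4) else None
        return {"cue": cue, "oblivious": oblivious, "sarcastic": sarcastic, "eliciting": eliciting}

    print(extract_third_person("ABCB"))
    # -> {'cue': 0, 'oblivious': 1, 'sarcastic': 2, 'eliciting': 3}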
B Author Sequence Distribution

Table 7 shows the most common author sequences in SPIRS. The most common pattern for 1st-person cues is ABAC (as in Figure 1, right panel). AB is the most common pattern for 2nd-person cues, denoting a sarcastic root tweet followed immediately by a cue tweet (e.g., Why are you being sarcastic?). For 3rd-person cues, the most common pattern is ABC (as in Figure 1, left panel). Note that some patterns appear in more than one person class. For example, ABA appears in both the 1st- and 2nd-person classes, while ABAC appears in both the 1st- and 3rd-person classes.

Table 7: The most common author patterns by person class (number of tweets).

Person | Pattern | Sarcastic | Oblivious | Eliciting
1st (Intended) | ABAC | 2,841 | 2,841 | 2,841
 | ABA | 1,818 | 1,818 | —
 | ABAB | 1,551 | 1,551 | 1,551
 | Other | 4,090 | 2,855 | 2,683
 | Subtotal | 10,300 | 9,065 | 8,075
2nd (Perceived) | AB | 2,122 | — | —
 | ABA | 782 | — | 782
 | Other | 96 | — | 60
 | Subtotal | 3,000 | — | 842
3rd (Perceived) | ABC | 1,235 | 1,235 | —
 | ABCB | 119 | 119 | 119
 | ABAC | 110 | 110 | —
 | Other | 236 | 119 | 120
 | Subtotal | 1,700 | 1,583 | 239
Total | | 15,000 | 10,648 | 9,156

C Tweet Position Distribution

Reactive supervision enables the measurement of conversation position statistics for sarcastic tweets on Twitter. Given a thread {t_n, ..., t_i = s, ..., t_1} with cue tweet t_n, sarcastic tweet t_i = s, and root tweet t_1, we define the position of the sarcastic tweet as the distance i − 1 between the sarcastic tweet and the root. Furthermore, the cue lag is the distance n − i between the cue and the sarcastic tweet.
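A small worked example of the two quantities, using the right-panel thread of Figure 1 (author sequence ABAC, so n = 4 and the sarcastic tweet is t_2):

    def position_and_lag(n, i):
        """Thread t_n ... t_1 (cue first, root last); the sarcastic tweet is t_i."""
        position = i - 1    # distance between the sarcastic tweet and the root
        cue_lag = n - i     # distance between the cue and the sarcastic tweet
        return position, cue_lag

    print(position_and_lag(4, 2))
    # -> (1, 2): a reply to the root, with one oblivious tweet between cue and target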
Table 8 shows the distribution of sarcastic tweets by position and cue lag in the SPIRS dataset. Root tweets (position = 0) account for 39% of sarcastic tweets. A further 39% of sarcastic tweets are direct replies to root tweets (position = 1). Interestingly, only 25% of cue tweets are direct replies to their sarcastic targets (lag = 1), while an overwhelming 71% have a lag of 2, mostly reflecting a response to an intermediate oblivious tweet. We further find that the average thread length is 3.9 tweets, while the average lag is 1.8 tweets.

Table 8: Percentage of sarcastic tweets by position (distance from the root tweet) and cue lag.

Cue lag \ Position | 0 | 1 | 2 | 3 | 4 | 5+ | Total
1 | 16.5 | 7.2 | 0.9 | 0.3 | 0.1 | 0.2 | 25.1
2 | 20.6 | 30.6 | 11.4 | 3.8 | 1.7 | 2.3 | 70.4
3+ | 1.9 | 1.3 | 0.7 | 0.3 | 0.1 | 0.2 | 4.5
Total | 39.0 | 39.1 | 13.0 | 4.3 | 1.9 | 2.7 | 100.0