Reactive Supervision: A New Method for Collecting Sarcasm Data
Boaz Shmueli (1,2,3,*), Lun-Wei Ku (2) and Soumya Ray (3)
(1) Social Networks and Human-Centered Computing, Taiwan International Graduate Program
(2) Institute of Information Science, Academia Sinica
(3) Institute of Service Science, National Tsing Hua University
(*) Corresponding author: shmueli@iis.sinica.edu.tw

Abstract

Sarcasm detection is an important task in affective computing, requiring large amounts of labeled data. We introduce reactive supervision, a novel data collection method that utilizes the dynamics of online conversations to overcome the limitations of existing data collection techniques. We use the new method to create and release a first-of-its-kind large dataset of tweets with sarcasm perspective labels and new contextual features. The dataset is expected to advance sarcasm detection research. Our method can be adapted to other affective computing domains, thus opening up new research opportunities.

1 Introduction

Sarcasm is ubiquitous in human conversations. As a form of insincere speech, the intent behind a sarcastic utterance is integral to its meaning. Perceiving a sarcastic utterance as genuine will often result in a complete reversal of the intended meaning, and vice versa (Gibbs, 1986). It is therefore crucial for affective computing systems and tasks, such as sentiment analysis and dialogue systems, to automatically detect sarcasm from the perspective of the author as well as the reader in order to avoid misunderstandings. Oprea and Magdy (2019) recently pioneered the study of intended sarcasm (by the author) vs. perceived sarcasm (by the reader) in the context of sarcasm detection tasks. The training of models for these tasks requires large amounts of labeled sarcasm data, with Twitter becoming a major source due to its popularity as a social network as well as the huge amounts of conversational text its users generate.

Previous works describe three methods for collecting sarcasm data: distant supervision, manual annotation, and manual collection. Distant supervision automatically collects "in-the-wild" sarcastic tweets by leveraging author-generated labels such as the #sarcasm hashtag (Davidov et al., 2010; Ptáček et al., 2014). This method generates large amounts of data at low cost, but labels are often noisy and biased (Bamman and Smith, 2015).

To improve quality, manual annotation asks humans to label given tweets as sarcastic or not. Since finding sarcasm in a large corpus is "a needle-in-a-haystack problem" (Liebrecht et al., 2013), manual annotation can be combined with distant supervision (Riloff et al., 2013). Still, low inter-annotator reliability is often reported (Swanson et al., 2014), resulting not only from the subjective nature of sarcasm but also the lack of cultural context (Joshi et al., 2016). Moreover, neither method collects both sarcasm perspectives: distant supervision collects intended sarcasm, while manual annotation can only collect perceived sarcasm.

Lastly, in manual collection, humans are asked to gather and report sarcastic texts, either their own (Oprea and Magdy, 2020) or by others (Filatova, 2012). However, both manual methods are slower and more expensive than distant supervision, resulting in smaller datasets.

To overcome the above limitations, we propose reactive supervision, a novel conversation-based method that offers automated, high-volume, "in-the-wild" collection of high-quality intended and perceived sarcasm data. We use our method to create and release the SPIRS sarcasm dataset (github.com/bshmueli/SPIRS).
2 Reactive Supervision

Reactive supervision exploits the frequent use in online conversations of a cue tweet — a reply that highlights sarcasm in a prior tweet. Figure 1 (left panel) shows a typical exchange on Twitter: C posts a sarcastic tweet. Unaware of C's sarcastic intent, B replies with an oblivious tweet. Lastly, A alerts B by replying with a cue tweet (She was just being sarcastic!). Since A replies to B but refers to the sarcastic author in the 3rd person (She), C is necessarily the author of the perceived sarcastic tweet. Similarly, Figure 1 (right panel) shows how a 1st-person cue (I was just being sarcastic!) can be used to unequivocally label intended sarcasm.

Figure 1: Conversation threads. Left panel, a 3rd-person cue with author sequence ABC: User_C: "The app we use for work emails is not working. I feel terrible about this!"; User_B: "Not your fault. Do not feel guilty!"; User_A (replying to @User_B): "She was just being sarcastic!". Right panel, a 1st-person cue with author sequence ABAC: User_C: "Just watched Forrest Gump. Great film!"; User_A: "So Tom Hanks can act! Who knew???"; User_B: "Literally everyone!!!"; User_A (replying to @User_B): "I was being sarcastic lol".

To capture sarcastic tweets, we thus first search for cue tweets (using the query phrase "being sarcastic", often used in responses to sarcastic tweets), then carefully examine each cue tweet to identify the corresponding sarcastic tweet. The following formalizes our method.

2.1 Method

Definitions. We define a thread to be a sequence of tweets {t_n, t_n-1, ..., t_1}, where t_i+1 is a reply to t_i, i = 1, ..., n-1. Tweets are listed in reverse chronological order, with t_1 being the root tweet. The corresponding author sequence is a_n a_n-1 ... a_1, where we replace the original author names with consecutive capital letters (A, B, C, ...), starting with a_n = A. For example, Figure 1 (right panel) depicts a thread of length n = 4 with author sequence ABAC. Here a_4 = a_2 = A, a_3 = B, and a_1 = C is the author of the root tweet.

Algorithm. Given a thread {t_n, t_n-1, ..., t_1} with cue tweet t_n by a_n = A, our aim is to identify the sarcastic tweet among {t_n-1, ..., t_1}. We first examine the personal subject pronoun used in the cue (I, you, s/he) and map it to a grammatical person class (1st, 2nd, 3rd). This informs us whether the sarcastic author is also the author of the cue (1st), its addressee (2nd), or another party (3rd). For each person class we then apply a heuristic to identify the sarcastic tweet. For example, for a 1st-person cue tweet (e.g., I was just being sarcastic!), the sarcastic tweet must also be authored by A. If the earlier tweets in the thread contain exactly one tweet from A, it is unambiguously the sarcastic tweet. Otherwise, if there are two or more earlier tweets from A (or none), the sarcastic tweet cannot be unambiguously pinpointed and the entire thread is discarded. We formalize this rule by requiring the author sequence to match the regular expression /^A[^A]*(A)[^A]*$/, where the capturing group (A) corresponds to the sarcastic tweet (we use Perl-Compatible Regular Expressions, PCRE). We are able to use regular expressions because we use a string of letters to represent the author sequence. 2nd- and 3rd-person cues produce corresponding rules and patterns. Table 1 lists the three person classes, corresponding regular expressions, and example author sequences.

Table 1: The three grammatical person classes, with example cue tweets, corresponding regular expressions, and examples of matching author sequences. The parenthesized letter marks the position of the sarcastic tweet.

Person | Example Cue | Regular Expression | Example Author Sequences
1st | I was only being sarcastic lol | ^A[^A]*(A)[^A]*$ | AB(A), AB(A)C, AB(A)B
2nd | Why are you being sarcastic? | ^AA*(B)A*$ | A(B), A(B)A, A(B)AA
3rd | She was just being sarcastic! | ^AA*B[AB]*(C)[AB]*$ | AB(C), AB(C)B, ABA(C)
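To make the matching step concrete, the following is a minimal Python sketch of the author-sequence heuristic, using the regular expressions from Table 1; Python's re module suffices here even though the paper mentions PCRE. The helper names and the example thread are illustrative, not the authors' released code.

    import re

    # Regular expressions from Table 1; the capturing group marks the sarcastic tweet.
    PERSON_PATTERNS = {
        "1st": re.compile(r"^A[^A]*(A)[^A]*$"),
        "2nd": re.compile(r"^AA*(B)A*$"),
        "3rd": re.compile(r"^AA*B[AB]*(C)[AB]*$"),
    }

    def author_sequence(authors):
        """Map author names (cue author first, root author last) to letters A, B, C, ..."""
        letters = {}
        seq = []
        for name in authors:
            if name not in letters:
                letters[name] = chr(ord("A") + len(letters))
            seq.append(letters[name])
        return "".join(seq)

    def find_sarcastic_index(authors, person):
        """Return the index of the sarcastic tweet (0 = cue tweet), or None if the thread is ambiguous."""
        match = PERSON_PATTERNS[person].match(author_sequence(authors))
        return match.start(1) if match else None

    # Example: the right panel of Figure 1, listed from the cue tweet down to the root.
    authors = ["User_A", "User_B", "User_A", "User_C"]   # author sequence ABAC
    print(find_sarcastic_index(authors, "1st"))          # -> 2, i.e. the sarcastic tweet t_2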
2.2 Advantages

Additional Tweet Types. Along with each sarcastic tweet, we collect the oblivious tweet (the unsuspecting reply to the sarcastic tweet) when available. As far as we know, this is the first work that identifies and collects oblivious texts, a new type of data that can improve research on the (mis)understanding of sarcasm, with applications such as automated assistive systems for people with emotional or cognitive disabilities. If the sarcastic tweet is a reply, we also capture the eliciting tweet, which is the tweet that evoked the sarcastic reply. We provide more details in Appendix A.
Extraction of Semantic Relations. Being able to identify the various tweet types (cue, oblivious, sarcastic, eliciting), reactive supervision can be understood more abstractly as capturing semantic dependency relations between utterances (it is worth noting that Hearst (1992) uses patterns to automatically extract lexical relations between words). Reactive supervision can thus be useful in the context of discourse analysis.

Context-Aware Annotation. Our method uses cues from thread participants, who therefore serve as de facto annotators. As participants are familiar with the conversation's context, we overcome some quality issues of using external annotators, who are often unfamiliar with the conversation context due to cultural and social gaps (Joshi et al., 2016).

Sarcasm Perspective. Previous datasets contain either intended or perceived sarcasm, but not both (Oprea and Magdy, 2019). Our method identifies and labels both intended and perceived sarcasm within the same data context: by their essence, 1st-person cue tweets capture intended sarcasm, while 2nd- and 3rd-person cues capture perceived sarcasm. We label a tweet as perceived sarcasm when at least one reader perceives the tweet as sarcastic and posts a cue tweet. Detecting perceived sarcasm is useful, for example, for training algorithms that flag sensitive texts which might be (mis)perceived as sarcastic (even by a single reader).

Faster Data Collection. We tested González-Ibáñez et al. (2011)'s distant supervision method of collecting tweets ending with #sarcasm and related hashtags, fetching 171 tweets/day on average. During the same period, our method collected 312 tweets/day on average, an 82% rate improvement.

Summary of Advantages. Table 2 summarizes the advantages of our best-of-all-worlds method over other approaches. Reactive supervision offers automated, in-the-wild, and context-aware detection of intended and perceived sarcasm data.

Table 2: Comparison of data collection methods.

Feature | Distant Supervision | Manual Annotation | Manual Collection | Reactive Supervision
Automatic | ✓ | ✗ | ✗ | ✓
In-the-wild | ✓ | ✗ | ✗ | ✓
Oblivious Tweet | ✗ | ✗ | ✗ | ✓
Context-Aware | ✓ | Maybe | Maybe | ✓
Perspective | Intended | Perceived | Either | Both
Samples/Day | 171 | Manual | Manual | 312

3 SPIRS Dataset

We implemented reactive supervision using a 4-step pipeline (see Algorithm 1; a code sketch of the complete loop follows the step descriptions below):

Algorithm 1: Data collection pipeline.
    Result: set S of sarcastic tweets
    S ← {}
    candidates ← Fetch("being sarcastic")
    for cue in candidates do
        switch Classify(cue) do
            case 1st person: regexp ← ^A[^A]*(A)[^A]*$
            case 2nd person: regexp ← ^AA*(B)A*$
            case 3rd person: regexp ← ^AA*B[AB]*(C)[AB]*$
            case unknown: continue
        {t_n (= cue), t_n-1, ..., t_1} ← Traverse(cue)
        a_n a_n-1 ... a_1 ← Authors({t_n, t_n-1, ..., t_1})
        if i ← Match(regexp, a_n a_n-1 ... a_1) then
            S ← S ∪ {t_i}

1. Fetch calls the Twitter Search API to collect cue tweets, using "being sarcastic" as the query.
2. Classify is a rule-based, precision-oriented classifier that classifies cues as 1st-, 2nd-, or 3rd-person according to the referred pronoun (I, you, s/he). If the cue cannot be accurately classified (e.g., a pronoun cannot be found, the cue contains multiple pronouns, negation words are present), the cue is classified as unknown and discarded.
3. Traverse calls the Twitter Lookup API to retrieve the thread by starting from the cue tweet and repeatedly fetching the parent tweet up to the root tweet.
4. Finally, Match matches the thread's author sequence with the corresponding regular expression. Unmatched sequences are discarded. Otherwise, the sarcastic tweet is identified and saved along with the cue tweet, as well as the eliciting and oblivious tweets when available.
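As a rough end-to-end illustration of Algorithm 1, here is a Python skeleton of the four steps. The callables fetch_cues, classify_person, and lookup_parent are hypothetical stand-ins for the Twitter Search API query, the rule-based cue classifier, and the Twitter Lookup API traversal; they are not part of the released code.

    import re

    PERSON_PATTERNS = {  # from Table 1; group 1 marks the sarcastic tweet
        "1st": re.compile(r"^A[^A]*(A)[^A]*$"),
        "2nd": re.compile(r"^AA*(B)A*$"),
        "3rd": re.compile(r"^AA*B[AB]*(C)[AB]*$"),
    }

    def collect_sarcastic_tweets(fetch_cues, classify_person, lookup_parent):
        """Sketch of the 4-step pipeline; tweets are assumed to be dicts with an 'author' field."""
        sarcastic = []
        for cue in fetch_cues():                      # 1. Fetch: candidate cue tweets
            person = classify_person(cue)             # 2. Classify: 1st/2nd/3rd/unknown
            if person == "unknown":
                continue
            thread = [cue]                            # 3. Traverse: follow replies up to the root
            while (parent := lookup_parent(thread[-1])) is not None:
                thread.append(parent)
            letters = {}
            seq = "".join(                            # map authors to letters A, B, C, ...
                letters.setdefault(t["author"], chr(ord("A") + len(letters)))
                for t in thread
            )
            match = PERSON_PATTERNS[person].match(seq)  # 4. Match the author sequence
            if match:
                sarcastic.append(thread[match.start(1)])
        return sarcastic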
The pipeline collected 65K cue tweets containing the phrase "being sarcastic" and corresponding threads during 48 days in October and November 2019. 77% of the cues were classified as unknown and discarded, ending with 15,000 English sarcastic tweets. In addition, 10,648 oblivious and 9,156 eliciting tweets were automatically captured. Table 3 summarizes the SPIRS dataset. We added 15,000 negative instances by sampling random English tweets captured during the same period, discarding tweets with sarcasm-related words or hashtags.
Table 3: SPIRS data breakdown by person class (number of tweets).

Person | Perspective | Sarcastic | Oblivious | Eliciting
1st | Intended | 10,300 | 9,065 | 8,075
2nd | Perceived | 3,000 | — | 842
3rd | Perceived | 1,700 | 1,583 | 239
Total | | 15,000 | 10,648 | 9,156

Sarcastic tweets can be either root tweets or replies. We found that the majority of intended sarcasm tweets are replies (78.4%), while the majority of perceived sarcasm tweets are root tweets (77.0%). Further dataset statistics on author sequence and tweet position distributions are available in Appendices B and C.

Reliability. To assess our method's reliability in capturing sarcastic tweets, we manually inspected 200 random sarcastic tweets, along with their cue tweets, from each person class. The accuracy of sarcastic tweet labeling was high: 98.5%, 98%, and 97% for 1st-, 2nd-, and 3rd-person cue tweets, respectively. Table 4 shows samples of correct and incorrect cue tweet classifications.

Table 4: Correctly and incorrectly classified cue tweets.

Cue Tweet | Person | Correct?
Shudda been more clear...I was being sarcastic | 1st | ✓
I'm almost always being sarcastic, but this was real | 1st | ✗
Take it you are being sarcastic | 2nd | ✓
You do realize @user was being sarcastic right? | 2nd | ✗
She was being sarcastic. You missed the joke | 3rd | ✓
Mind blown. Had no idea he was being sarcastic | 3rd | ✗

4 Experiments and Analysis

We present dataset baselines for three tasks: sarcasm detection, sarcasm detection with conversation context, and sarcasm perspective classification, a new task enabled by our dataset.

4.1 Sarcasm Detection

The first experiment is sarcasm detection. We trained a total of three models: a CNN (100 filters with a kernel size of 3) and a BiLSTM (100 units), both max-pooled and Adam-optimized with a learning rate of 0.0005; data was preprocessed as described in Tay et al. (2018), and the embedding layer was preloaded with GloVe embeddings (Twitter data, 100 dimensions) (Pennington et al., 2014). We also fine-tuned a pre-trained base uncased BERT model (Devlin et al., 2019). For all three models, we used 5-fold cross-validation for training, holding out 20% of the data for testing.
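A minimal sketch of the BiLSTM baseline, assuming TensorFlow/Keras and a precomputed GloVe embedding matrix; the layer stack follows the description above (100 units, max pooling over time, Adam with learning rate 0.0005), but the exact preprocessing and training setup of the reported experiments may differ.

    import numpy as np
    import tensorflow as tf

    VOCAB_SIZE, EMBED_DIM, MAX_LEN = 20_000, 100, 60      # illustrative sizes
    glove_matrix = np.zeros((VOCAB_SIZE, EMBED_DIM))      # placeholder for pretrained GloVe vectors

    def build_bilstm_baseline():
        inputs = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")
        x = tf.keras.layers.Embedding(
            VOCAB_SIZE, EMBED_DIM, weights=[glove_matrix], trainable=False
        )(inputs)
        x = tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(100, return_sequences=True)
        )(x)
        x = tf.keras.layers.GlobalMaxPooling1D()(x)                   # max pooling over time steps
        outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)   # sarcastic vs. non-sarcastic
        model = tf.keras.Model(inputs, outputs)
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4),
            loss="binary_crossentropy",
            metrics=["accuracy"],
        )
        return model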
Results are shown in Table 5 (top panel). BERT is the best performing model, with 70.3% accuracy. We compared SPIRS's classification results to the Ptáček et al. (2014) dataset, commonly used in sarcasm benchmarks. We found that accuracy on the Ptáček dataset is significantly higher (86.6%). We posit that this is because sarcasm is confounded with locale in the Ptáček dataset (sarcastic tweets are from worldwide users; non-sarcastic tweets are from users near Prague), and thus classifiers learn features correlated with locale. We tested our hypothesis by replacing our negative samples with Ptáček's, which indeed resulted in boosting the accuracy by 19.1%.

Table 5: Baselines. We report precision, recall, macro-F1, accuracy, and MCC (Matthews correlation coefficient). Mean and standard deviation were calculated using 5-fold cross-validation. N is the number of instances after preprocessing. * Dataset classes were balanced using majority class downsampling.

Task | Dataset | Model | P | R | F1 | Acc | MCC
Sarcasm Detection | SPIRS (our dataset), N=19,384 | CNN | 67.2 (1.8) | 73.6 (5.1) | 65.0 (1.2) | 65.8 (0.5) | 0.308 (0.011)
 | | BiLSTM | 68.9 (2.1) | 75.4 (5.5) | 67.1 (0.9) | 67.9 (0.3) | 0.350 (0.008)
 | | BERT | 70.1 (1.1) | 77.4 (1.2) | 69.9 (0.5) | 70.3 (0.5) | 0.402 (0.008)
 | Ptáček, N=49,766 | CNN | 79.1 (0.8) | 87.5 (1.3) | 77.9 (0.6) | 79.2 (0.6) | 0.566 (0.012)
 | | BiLSTM | 82.4 (1.6) | 87.6 (2.9) | 80.9 (0.1) | 81.7 (0.2) | 0.622 (0.002)
 | | BERT | 87.0 (0.6) | 90.9 (0.6) | 86.0 (0.2) | 86.6 (0.2) | 0.721 (0.004)
 | Ptáček (−) / SPIRS (+), N=21,138* | CNN | 84.3 (1.6) | 82.6 (2.5) | 83.6 (0.8) | 83.6 (0.8) | 0.673 (0.017)
 | | BiLSTM | 86.2 (2.8) | 86.7 (2.8) | 86.4 (0.7) | 86.4 (0.7) | 0.729 (0.012)
 | | BERT | 89.8 (0.7) | 89.1 (0.7) | 89.4 (0.2) | 89.4 (0.2) | 0.788 (0.004)
Sarcasm Detection w/ Conversation Context | SPIRS (our dataset), N=7,810* | 3× BiLSTM | 77.7 (1.1) | 87.9 (3.5) | 68.9 (0.7) | 74.8 (0.6) | 0.398 (0.007)
 | | w/o eliciting | 75.6 (1.1) | 91.4 (2.8) | 66.3 (1.4) | 74.3 (0.3) | 0.372 (0.005)
 | | w/o oblivious | 72.4 (2.4) | 93.3 (4.5) | 58.8 (6.2) | 71.4 (1.4) | 0.275 (0.053)
 | | w/o both | 73.2 (2.7) | 90.8 (6.6) | 60.3 (4.6) | 71.2 (0.4) | 0.282 (0.033)
Sarcasm Perspective Classification | SPIRS (our dataset), N=6,324* | CNN | 65.5 (1.2) | 61.7 (3.3) | 64.4 (0.5) | 64.5 (0.5) | 0.291 (0.009)
 | | BiLSTM | 66.8 (2.3) | 63.1 (5.8) | 65.5 (0.7) | 65.6 (0.7) | 0.315 (0.015)
 | | BERT | 70.0 (2.9) | 63.8 (5.7) | 68.0 (1.7) | 68.2 (1.6) | 0.366 (0.032)

4.2 Detection with Conversation Context

Our second sarcasm classification experiment uses conversation context by adding eliciting and oblivious tweets to the model. As far as we know, this is the first sarcasm-related task that uses oblivious texts. Our model concatenated the outputs of three identical 100-unit BiLSTMs (one per tweet: sarcastic, oblivious, eliciting) before feeding them into dense layers for classification. Tweets without surrounding context were not used in this task. Results are shown in Table 5 (middle panel). Accuracy for the full-context model was 74.8% (MCC 0.398).
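The context model can be sketched in the same framework: one BiLSTM branch per tweet type, with the three pooled representations concatenated before the dense classification layers. The branch size (100 units) follows the description above; the embedding setup and the width of the dense layer are assumptions.

    import tensorflow as tf

    VOCAB_SIZE, EMBED_DIM, MAX_LEN = 20_000, 100, 60   # illustrative sizes

    def tweet_encoder():
        """One 100-unit BiLSTM branch; the model uses three identical branches."""
        return tf.keras.Sequential([
            tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
            tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(100, return_sequences=True)),
            tf.keras.layers.GlobalMaxPooling1D(),
        ])

    def build_context_model():
        sarcastic = tf.keras.Input(shape=(MAX_LEN,), name="sarcastic")
        oblivious = tf.keras.Input(shape=(MAX_LEN,), name="oblivious")
        eliciting = tf.keras.Input(shape=(MAX_LEN,), name="eliciting")
        encoded = [tweet_encoder()(t) for t in (sarcastic, oblivious, eliciting)]
        x = tf.keras.layers.Concatenate()(encoded)            # fuse the three tweet representations
        x = tf.keras.layers.Dense(100, activation="relu")(x)  # dense layer width is an assumption
        outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
        return tf.keras.Model([sarcastic, oblivious, eliciting], outputs)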
Ablation Study. We conducted context ablation experiments to identify the contribution of each tweet type. We found that removing the eliciting tweets reduces accuracy by 0.5% and MCC by 0.026. Removing the oblivious tweets, however, lowered accuracy by 3.4% to 71.4%, and the MCC dropped significantly by 31%, from 0.398 to 0.275. This illustrates the importance of the new oblivious text data provided in the dataset and suggests its usefulness in sarcasm-related tasks.

4.3 Perspective Classification

Taking advantage of the new labels in our dataset, we propose a new task to classify a sarcastic text's perspective: intended vs. perceived. Our results are displayed in Table 5 (bottom panel), demonstrating the superiority of BERT over the other models, with an accuracy of 68.2% and MCC of 0.366.

Error Analysis. We carefully examined the errors to analyze the causes of perspective misclassification. We observed that misclassified-as-intended tweets (e.g., "You're lost!", "Omg that was so funny") had, on average, almost half the word count of misclassified-as-perceived tweets (17.2 vs. 27.8). We posit that longer, more informative texts make sarcasm easier to perceive; hence, short perceived sarcasm or long intended sarcasm might introduce errors. Analysis of the dataset's word count distribution supports our hypothesis (see Figure 2).

Figure 2: Word count distribution in SPIRS for intended vs. perceived sarcasm.

Looking for further error sources, we inspected short intended tweets that were misclassified, for example "great friends i have!" and "My mom is so beautiful". These tweets can be read as root tweets and not as replies, yet most intended sarcasm tweets are replies while most perceived sarcasm tweets are root tweets (see Section 3). We hypothesize that the classifier learns discourse-related features (original tweet vs. reply tweet), which can lead to these errors. Further analysis of sarcasm perspective and its interplay with sarcasm pragmatics is a promising avenue for future research.

5 Conclusion

We present an innovative method for collecting sarcasm data that exploits the natural dynamics of online conversations. Our approach has multiple advantages over all existing methods. We used it to create and release SPIRS, a large sarcasm dataset with multiple novel features. These new features, including labels for sarcasm perspective and unique context (e.g., oblivious texts), offer opportunities for advances in sarcasm detection.

Reactive supervision is generalizable. By modifying the cue tweet selection criteria, our method can be adapted to related domains such as sentiment analysis and emotion detection, thereby advancing the quality and quantity of data collection and offering new research directions in affective computing.

Acknowledgements

This research was partially supported by the Ministry of Science and Technology of Taiwan under contracts MOST 108-2221-E-001-012-MY3 and MOST 108-2321-B-009-006-MY2.
References

David Bamman and Noah A. Smith. 2015. Contextualized Sarcasm Detection on Twitter. In Ninth International AAAI Conference on Web and Social Media.

Dmitry Davidov, Oren Tsur, and Ari Rappoport. 2010. Semi-supervised Recognition of Sarcastic Sentences in Twitter and Amazon. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning (CoNLL '10), pages 107–116, Uppsala, Sweden. Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Elena Filatova. 2012. Irony and Sarcasm: Corpus Generation and Analysis Using Crowdsourcing. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 392–398, Istanbul, Turkey. European Language Resources Association (ELRA).

Raymond W. Gibbs. 1986. On the psycholinguistics of sarcasm. Journal of Experimental Psychology: General, 115(1):3.

Roberto González-Ibáñez, Smaranda Muresan, and Nina Wacholder. 2011. Identifying Sarcasm in Twitter: A Closer Look. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2 (HLT '11), pages 581–586, Portland, Oregon. Association for Computational Linguistics.

Marti A. Hearst. 1992. Automatic Acquisition of Hyponyms from Large Text Corpora. In COLING 1992 Volume 2: The 15th International Conference on Computational Linguistics.

Aditya Joshi, Pushpak Bhattacharyya, Mark Carman, Jaya Saraswati, and Rajita Shukla. 2016. How Do Cultural Differences Impact the Quality of Sarcasm Annotation?: A Case Study of Indian Annotators and American Text. In Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 95–99, Berlin, Germany. Association for Computational Linguistics.

Christine Liebrecht, Florian Kunneman, and Antal van den Bosch. 2013. The perfect solution for detecting sarcasm in tweets #not. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 29–37, Atlanta, Georgia. Association for Computational Linguistics.

Silviu Oprea and Walid Magdy. 2019. Exploring Author Context for Detecting Intended vs Perceived Sarcasm. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2854–2859, Florence, Italy. Association for Computational Linguistics.

Silviu Oprea and Walid Magdy. 2020. iSarcasm: A Dataset of Intended Sarcasm. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.

Tomáš Ptáček, Ivan Habernal, and Jun Hong. 2014. Sarcasm Detection on Czech and English Twitter. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 213–223, Dublin, Ireland. Dublin City University and Association for Computational Linguistics.

Ellen Riloff, Ashequl Qadir, Prafulla Surve, Lalindra De Silva, Nathan Gilbert, and Ruihong Huang. 2013. Sarcasm as Contrast between a Positive Sentiment and Negative Situation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 704–714, Seattle, Washington, USA. Association for Computational Linguistics.

Reid Swanson, Stephanie Lukin, Luke Eisenberg, Thomas Corcoran, and Marilyn Walker. 2014. Getting Reliable Annotations for Sarcasm in Online Dialogues. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 4250–4257, Reykjavik, Iceland. European Language Resources Association (ELRA).

Yi Tay, Anh Tuan Luu, Siu Cheung Hui, and Jian Su. 2018. Reasoning with Sarcasm by Reading In-Between. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1010–1020, Melbourne, Australia. Association for Computational Linguistics.
A Search Pattern Production

We construct the regular expression for capturing all tweet types — sarcastic, oblivious, and eliciting — given a 3rd-person cue tweet. Similar logic produces the patterns for 1st- and 2nd-person cues.

The cue tweet author (A) refers to the sarcastic tweet author in the 3rd person (e.g., She was being sarcastic!); we thus assume that A's tweet is a response to a second author B, but refers to a third author C (the sarcastic author). To unambiguously pinpoint the sarcastic tweet, C can only appear once in the author sequence. Moreover, only A, B, and C can participate in the thread. Finally, C's tweet can either be a root tweet or a reply to another tweet. The combination of these constraints leads to the regular expression /^(A)(A*B[AB]*)(C)([AB]*)$/. (A) is the cue tweet. (A*B[AB]*) forces at least one tweet from B (to which A responded). (C) is the sarcastic tweet. Finally, ([AB]*) represents optional tweets from A or B. If the author sequence matches the regular expression, we can unambiguously identify the sarcastic author and the corresponding sarcastic tweet.

We also use the search pattern to find the oblivious and eliciting tweets. We assume that the cue tweet (A) is triggered by an oblivious tweet from B. Thus, if (A*B[AB]*) contains exactly one B, we designate the corresponding tweet as oblivious. Likewise, ([AB]*) contains the eliciting tweet.

Table 6 lists the search patterns for the three person classes. Note that the 2nd-person pattern does not include an oblivious tweet because A's cue tweet is a response to a sarcastic tweet from B, i.e., it is not triggered by an oblivious tweet.

Table 6: Person classes and their search patterns. The capturing groups locate the cue, oblivious, sarcastic, and eliciting tweets.

Person | Regular Expression
1st | ^(A)([^A]*)(A)([^A]*)$
2nd | ^(A)A*(B)(A*)$
3rd | ^(A)(A*B[AB]*)(C)([AB]*)$
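To illustrate how a single match yields all four tweet types, here is a small Python sketch using the 3rd-person pattern from Table 6. The function name and the example sequence are illustrative; the oblivious and eliciting rules follow the description above.

    import re

    # 3rd-person pattern from Table 6: groups are cue, middle span, sarcastic, trailing span.
    THIRD_PERSON = re.compile(r"^(A)(A*B[AB]*)(C)([AB]*)$")

    def extract_third_person(seq):
        """Return the indices of the cue, oblivious, sarcastic, and eliciting tweets, if identifiable."""
        m = THIRD_PERSON.match(seq)
        if not m:
            return None
        cue, sarcastic = m.start(1), m.start(3)
        middle = m.group(2)
        # Oblivious tweet: the single B in the middle span, if there is exactly one.
        oblivious = m.start(2) + middle.index("B") if middle.count("B") == 1 else None
        # Eliciting tweet: the parent of the sarcastic tweet, i.e. the next position,
        # available only when the trailing span is non-empty.
        eliciting = sarcastic + 1 if m.group(4) else None
        return {"cue": cue, "oblivious": oblivious, "sarcastic": sarcastic, "eliciting": eliciting}

    print(extract_third_person("ABCB"))
    # -> {'cue': 0, 'oblivious': 1, 'sarcastic': 2, 'eliciting': 3}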
B Author Sequence Distribution

Table 7 shows the most common author sequences in SPIRS. The most common pattern for 1st-person cues is ABAC (as in Figure 1, right panel). AB is the most common pattern for 2nd-person cues, denoting a sarcastic root tweet followed immediately by a cue tweet (e.g., Why are you being sarcastic?). For 3rd-person cues, the most common pattern is ABC (as in Figure 1, left panel). Note that some patterns appear in more than one person class. For example, ABA appears in both the 1st- and 2nd-person classes, while ABAC appears in both the 1st- and 3rd-person classes.

Table 7: The most common author patterns by person class (number of tweets).

Person | Pattern | Sarcastic | Oblivious | Eliciting
1st (Intended) | ABAC | 2,841 | 2,841 | 2,841
 | ABA | 1,818 | 1,818 | —
 | ABAB | 1,551 | 1,551 | 1,551
 | Other | 4,090 | 2,855 | 2,683
 | Subtotal | 10,300 | 9,065 | 8,075
2nd (Perceived) | AB | 2,122 | — | —
 | ABA | 782 | — | 782
 | Other | 96 | — | 60
 | Subtotal | 3,000 | — | 842
3rd (Perceived) | ABC | 1,235 | 1,235 | —
 | ABCB | 119 | 119 | 119
 | ABAC | 110 | 110 | —
 | Other | 236 | 119 | 120
 | Subtotal | 1,700 | 1,583 | 239
Total | | 15,000 | 10,648 | 9,156

C Tweet Position Distribution

Reactive supervision enables the measurement of conversation position statistics for sarcastic tweets on Twitter. Given a thread {t_n, ..., t_i = s, ..., t_1} with cue tweet t_n, sarcastic tweet t_i = s, and root tweet t_1, we define the position of the sarcastic tweet as the distance i − 1 between the sarcastic tweet and the root. Furthermore, the cue lag is the distance n − i between the cue and the sarcastic tweet.
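A small worked example of the two quantities, using the right-panel thread of Figure 1 (author sequence ABAC, so n = 4 and the sarcastic tweet is t_2):

    def position_and_lag(n, i):
        """Thread t_n ... t_1 (cue first, root last); the sarcastic tweet is t_i."""
        position = i - 1    # distance between the sarcastic tweet and the root
        cue_lag = n - i     # distance between the cue and the sarcastic tweet
        return position, cue_lag

    print(position_and_lag(4, 2))
    # -> (1, 2): a reply to the root, with one oblivious tweet between cue and target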
Table 8 shows the distribution of sarcastic tweets by position and cue lag in the SPIRS dataset. Root tweets (position = 0) account for 39% of sarcastic tweets. A further 39% of sarcastic tweets are direct replies to root tweets (position = 1). Interestingly, only 25% of cue tweets are direct replies to their sarcastic targets (lag = 1), while an overwhelming 71% have a lag of 2, mostly reflecting a response to an intermediate oblivious tweet. We further find that the average thread length is 3.9 tweets, while the average lag is 1.8 tweets.

Table 8: Percentage of sarcastic tweets by position (distance from the root tweet) and cue lag.

Cue lag \ Position | 0 | 1 | 2 | 3 | 4 | 5+ | Total
1 | 16.5 | 7.2 | 0.9 | 0.3 | 0.1 | 0.2 | 25.1
2 | 20.6 | 30.6 | 11.4 | 3.8 | 1.7 | 2.3 | 70.4
3+ | 1.9 | 1.3 | 0.7 | 0.3 | 0.1 | 0.2 | 4.5
Total | 39.0 | 39.1 | 13.0 | 4.3 | 1.9 | 2.7 | 100.0