Code-Switching Patterns Can Be an Effective Route to Improve Performance of Downstream NLP Applications: A Case Study of Humour, Sarcasm and Hate Speech Detection

Srijan Bansal, G Vishal, Ayush Suhane, Jasabanta Patro, Animesh Mukherjee
Indian Institute of Technology, Kharagpur, West Bengal, India - 721302
{srijanbansal97, vishal g, ayushsuhane99}@iitkgp.ac.in, jasabantapatro@iitkgp.ac.in, animeshm@cse.iitkgp.ac.in

Abstract

In this paper we demonstrate how code-switching patterns can be utilised to improve various downstream NLP applications. In particular, we encode different switching features to improve humour, sarcasm and hate speech detection tasks. We believe that this simple linguistic observation can also be potentially helpful in improving other similar NLP applications.

1 Introduction

Code-mixing/switching in social media has become commonplace. Over the past few years, the NLP research community has in fact started to vigorously investigate various properties of such code-switched posts to build downstream applications. The author in (Hidayat, 2012) demonstrated that inter-sentential switching is preferred over intra-sentential switching by Facebook users. Further, while 45% of the switching was done for real lexical needs, 40% was for discussing a particular topic and 5% for content clarification. In another study, (Dey and Fung, 2014) interviewed Hindi-English bilingual students and reported that 67% of the words were in Hindi and 33% in English. Recently, many downstream applications have been designed for code-mixed text. (Han et al., 2012) attempted to construct a normalisation dictionary offline using the distributional similarity of tokens plus their string edit distance. (Vyas et al., 2014) developed a POS tagging framework for Hindi-English data. More nuanced applications like humour detection (Khandelwal et al., 2018), sarcasm detection (Swami et al., 2018) and hate speech detection (Bohra et al., 2018) have been targeted for code-switched data in the last two to three years.

1.1 Motivation

The primary motivation for the current work is derived from (Vizcaíno, 2011), where the author notes – "The switch itself may be the object of humour". In fact, (Siegel, 1995) studied humour in the Fijian language and notes that when trying to be comical, or to convey humour, speakers switch from Fijian to Hindi. Humour here is therefore produced by the change of code rather than by the referential meaning or content of the message. The paper also discusses similar phenomena observed in Spanish-English cases.

In a study of English-Hindi code-switching and swearing patterns on social networks (Agarwal et al., 2017), the authors show that when people code-switch, there is a strong preference for swearing in the dominant language. These studies together lead us to hypothesize that the patterns of switching might be useful in building various NLP applications.

1.2 The present work

To corroborate our hypothesis, in this paper we consider three downstream applications – (i) humour detection (Khandelwal et al., 2018), (ii) sarcasm detection (Swami et al., 2018) and (iii) hate speech detection (Bohra et al., 2018) – for Hindi-English code-switched data. We first provide empirical evidence that the switching patterns between native (Hindi) and foreign (English) words distinguish the two classes of a post, i.e., humorous vs non-humorous, sarcastic vs non-sarcastic, or hateful vs non-hateful. We then featurise these patterns and feed them into the state-of-the-art classification models to show the benefits. We obtain macro-F1 improvements of 2.62%, 1.85% and 3.36% over the baselines on the tasks of humour detection, sarcasm detection and hate speech detection respectively. As a next step, we introduce a modern deep neural model (HAN – Hierarchical Attention Network (Yang et al., 2016)) to improve the performance of the models further. Finally, we concatenate the switching features in the last hidden layer of the HAN and pass the result to the softmax layer for classification. This final architecture yields macro-F1 improvements of 4.9%, 4.7% and 17.7% over the original baselines on the tasks of humour detection, sarcasm detection and hate speech detection respectively.
                                                                  and ends.
2 Dataset

We consider three datasets consisting of Hindi (hi) - English (en) code-mixed tweets scraped from Twitter for our experiments - Humour, Sarcasm and Hate. We discuss the details of each of these datasets below.

           +      -      Tweets   Tokens   Switching*
Humour     1755   1698   3453     9851     2.20
Sarcasm    504    4746   5250     14930    2.13
Hate       1661   2914   4575     10453    4.34

Table 1: Dataset description (* denotes average per tweet).

Humour: The humour dataset was released by (Khandelwal et al., 2018) and has Hindi-English code-mixed tweets from domains like 'sports', 'politics', 'entertainment' etc. The dataset has a uniform distribution of tweets in each category to yield better supervised classification results (see Table 1), as described by (Du et al., 2014). Here the positive class refers to humorous tweets while the negative class corresponds to non-humorous tweets. Some representative examples from the data, showing the points of switch corresponding to the start and the end of the humour component:

• women can crib on things like humourstart bhaiyya ye shakkar bahot zyada meethi hai humourend, koi aur quality dikhao (Gloss: women can crib on things like "brother the sugar is a little more sweet, show a different quality".)
• shashi kapoor trending on mothersday how apt, humourstart mere paas ma hai humourend (Gloss: shashi kapoor trending on mothersday how apt, "I have my mother with me".)
• political journey of kejriwal, from humourstart mujhe chahiye swaraj humourend to humourstart mujhe chahiye laluraj humourend (Gloss: political journey of kejriwal, from "I want swaraj" to "I want laluraj".)

Sarcasm: The sarcasm dataset released by (Swami et al., 2018) contains tweets that have the hashtags #sarcasm and #irony. The authors used other keywords such as 'bollywood', 'cricket' and 'politics' to collect sarcastic tweets from these domains. In this case, the dataset is heavily unbalanced (see Table 1). Here the positive class refers to sarcastic tweets and the negative class to non-sarcastic tweets. Some representative examples from our data, showing the points where the sarcasm starts and ends:

• said aib filthy pandit ji, sarcasmstart aap jo bol rahe ho woh kya shuddh sanskrit hai sarcasmend? irony shameonyou (Gloss: said aib filthy pandit ji, is whatever you are saying pure sanskrit? irony shameonyou.)
• irony bappi lahiri sings sarcasmstart sona nahi chandi nahi yaar toh mila arre pyaar kar le sarcasmend (Gloss: irony bappi lahiri sings "doesn't matter if you do not get gold or silver, you have got a friend to love".)

Hate speech: (Bohra et al., 2018) created the corpus using tweets posted online in the last five years which have a good propensity to contain hate speech (see Table 1). The authors mined tweets by selecting certain hashtags and keywords from 'politics', 'public protests', 'riots' etc. The positive class refers to hateful tweets while the negative class to non-hateful tweets. The dataset released by this paper only had the hate/non-hate tag for each tweet; the language tag for each word, required for our experiments, was not available. Two of the authors independently language tagged the data and obtained an agreement of 98.1%. While language tagging, we noted that the dataset is a mixed bag including hate speech, offensive and abusive tweets, which have already been shown to be different in earlier work (Waseem et al., 2017). However, this was the only Hindi-English code-mixed hate speech dataset available. An example of a hate tweet showing the points of switch corresponding to the start and the end of the hate component:

• I hate my university, hatestart koi us jagah ko aag laga dey hateend (Gloss: I hate my university. Someone burn that place.)
3 Switching features

In this section, we outline the key contribution of this work. In particular, we identify how patterns of switching correlate with the tweet text being humorous, sarcastic or hateful. We outline a synopsis of our investigation below.

3.1 Switching and NLP tasks

In this section, we identify how switching behavior is related to the three NLP tasks at hand. Let Q be the property that a sentence has en words which are surrounded by hi words, that is, there exists an English word in a Hindi context. For instance, the tweet koi_hi to_hi pray_en karo_hi mere_hi liye_hi bhi_hi (each word subscripted with its language tag) satisfies the property Q. However, bumrah_hi dono_hi wicketo_hi ke_hi beech_hi gumrah_hi ho_hi gaya_hi does not satisfy Q.

We performed a statistical analysis to determine the correlation between the switching patterns and the classification task at hand (represented by T). Let us denote the probability that a tweet belongs to the positive class for a task T given that it satisfies property Q by p(T|Q). Similarly, let p(T|~Q) be the probability that the tweet belongs to the positive class for task T given that it does not satisfy the property Q. Further, let avg(S|T) be the average switching in positive samples for the task T and avg(S|~T) denote the average switching in negative samples for the task T.

             T: Humour   T: Sarcasm   T: Hate
p(T|Q)       0.56        0.28         0.36
p(T|~Q)      0.50        0.42         0.41
avg(S|T)     7.84        0.60         1.49
avg(S|~T)    6.50        0.89         1.54

Table 2: Correlation of switching with different classification tasks.

The main observations from this analysis for the three tasks – humour, sarcasm and hate – are noted in Table 2. For the humour task, p(humour|Q) dominates over p(humour|~Q). Further, the average number of switches for the positive samples in the humour task is larger than the average number of switches for the negative samples. Finally, we observe a positive Pearson's correlation coefficient of 0.04 between a text being humorous and the text having the property Q. Together, this indicates that the switching behavior has a positive connection with a tweet being humorous.

On the other hand, p(sarcasm|~Q) and p(hate|~Q) respectively dominate over p(sarcasm|Q) and p(hate|Q). Moreover, the average number of switches for the negative samples of both these tasks is larger than the average number of switches for the positive samples. The Pearson's correlation between a text being sarcastic (hateful) and the text having the property Q is negative: -0.17 (-0.04). This shows there is an overall negative connection between the switching behavior and the sarcasm/hate speech detection tasks.
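The quantities in Table 2 are straightforward to reproduce once every tweet carries per-word language tags. Below is a minimal sketch, assuming each tweet has been reduced to its list of 'hi'/'en' tags together with a binary task label; the function names are ours, not part of the released datasets.

```python
from statistics import mean

def satisfies_q(tags):
    """Property Q: at least one English word surrounded by Hindi words."""
    return any(tags[i] == 'en' and tags[i - 1] == 'hi' and tags[i + 1] == 'hi'
               for i in range(1, len(tags) - 1))

def num_switches(tags):
    """Total number of hi->en and en->hi switch points in a tweet."""
    return sum(a != b for a, b in zip(tags, tags[1:]))

def table2_stats(corpus):
    """corpus: list of (tags, label) pairs, where label 1 marks the positive
    class of task T. Returns the four quantities reported in Table 2."""
    p_t_q = mean(lbl for tags, lbl in corpus if satisfies_q(tags))          # p(T|Q)
    p_t_nq = mean(lbl for tags, lbl in corpus if not satisfies_q(tags))     # p(T|~Q)
    avg_s_pos = mean(num_switches(tags) for tags, lbl in corpus if lbl == 1)  # avg(S|T)
    avg_s_neg = mean(num_switches(tags) for tags, lbl in corpus if lbl == 0)  # avg(S|~T)
    return p_t_q, p_t_nq, avg_s_pos, avg_s_neg
```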
                                                            mere hi liye hi bhi hi.
   The main observations from this analysis for the         hi en           : [0, 0, 2, 0, 0, 0, 0]
three tasks – humour, sarcasm and hate are noted in         en hi           : [0, 0, 0, 1, 1, 1, 1]
Table 2. For the humour task, p(humour|Q) dom-              Feature vector: [1, 1, 2, 71 , 67 , 27 , 0.69, 47 , 0.49]
inates over p(humour| ∼ Q). Further the average
number of switching for the positive samples in
the humour task is larger than the average number           4        Experiments
of switching for the negative samples. Finally, we
observe a positive Pearson’s correlation coefficient        4.1        Pre-processing
of 0.04 between a text being humorous and the text
having the property Q. This together indicates that         Tweets are tokenized and punctuation marks are
the switching behavior has a positive connection            removed. All the hashtags, mentions and urls are
with a tweet being humorous.                                stored and converted to string ‘hashtag’, ‘mention’
   On the other hand p(sarcasm| ∼ Q) as well                and ‘url’ to capture the general semantics of the
as p(hate| ∼ Q) respectively dominate over                  tweet. Camel-case hashtags were segregated and
p(sarcasm|Q) and p(hate|Q). Moreover the av-                included in the tokenized tweets (see (Belainine
erage number of switching for the negative sam-             et al., 2016), (Khandelwal et al., 2017)). For exam-
ples for both these tasks is larger than the average        ple, #AadabArzHai can be decomposed into three
number of switching for the positive samples. The           distinct words: Aadab, Arz and Hai. We use the
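This computation can be written down directly. The sketch below derives the nine features of Table 3 from a per-token language-tag list and reproduces the worked example, under our assumption that the reported standard deviations are population standard deviations.

```python
from statistics import mean, pstdev

def switching_features(tags):
    """Nine features of Table 3 from a per-token 'hi'/'en' tag list."""
    n = len(tags)
    hi_before = [tags[:i].count('hi') for i in range(n)]
    en_before = [tags[:i].count('en') for i in range(n)]
    # hi_en[i]: number of hi words before the i-th word if it is English, else 0
    hi_en = [hi_before[i] if tags[i] == 'en' else 0 for i in range(n)]
    # en_hi[i]: number of en words before the i-th word if it is Hindi, else 0
    en_hi = [en_before[i] if tags[i] == 'hi' else 0 for i in range(n)]
    en_to_hi = sum(a == 'en' and b == 'hi' for a, b in zip(tags, tags[1:]))
    hi_to_en = sum(a == 'hi' and b == 'en' for a, b in zip(tags, tags[1:]))
    return [
        en_to_hi,                      # en_hi_switches
        hi_to_en,                      # hi_en_switches
        en_to_hi + hi_to_en,           # V: total switches
        tags.count('en') / n,          # fraction_en
        tags.count('hi') / n,          # fraction_hi
        mean(hi_en), pstdev(hi_en),    # mean/stddev of hi_en vector
        mean(en_hi), pstdev(en_hi),    # mean/stddev of en_hi vector
    ]

# koi_hi to_hi pray_en karo_hi mere_hi liye_hi bhi_hi
print(switching_features(['hi', 'hi', 'en', 'hi', 'hi', 'hi', 'hi']))
# -> [1, 1, 2, 1/7, 6/7, 2/7, 0.699..., 4/7, 0.494...]
```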
4 Experiments

4.1 Pre-processing

Tweets are tokenized and punctuation marks are removed. All hashtags, mentions and urls are stored and converted to the strings 'hashtag', 'mention' and 'url' to capture the general semantics of the tweet. Camel-case hashtags were segmented and the constituent words included in the tokenized tweets (see (Belainine et al., 2016), (Khandelwal et al., 2017)). For example, #AadabArzHai can be decomposed into three distinct words: Aadab, Arz and Hai. We use the same pre-processing for all the results presented in this paper.
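A rough illustration of this pipeline is given below. The exact tokenizer and regular expressions are not specified in the paper, so this is only one plausible realisation of the steps described above.

```python
import re

def preprocess(tweet):
    """Sketch of the described pre-processing: URLs, mentions and hashtags
    become the placeholder strings 'url', 'mention' and 'hashtag'; camel-case
    hashtag bodies are split into words; punctuation is dropped."""
    tokens = []
    for tok in tweet.split():
        if tok.startswith('http'):
            tokens.append('url')
        elif tok.startswith('@'):
            tokens.append('mention')
        elif tok.startswith('#'):
            tokens.append('hashtag')
            # #AadabArzHai -> ['Aadab', 'Arz', 'Hai']
            tokens.extend(re.findall(r'[A-Z][a-z]+|[a-z]+|[A-Z]+(?![a-z])', tok[1:]))
        else:
            word = re.sub(r'[^\w]', '', tok)   # strip punctuation marks
            if word:
                tokens.append(word)
    return tokens

print(preprocess('I love #AadabArzHai @user http://t.co/x !'))
# -> ['I', 'love', 'hashtag', 'Aadab', 'Arz', 'Hai', 'mention', 'url']
```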
4.2 Machine learning baselines

Humour baseline (Khandelwal et al., 2018): Uses features such as n-grams, bag-of-words, common words and hashtags to train standard machine learning models such as SVM and Random Forest. The authors used character n-grams, as previous work shows that this feature is very efficient for classifying text: it requires no expensive text pre-processing techniques like tokenization, stemming or stop-word removal, and it is language independent and can therefore be used on code-mixed text. In their paper, the authors report results for tri-grams.

Sarcasm baseline (Swami et al., 2018): This model also uses a combination of word n-grams, character n-grams, the presence or absence of certain emoticons, and sarcasm-indicative tokens as features. A sarcasm-indicative score is computed and chi-squared feature reduction is used to take the top 500 most relevant words. These were incorporated into the features used for classification. Standard off-the-shelf machine learning models like SVM and Random Forest were used.

Hate baseline (Bohra et al., 2018): The hate speech detection baseline also consists of similar features such as character n-grams, word n-grams, negation words (see Christopher Potts' sentiment tutorial: http://sentiment.christopherpotts.net/lingstruc.html) and a lexicon of hate-indicative tokens. The chi-squared feature reduction method was used to decrease the dimensionality of the features. Once again SVM and Random Forest based classifiers were used for this task.

Switching features: We plug the nine switching features introduced in the previous section into the three baseline models for humour, sarcasm and hate speech detection, as sketched below.
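As an illustration of how the switching features are plugged into such a baseline, the sketch below appends them to a character tri-gram representation and trains a linear SVM. This is not the authors' exact pipeline; switching_features is the function sketched in Section 3.2.

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def train_baseline(texts, tag_lists, labels):
    """Character tri-grams + the nine switching features, fed to a linear SVM.
    texts: raw tweet strings; tag_lists: per-tweet 'hi'/'en' tag lists."""
    vec = TfidfVectorizer(analyzer='char', ngram_range=(3, 3))
    x_ngrams = vec.fit_transform(texts)                       # sparse tri-gram matrix
    x_switch = np.array([switching_features(t) for t in tag_lists])
    x = hstack([x_ngrams, x_switch]).tocsr()                  # plug in switching features
    clf = LinearSVC().fit(x, labels)
    return vec, clf
```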

4.3 Deep learning architecture

In order to draw on the benefits of modern deep learning machinery, we build an end-to-end model for the three tasks at hand. We use the Hierarchical Attention Network (HAN) (Yang et al., 2016), one of the state-of-the-art models for text and document classification. It can represent sentences at different levels of granularity by stacking recurrent neural networks at the character, word and sentence level and attending over the words that are informative. We use the GRU implementation of HAN to encode the text representation for all three tasks.

Handling data imbalance by sub-sampling: Since the sarcasm dataset is heavily unbalanced, we sub-sampled the data to balance the classes. For this purpose, we categorise the negative samples into those that are easy or hard to classify, hypothesizing that if a model can predict the hard samples reliably it can do the same with the easy ones. We trained a classifier model on the training dataset and obtained the softmax score, which represents p(sarcastic|text), for the test samples. Those test samples whose score falls below a very low confidence threshold (say 0.001) are removed on the assumption that they are easy samples. The dataset thus becomes smaller and more balanced. It is important to note that positive samples are never removed. We validated this hypothesis through the test set: our trained HAN model achieves an accuracy of 94.4% in classifying the easy (thrown-out) samples as non-sarcastic, thus justifying the sub-sampling.
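A sketch of this sub-sampling step is given below, assuming a trained model exposed as a function returning p(sarcastic|text); the interface is illustrative, not the authors' code.

```python
def subsample_negatives(samples, predict_proba, threshold=0.001):
    """Drop 'easy' negatives: samples the model already scores as near-certainly
    non-sarcastic. samples: list of (text, label) pairs; predict_proba: a
    function returning p(sarcastic|text). Positives are never removed."""
    return [(text, label) for text, label in samples
            if label == 1 or predict_proba(text) >= threshold]
```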
Switching features: We include the switching features in the pre-final fully-connected layer of the HAN to observe whether this harnesses additional benefits (see Figure 1).

Figure 1: The overall HAN architecture along with the switching features in the final layer.
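A minimal Keras sketch of this architecture is shown below. Here han_encoder is assumed to be a pre-built HAN module mapping token ids to a dense document vector; the hidden size is illustrative, and the two-class softmax of the paper is written as its equivalent single-unit sigmoid.

```python
from tensorflow.keras import layers, Model

def han_with_switching_features(han_encoder, max_tokens=50, n_switch_feats=9):
    """Figure 1, sketched: the HAN document vector is concatenated with the
    nine switching features just before the output layer."""
    tokens = layers.Input(shape=(max_tokens,), name='tweet_tokens')
    switch = layers.Input(shape=(n_switch_feats,), name='switching_features')
    doc_vec = han_encoder(tokens)                     # HAN text representation
    merged = layers.Concatenate()([doc_vec, switch])  # pre-final concatenation
    hidden = layers.Dense(15, activation='relu')(merged)  # size 15 is illustrative
    output = layers.Dense(1, activation='sigmoid')(hidden)
    return Model(inputs=[tokens, switch], outputs=output)
```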

4.4 Experimental Setup

Train-test split: For all datasets, we maintain a train-test split of 0.8 - 0.2 and perform 10-fold cross validation.

Parameters of the HAN: BiLSTMs: no dropout; early stopping patience: 15; optimizer = 'adam' (learning rate = 0.001, beta_1 = 0.9); loss = binary cross entropy; epochs = 200; batch size = 32; pre-trained word-embedding size = 50; hidden size: [20, 60]; dense output size (before concatenation): [15, 30].
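The listed hyper-parameters translate directly into a Keras training configuration, sketched below; x_tokens, x_switch and y_train are assumed to come from the earlier sketches, and the validation split used for early stopping is our assumption.

```python
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam

# model = han_with_switching_features(...) from the sketch above
model.compile(optimizer=Adam(learning_rate=0.001, beta_1=0.9),
              loss='binary_crossentropy', metrics=['accuracy'])
model.fit([x_tokens, x_switch], y_train,
          epochs=200, batch_size=32,
          validation_split=0.1,                     # assumption; the paper uses 10-fold CV
          callbacks=[EarlyStopping(patience=15)])   # early stopping patience: 15
```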
Pre-trained embeddings: We obtained pre-trained embeddings by training GloVe from scratch on the large code-mixed dataset (725173 tweets) released by (Patro et al., 2017) plus all the tweets (13278) in our three datasets.

5 Results

We compare the baseline models along with (i) the baseline + switching feature-based models and (ii) the HAN models. We use the macro-F1 score for comparison throughout. The main results are summarized in Table 4.

Model                     Humour   Sarcasm   Hate
Baseline (B)              69.34    78.40     33.60
Baseline + Feature (BF)   71.16    79.85     34.73
HAN (H)                   72.04    81.36     38.78
HAN + Feature (HF)        72.71    82.07     39.54

Table 4: Summary of the results from different models in terms of macro-F1 scores. A Mann-Whitney U test shows all improvements of HF over B are significant.
The interesting observations that one can make from these results are: (i) inclusion of the switching features always improves the overall performance of any model (machine learning or deep learning) for all three tasks; (ii) the deep learning models are always better than the machine learning models. Inclusion of the switching features in the machine learning models (indicated as BF in Table 4) yields macro-F1 improvements of 2.62%, 1.85% and 3.36% over the baselines (indicated as B in Table 4) on the tasks of humour detection, sarcasm detection and hate speech detection respectively. Inclusion of the switching features in the HAN model (indicated as HF in Table 4) yields macro-F1 improvements of 4.9%, 4.7% and 17.7% over the original baselines on the same three tasks respectively.
Success of our model: The success of our approach is evident from the following examples. As we demonstrated earlier, humour is positively correlated with switching. A tweet with a switching pattern like anurag_hi kashyap_hi can_en never_en join_en aap_hi because_en ministers_en took_en oath_en, "main_hi kisi_hi anurag_hi aur_hi dwesh_hi ke_hi bina_hi kaam_hi karunga_hi" was not detected as humorous by the baseline (B) but was detected as such by our models (BF and HF). Note that the author of the above tweet seems to have categorically switched to Hindi to express the humour; such observations have also been made in (Rudra et al., 2016), where opinion expression was cited as a reason for switching.

Sarcasm being negatively correlated with switching, a tweet without switching is more likely to be sarcastic. For instance, the tweet naadaan_hi baalak_hi kalyug_hi ka_hi vardaan_hi hai_hi ye_hi, which bears no switching, was labeled non-sarcastic by the baseline. Our models (BF and HF) rectified this and correctly detected it as sarcastic.

Similarly, hate being negatively correlated with switching, a tweet with no switching - shilpa_hi ji_hi aap_hi ravidubey_hi jaise_hi tuchho_hi ko_hi jawab_hi mat_hi dijiye_hi ye_hi log_hi aap_hi ke_hi sath_hi kabhi_hi nahi_hi - which was labeled as non-hateful by the baseline, was detected as hateful by our methods (BF and HF).

6 Conclusion

In this paper, we identified how switching patterns can be effective in improving three different NLP applications. We presented a set of nine features that improve upon the state-of-the-art baselines. In addition, we exploited modern deep learning machinery to improve the performance further. Finally, we showed that this model can be improved further by pumping the switching features into the final layer of the deep network.

In future, we would like to extend this work to other language pairs. For instance, we have seen examples of such switching in English-Spanish (Don't forget humourstart la toalla cuando go to le playa humourend; Gloss: Don't forget the towel when you go to the beach.) and English-Telugu (Votes kosam quality leni liquor bottles supply chesina politicians sarcasmstart ee roju quality gurinchi matladuthunnaru sarcasmend; Gloss: Politicians who supplied low-quality liquor bottles for votes are talking about quality today.) pairs as well. Further, we plan to investigate other NLP applications that can benefit from the simple linguistic features introduced here.

References

P. Agarwal, A. Sharma, J. Grover, M. Sikka, K. Rudra, and M. Choudhury. 2017. I may talk in english but gaali toh hindi mein hi denge: A study of english-hindi code-switching and swearing pattern on social networks. In 2017 9th International Conference on Communication Systems and Networks (COMSNETS), pages 554–557.
Billal Belainine, Alexsandro Fonseca, and Fatiha Sadat. 2016. Named entity recognition and hashtag decomposition to improve the classification of tweets. In Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT), pages 102–111, Osaka, Japan. The COLING 2016 Organizing Committee.

Aditya Bohra, Deepanshu Vijay, Vinay Singh, Syed Sarfaraz Akhtar, and Manish Shrivastava. 2018. A dataset of Hindi-English code-mixed social media text for hate speech detection. In Proceedings of the Second Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media, pages 36–41, New Orleans, Louisiana, USA. Association for Computational Linguistics.

Anik Dey and Pascale Fung. 2014. A Hindi-English code-switching corpus. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14).

Mian Du, Matthew Pierce, Lidia Pivovarova, and Roman Yangarber. 2014. Supervised classification using balanced training. pages 147–158.

Bo Han, Paul Cook, and Timothy Baldwin. 2012. Automatically constructing a normalisation dictionary for microblogs. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 421–432, Jeju Island, Korea. Association for Computational Linguistics.

Taofik Hidayat. 2012. An analysis of code switching used by facebookers (a case study in a social network site).

Ankush Khandelwal, Sahil Swami, Syed Sarfaraz Akhtar, and Manish Shrivastava. 2017. Classification of spanish election tweets (COSET) 2017: Classifying tweets using character and word level features. In Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017) co-located with 33th Conference of the Spanish Society for Natural Language Processing (SEPLN 2017), Murcia, Spain, September 19, 2017, pages 49–54.

Ankush Khandelwal, Sahil Swami, Syed Sarfaraz Akhtar, and Manish Shrivastava. 2018. Humor detection in english-hindi code-mixed social media content: Corpus and baseline system. CoRR, abs/1806.05513.

Jasabanta Patro, Bidisha Samanta, Saurabh Singh, Abhipsa Basu, Prithwish Mukherjee, Monojit Choudhury, and Animesh Mukherjee. 2017. All that is English may be Hindi: Enhancing language identification through automatic ranking of the likeliness of word borrowing in social media. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2264–2274, Copenhagen, Denmark. Association for Computational Linguistics.

Koustav Rudra, Shruti Rijhwani, Rafiya Begum, Kalika Bali, Monojit Choudhury, and Niloy Ganguly. 2016. Understanding language preference for expression of opinion and sentiment: What do hindi-english speakers do on twitter? In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 1131–1141.

Jeff Siegel. 1995. How to get a laugh in fijian: Code-switching and humor. Language in Society, 24(1):95–110.

Sahil Swami, Ankush Khandelwal, Vinay Singh, Syed Sarfaraz Akhtar, and Manish Shrivastava. 2018. A corpus of english-hindi code-mixed tweets for sarcasm detection. CoRR, abs/1805.11869.

María José García Vizcaíno. 2011. Association humor in code-mixed airline advertising.

Yogarshi Vyas, Spandana Gella, Kalika Bali, and Monojit Choudhury. 2014. POS tagging of english-hindi code-mixed social media content. pages 974–979.

Zeerak Waseem, Thomas Davidson, Dana Warmsley, and Ingmar Weber. 2017. Understanding abuse: A typology of abusive language detection subtasks. In Proceedings of the First Workshop on Abusive Language Online, pages 78–84, Vancouver, BC, Canada. Association for Computational Linguistics.

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489, San Diego, California. Association for Computational Linguistics.
