Explaining NLP Models via Minimal Contrastive Editing (MICE)
Alexis Ross†, Ana Marasović†♦, Matthew E. Peters†
† Allen Institute for Artificial Intelligence, Seattle, WA, USA
♦ Paul G. Allen School of Computer Science and Engineering, University of Washington
{alexisr,anam,matthewp}@allenai.org

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 3840–3852, August 1–6, 2021. ©2021 Association for Computational Linguistics

Abstract

Humans have been shown to give contrastive explanations, which explain why an observed event happened rather than some other counterfactual event (the contrast case). Despite the influential role that contrastivity plays in how humans explain, this property is largely missing from current methods for explaining NLP models. We present MINIMAL CONTRASTIVE EDITING (MICE), a method for producing contrastive explanations of model predictions in the form of edits to inputs that change model outputs to the contrast case. Our experiments across three tasks—binary sentiment classification, topic classification, and multiple-choice question answering—show that MICE is able to produce edits that are not only contrastive, but also minimal and fluent, consistent with human contrastive edits. We demonstrate how MICE edits can be used for two use cases in NLP system development—debugging incorrect model outputs and uncovering dataset artifacts—and thereby illustrate that producing contrastive explanations is a promising research direction for model interpretability.

Figure 1: An example MICE edit for a multiple-choice question from the RACE dataset. MICE generates contrastive explanations in the form of edits to inputs that change model predictions to target (contrast) predictions. The edit (bolded in red) is minimal and fluent, and it changes the model's prediction from "by train" to the contrast prediction "on foot" (highlighted in gray).

1 Introduction

Cognitive science and philosophy research has shown that human explanations are contrastive (Miller, 2019): People explain why an observed event happened rather than some counterfactual event called the contrast case. This contrast case plays a key role in modulating what explanations are given. Consider Figure 1. When we seek an explanation of the model's prediction "by train," we seek it not in absolute terms, but in contrast to another possible prediction (i.e. "on foot"). Additionally, we tailor our explanation to this contrast case. For instance, we might explain why the prediction is "by train" and not "on foot" by saying that the writer discusses meeting Ann at the train station instead of at Ann's home on foot; such information is captured by the edit (bolded red) that results in the new model prediction "on foot." For a different contrast prediction, such as "by car," we would provide a different explanation. In this work, we propose to give contrastive explanations of model predictions in the form of targeted minimal edits, as shown in Figure 1, that cause the model to change its original prediction to the contrast prediction.
Given the key role that contrastivity plays in human explanations, making model explanations contrastive could make them more user-centered and thus more useful for their intended purposes, such as debugging and exposing dataset biases (Ribera and Lapedriza, 2019)—purposes which require that humans work with explanations (Alvarez-Melis et al., 2019). However, many currently popular instance-based explanation methods produce highlights—segments of input that support a prediction (Zaidan et al., 2007; Lei et al., 2016; Chang et al., 2019; Bastings et al., 2019; Yu et al., 2019; DeYoung et al., 2020; Jain et al., 2020; Belinkov and Glass, 2019) that can be derived through gradients (Simonyan et al., 2014; Smilkov et al., 2017; Sundararajan et al., 2017), approximations with simpler models (Ribeiro et al., 2016), or attention (Wiegreffe and Pinter, 2019; Sun and Marasović, 2021). These methods are not contrastive, as they leave the contrast case undetermined; they do not tell us what would have to be different for a model to have predicted a particular contrast label.¹

As an alternative approach to NLP model explanation, we introduce MINIMAL CONTRASTIVE EDITING (MICE)—a two-stage approach to generating contrastive explanations in the form of targeted minimal edits (as shown in Figure 1). Given an input, a fixed PREDICTOR model, and a contrast prediction, MICE generates edits to the input that change the PREDICTOR's output from the original prediction to the contrast prediction. We formally define our edits and describe our approach in §2.

We design MICE to produce edits with properties motivated by human contrastive explanations. First, we desire edits to be minimal, altering only small portions of input, a property which has been argued to make explanations more intelligible (Alvarez-Melis et al., 2019; Miller, 2019). Second, MICE edits should be fluent, resulting in text natural for the domain and ensuring that any changes in model predictions are not driven by inputs falling out of the distribution of naturally occurring text. Our experiments (§3) on three English-language datasets, IMDB, NEWSGROUPS, and RACE, validate that MICE edits are indeed contrastive, minimal, and fluent.

We also analyze the quality of MICE edits (§4) and show how they may be used for two use cases in NLP system development. First, we show that MICE edits are comparable in size and fluency to human edits on the IMDB dataset. Next, we illustrate how MICE edits can facilitate debugging individual model predictions. Finally, we show how MICE edits can be used to uncover dataset artifacts learned by a powerful PREDICTOR model.²

2 MICE: Minimal Contrastive Editing

This section describes our proposed method, MINIMAL CONTRASTIVE EDITING, or MICE, for explaining NLP models with contrastive edits.

2.1 MICE Edits as Contrastive Explanations

Contrastive explanations are answers to questions of the form Why p and not q? They explain why the observed event p happened instead of another event q, called the contrast case.³ A long line of research in the cognitive sciences and philosophy has found that human explanations are contrastive (Van Fraassen, 1980; Lipton, 1990; Miller, 2019). Human contrastive explanations have several hallmark characteristics. First, they cite contrastive features: features that result in the contrast case when they are changed in a particular way (Chin-Parker and Cantelon, 2017). Second, they are minimal in the sense that they rarely cite the entire causal chain of a particular event, but select just a few relevant causes (Hilton, 2017). In this work, we argue that a minimal edit to a model input that causes the model output to change to the contrast case has both these properties and can function as an effective contrastive explanation. We first give an illustration of contrastive explanations humans might give and then show how minimal contrastive edits offer analogous contrastive information.

As an example, suppose we want to explain why the answer to the question "Q: Where can you find a clean pillow case that is not in use?" is "A: the drawer."⁴ If someone asks why the answer is not "C1: on the bed," we might explain: "E1: Because only the drawer stores pillow cases that are not in use." However, E1 would not be an explanation of why the answer is not "C2: in the laundry hamper," since both drawers and laundry hampers store pillow cases that are not in use. For contrast case C2, we might instead explain: "E2: Because only laundry hampers store pillow cases that are not clean." We cite different parts of the original question depending on the contrast case.

In this work, we propose to offer contrastive explanations in the form of minimal edits that result in the contrast case as model output. Such edits are effective contrastive explanations because, by construction, they highlight contrastive features. For example, a contrastive edit of the original question for contrast case C1 would be: "Where can you find a clean pillow case that is not in use?"; the information provided by this edit—that it is whether or not the pillow case is in use that determines whether the answer is "the drawer" or "on the bed"—is analogous to the information provided by E1. Similarly, a contrastive edit for contrast case C2 that changed the question to "Where can you find a clean dirty pillow case that is not in use?" provides analogous information to E2.

¹ Free-text rationales (Narang et al., 2020) can be contrastive if human justifications are collected by asking "why... instead of...", which is not the case with current benchmarks (Camburu et al., 2018; Rajani et al., 2019; Zellers et al., 2019).
² Our code and trained EDITOR models are publicly available at https://github.com/allenai/mice.
³ Related work also calls it the foil (Miller, 2019).
⁴ Inspired by an example in Talmor et al. (2019): Question: "Where would you store a pillow case that is not in use?" Choices: "drawer, kitchen cupboard, bedding store, england."
Figure 2: An overview of MICE, our two-stage approach to generating edits. In Stage 1 (§2.3), we train the EDITOR to make edits targeting specific predictions from the PREDICTOR. In Stage 2 (§2.4), we make contrastive edits with the EDITOR model from Stage 1 such that the PREDICTOR changes its output to the contrast prediction.

2.2 Overview of MICE

We define a contrastive edit to be a modification of an input instance that causes a PREDICTOR model (whose behavior is being explained) to change its output from its original prediction for the unedited input to a given target (contrast) prediction. Formally, for textual inputs, given a fixed PREDICTOR f, an input x = (x_1, x_2, ..., x_N) of N tokens, an original prediction f(x) = y_p, and a contrast prediction y_c ≠ y_p, a contrastive edit is a mapping e : (x_1, ..., x_N) → (x'_1, ..., x'_M) such that f(e(x)) = y_c.

We propose MICE, a two-stage approach to generating contrastive edits, illustrated in Figure 2. In Stage 1, we prepare a highly-contextualized EDITOR model to associate edits with given end-task labels (i.e., labels for the task of the PREDICTOR) such that the contrast label y_c is not ignored in MICE's second stage. Intuitively, we do this by masking the spans of text that are "important" for the given target label (as measured by the PREDICTOR's gradients) and training our EDITOR to reconstruct these spans of text given the masked text and target label as input. In Stage 2 of MICE, we generate contrastive edits e(x) using the EDITOR model from Stage 1. Specifically, we generate candidate edits e(x) by masking different percentages of x and giving masked inputs with the prepended contrast label y_c to the EDITOR; we use binary search to find optimal masking percentages and beam search to keep track of candidate edits that result in the highest probability of the contrast label p(y_c | e(x)) given by the PREDICTOR.
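Stated operationally, an edit qualifies as contrastive exactly when the PREDICTOR's label for the edited input equals the contrast label. The following is an illustrative sketch of this check, not part of the released MICE code; the generic `predict_label` callable stands in for the fixed PREDICTOR f.

```python
from typing import Callable

def is_contrastive_edit(
    predict_label: Callable[[str], str],  # fixed PREDICTOR f: text -> predicted label
    original_text: str,
    edited_text: str,
    contrast_label: str,                  # y_c, chosen to differ from f(original_text)
) -> bool:
    """Return True iff the edit flips the PREDICTOR's output to the contrast label."""
    y_p = predict_label(original_text)    # original prediction y_p
    y_e = predict_label(edited_text)      # prediction on the edited input, f(e(x))
    return y_e == contrast_label and contrast_label != y_p
```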
2.3 Stage 1: Fine-tuning the EDITOR

In Stage 1 of MICE, we fine-tune the EDITOR to infill masked spans of text in a targeted manner. Specifically, we fine-tune a pretrained model to infill masked spans given masked text and a target end-task label as input. In this work, we use the TEXT-TO-TEXT TRANSFER TRANSFORMER (T5) model (Raffel et al., 2020) as our pretrained EDITOR, but any model suitable for span infilling can in principle be the EDITOR in MICE. The addition of the target label allows the highly-contextualized EDITOR to condition its predictions on both the masked context and the given target label such that the contrast label is not ignored in Stage 2. What to use as target labels during Stage 1 depends on who the end-users of MICE are. The end-user could be: (1) a model developer who has access to the labeled data used to train the predictor, or (2) lay-users, domain experts, or other developers without access to the labeled data. In the former case, we could use the gold labels as targets, and in the latter case, we could use the labels predicted by the PREDICTOR. Therefore, during fine-tuning, we experiment with using both gold labels and original predictions y_p of our PREDICTOR model as target labels. To provide target labels, we prepend them to inputs to the EDITOR. For more information about how these inputs are formatted, see Appendix B. Results in Table 2 show that fine-tuning with target labels results in better edits than fine-tuning without them.

The above procedure allows our EDITOR to condition its infilled spans on both the context and the target label. But this still leaves open the question of where to mask our text. Intuitively, we want to mask the tokens that contribute most to the PREDICTOR's predictions, since these are the tokens that are most strongly associated with the target label. We propose to use gradient attribution (Simonyan et al., 2014) to choose tokens to mask. For each instance, we take the gradient of the predicted logit for the target label with respect to the embedding layers of f and take the ℓ1 norm across the embedding dimension. We then mask the n_1% of tokens with the highest gradient norms. We replace consecutive tokens (i.e., spans) with sentinel tokens, following Raffel et al. (2020). Results in Table 1 show that gradient-based masking outperforms random masking.
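The gradient-based masking step can be sketched as below. This is an illustrative re-implementation rather than the released code: it assumes a Hugging Face style classifier (`model`, `tokenizer`), treats n1 as a fraction, and replaces each run of selected tokens with a T5-style sentinel.

```python
import torch

def gradient_mask(model, tokenizer, text: str, target_label_id: int, n1: float) -> str:
    """Mask the n1 fraction of tokens most important for the target label (by gradient
    attribution) and replace each run of masked tokens with a sentinel token."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    input_ids = enc["input_ids"]                               # (1, N)
    embeds = model.get_input_embeddings()(input_ids)           # (1, N, d)
    embeds = embeds.detach().requires_grad_(True)
    logits = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"]).logits
    logits[0, target_label_id].backward()                      # d(target-label logit) / d(embeddings)
    scores = embeds.grad.abs().sum(dim=-1).squeeze(0)          # l1 norm across the embedding dim
    k = max(1, int(n1 * scores.numel()))
    masked = torch.zeros(scores.numel(), dtype=torch.bool)
    masked[scores.topk(k).indices] = True                      # (special tokens are not excluded here)

    # Rebuild the text, replacing each consecutive run of masked tokens with one sentinel.
    ids = input_ids[0].tolist()
    pieces, sentinel, i = [], 0, 0
    while i < len(ids):
        if masked[i]:
            pieces.append(f"<extra_id_{sentinel}>")
            sentinel += 1
            while i < len(ids) and masked[i]:
                i += 1
        else:
            j = i
            while j < len(ids) and not masked[j]:
                j += 1
            pieces.append(tokenizer.decode(ids[i:j], skip_special_tokens=True))
            i = j
    return " ".join(p for p in pieces if p.strip())
```

In Stage 1 the masked spans are reconstructed by the EDITOR given the masked text with the gold or predicted label prepended; in Stage 2 the same masking is applied and the contrast label is prepended instead (the exact input format is described in Appendix B).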
2.4 Stage 2: Making Edits with the EDITOR

In the second stage of our approach, we use our fine-tuned EDITOR to make edits using beam search (Reddy, 1977). In each round of edits, we mask consecutive spans of n_2% of tokens in the original input, prepend the contrast prediction to the masked input, and feed the resulting masked instance to the EDITOR; the EDITOR then generates m edits. The masking procedure during this stage is gradient-based, as in Stage 1.

In one round of edits, we conduct a binary search with s levels over values of n_2 between n_2 = 0% and n_2 = 55% to efficiently find a value of n_2 that is large enough to result in the contrast prediction while also modifying only minimal parts of the input. After each round of edits, we get f's predictions on the edited inputs, order them by contrast prediction probabilities, and update the beam to store the top b edited instances. As soon as an edit e* = e(t) is found that results in the contrast prediction, i.e., f(e*) = y_c, we stop the search procedure and return this edit. For generation, we use a combination of top-k (Fan et al., 2018) and top-p (nucleus) sampling (Holtzman et al., 2020).⁵

⁵ We use this combination because we observed in preliminary experiments that it led to good results.
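One round of this search can be sketched as follows. The helpers `mask_fn` (gradient-based masking at a given percentage), `generate_fn` (sampling m infills from the fine-tuned EDITOR with the contrast label prepended), and `predict_proba` (the PREDICTOR's label distribution as a dict) are assumed interfaces for illustration, not the released implementation, and the beam bookkeeping across rounds is omitted.

```python
def stage2_edit_round(text, contrast_label, mask_fn, generate_fn, predict_proba,
                      levels=4, max_pct=0.55, num_samples=15):
    """One round of MICE Stage 2 (simplified sketch).

    Binary-search the masking percentage n2 in (0, max_pct]: if some generation at the
    current n2 flips the PREDICTOR to the contrast label, try a smaller n2 (a more
    minimal edit); otherwise allow a larger fraction of the input to be rewritten.
    Returns all candidates sorted by contrast-label probability.
    """
    lo, hi = 0.0, max_pct
    candidates = []  # (contrast_probability, edited_text, flipped)
    for _ in range(levels):
        n2 = (lo + hi) / 2
        masked = mask_fn(text, n2)                    # gradient-based masking, as in Stage 1
        level_flipped = False
        for edited in generate_fn(masked, contrast_label, num_samples):
            probs = predict_proba(edited)             # PREDICTOR label -> probability
            flipped = max(probs, key=probs.get) == contrast_label
            level_flipped = level_flipped or flipped
            candidates.append((probs[contrast_label], edited, flipped))
        if level_flipped:
            hi = n2   # a flip was found at this percentage; search for a smaller mask
        else:
            lo = n2   # no flip yet; mask more of the input
    return sorted(candidates, key=lambda c: c[0], reverse=True)
```

In the full procedure, the top b candidates by contrast-label probability form the beam for the next round, and the search stops as soon as some candidate e* satisfies f(e*) = y_c.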
3 Evaluation

This section presents empirical findings that MICE produces minimal and fluent contrastive edits.

3.1 Experimental Setup

Tasks We evaluate MICE on three English-language datasets: IMDB, a binary sentiment classification task (Maas et al., 2011), a 6-class version of the 20 NEWSGROUPS topic classification task (Lang, 1995), and RACE, a multiple-choice question-answering task (Lai et al., 2017).⁶

PREDICTORS MICE can be used to make contrastive edits for any differentiable PREDICTOR model, i.e., any end-to-end neural model. In this paper, for each task, we train a PREDICTOR model f built on ROBERTA-LARGE (Liu et al., 2019) and fix it during evaluation. The test accuracies of our PREDICTORS are 95.9%, 85.3%, and 84% for IMDB, NEWSGROUPS, and RACE, respectively. For training details, see Appendix A.1.

EDITORS Our EDITORS build on the base version of T5. For fine-tuning our EDITORS (Stage 1), we use the original training data used to train PREDICTORS. We randomly split the data 75%/25% for fine-tuning/validation and fine-tune until the validation loss stops decreasing (for a max of 10 epochs), with n_1% of tokens masked, where n_1 is a randomly chosen value in [20, 55]. For more details, see Appendix A.2. In Stage 2, for each instance, we set the label with the second highest predicted probability as the contrast prediction. We set beam width b = 3, consider s = 4 search levels during binary search over n_2 in each edit round, and run our search for a max of 3 edit rounds. For each n_2, we sample m = 15 generations from our fine-tuned EDITORS with p = 0.95, k = 30.⁷

Metrics We evaluate MICE on the test sets of the three datasets. The RACE and NEWSGROUPS test sets contain 4,934 and 7,307 instances, respectively.⁸ For IMDB, we randomly sample 5K of the 25K instances in the test set for evaluation because of the computational demands of evaluation.⁹

For each dataset, we measure the following three properties: (1) flip rate: the proportion of instances for which an edit results in the contrast label; (2) minimality: the "size" of the edit as measured by the word-level Levenshtein distance between the original and edited input, which is the minimum number of deletions, insertions, or substitutions required to transform one into the other. We report a normalized version of this metric with a range from 0 to 1—the Levenshtein distance divided by the number of words in the original input; (3) fluency: a measure of how similarly distributed the edited output is to the original data. We evaluate fluency by comparing masked language modeling loss on both the original and edited inputs using a pretrained model. Specifically, given the original N-length sequence, we create N copies, each with a different token replaced by a mask token, following Salazar et al. (2020). We then take a pretrained T5-BASE model and compute the average loss across these N copies. We compute this loss value for both the original input and the edited input and report their ratio—i.e., edited / original. We aim for a value of 1.0, which indicates equivalent losses for the original and edited texts. When MICE finds multiple edits, we report metrics for the edit with the smallest value for minimality.

MICE VARIANT   | IMDB: Flip Rate ↑ / Minim. ↓ / Fluen. ≈1 | NEWSGROUPS: Flip Rate ↑ / Minim. ↓ / Fluen. ≈1 | RACE: Flip Rate ↑ / Minim. ↓ / Fluen. ≈1
*PRED + GRAD   | 1.000 / 0.173 / 0.981                    | 0.992 / 0.261 / 0.968                          | 0.915 / 0.331 / 0.981
*GOLD + GRAD   | 1.000 / 0.185 / 0.979                    | 0.992 / 0.271 / 0.966                          | 0.945 / 0.335 / 0.979
PRED + RAND    | 1.000 / 0.257 / 0.958                    | 0.968 / 0.378 / 0.928                          | 0.799 / 0.440 / 0.953
GOLD + RAND    | 1.000 / 0.302 / 0.952                    | 0.965 / 0.370 / 0.929                          | 0.801 / 0.440 / 0.955
NO-FINETUNE    | 0.995 / 0.360 / 0.960                    | 0.941 / 0.418 / 0.938                          | – / – / –

Table 1: Efficacy of the MICE procedure. We evaluate MICE edits on three metrics (described in §3.1): flip rate, minimality, and fluency. We report mean values for minimality and fluency. * marks full MICE variants; others explore ablations. For each property (i.e., column), the best value across MICE variants is bolded. We experiment with the PREDICTOR's predictions (PRED) and gold labels (GOLD) as target labels during Stage 1. Across datasets, our GRAD MICE procedure achieves a high flip rate with small and fluent edits.

⁶ We create this 6-class version by mapping the 20 existing subcategories to their respective larger categories—i.e., "talk.politics.guns" and "talk.religion.misc" → "talk." We do this in order to make the label space smaller. The resulting classes are: alt, comp, misc, rec, sci, and talk.
⁷ We tune these hyperparameters on a 50-instance subset of the IMDB validation set prior to evaluation. We note that for larger values of n_2, the generations produced by the T5 EDITORS sometimes degenerate; see Appendix C for details.
⁸ For the NEWSGROUPS test set, there are 7,307 instances remaining after filtering out empty strings.
⁹ A single contrastive edit is expensive and takes an average of ≈15 seconds per IMDB instance (≈230 tokens). Calculating the fluency metric adds an additional average of ≈16.5 seconds per IMDB instance. For more details, see Section 5.
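The minimality and fluency metrics defined above can be sketched in code. The fluency helper below is an approximation of the Salazar et al. (2020) procedure adapted to T5 (the exact masking and target format used in the paper may differ), and `t5` / `t5_tok` are assumed to be a pretrained T5-BASE model and its tokenizer.

```python
import torch

def minimality(original: str, edited: str) -> float:
    """Word-level Levenshtein distance, normalized by the number of words in the original."""
    a, b = original.split(), edited.split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (wa != wb)))    # substitution
        prev = curr
    return prev[-1] / max(len(a), 1)

def masked_lm_loss(t5, t5_tok, text: str, max_tokens: int = 256) -> float:
    """Average loss over copies of `text`, each with one token replaced by a sentinel
    (a T5-style stand-in for the masked-LM scoring of Salazar et al., 2020)."""
    ids = t5_tok(text, truncation=True, max_length=max_tokens).input_ids[:-1]  # drop </s>
    sentinel = t5_tok.convert_tokens_to_ids("<extra_id_0>")
    losses = []
    for i in range(len(ids)):
        inp = ids[:i] + [sentinel] + ids[i + 1:] + [t5_tok.eos_token_id]
        tgt = [sentinel, ids[i], t5_tok.eos_token_id]
        out = t5(input_ids=torch.tensor([inp]), labels=torch.tensor([tgt]))
        losses.append(out.loss.item())
    return sum(losses) / max(len(losses), 1)

def fluency(t5, t5_tok, original: str, edited: str) -> float:
    """Ratio of edited to original masked-LM loss; values near 1.0 indicate the edit
    stays in distribution."""
    return masked_lm_loss(t5, t5_tok, edited) / masked_lm_loss(t5, t5_tok, original)
```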
3.2 Results

Results are shown in Table 1. Our proposed GRAD MICE procedure (upper part of Table 1) achieves a high flip rate across all three tasks. This is the outcome regardless of whether predicted target labels (first row, 91.5–100% flip rate) or gold target labels (second row, 94.5–100% flip rate) are used for fine-tuning in Stage 1. We observe a slight improvement from using the gold labels for the RACE PREDICTOR, which may be explained by the fact that it is less accurate (with a training accuracy of 89.9%) than the IMDB and NEWSGROUPS classifiers.

MICE achieves a high flip rate while its edits remain small and result in fluent text. In particular, MICE on average changes 17.3–33.1% of the original tokens when predicted labels are used in Stage 1 and 18.5–33.5% with gold labels. Fluency is close to 1.0, indicating no notable change in masked language modeling loss after the edit—i.e., edits fall in the distribution of the original data. We achieve the best results across metrics on the IMDB dataset, as expected since IMDB is a binary classification task with a small label space. These results demonstrate that MICE presents a promising research direction for the generation of contrastive explanations; however, there is still room for improvement, especially for more challenging tasks such as RACE.

In the rest of this section, we provide results from several ablation experiments.

Fine-tuning vs. No Fine-tuning We investigate the effect of fine-tuning (Stage 1) with a baseline that skips Stage 1 altogether. For this NO-FINETUNE baseline variant of MICE, we use the vanilla pretrained T5-BASE as our EDITOR. As shown in Table 1, the NO-FINETUNE variant underperforms all other (two-stage) variants of MICE for the IMDB and NEWSGROUPS datasets.¹⁰ Fine-tuning particularly improves the minimality of edits, while leaving the flip rate high. We hypothesize that this effect is due to the effectiveness of Stage 2 of MICE at finding contrastive edits: Because we iteratively generate many candidate edits using beam search, we are likely to find a prediction-flipping edit. Fine-tuning allows us to find such an edit at a lower masking percentage.

¹⁰ We leave RACE out from our evaluation with the NO-FINETUNE baseline because we observe that the pretrained T5 model does not generate text formatted as span infills; we hypothesize that this model has not been trained to generate infills for masked inputs formatted as multiple choice inputs.
Gradient vs. Random Masking We study the impact of using gradient-based masking in Stage 1 of the MICE procedure with a RAND variant, which masks spans of randomly chosen tokens. As shown in the middle part of Table 1, gradient-based masking outperforms random masking when using both predicted and gold labels across all three tasks and metrics, suggesting that the gradient-based attribution used to mask text during Stage 1 of MICE is an important part of the procedure. The differences are especially notable for RACE, which is the most challenging task according to our metrics.

Targeted vs. Un-targeted Infilling We investigate the effect of using target labels in both stages of MICE by experimenting with removing target labels during Stage 1 (EDITOR fine-tuning) and Stage 2 (making edits). As shown in Table 2, we observe that giving target labels to our EDITORS during both stages of MICE improves edit quality. Fine-tuning EDITORS without labels in Stage 1 ("No Label") leads to worse flip rate, minimality, and fluency than does fine-tuning EDITORS with labels ("Label"). Minimality is particularly affected, and we hypothesize that using target end-task labels in both stages provides signal that allows the EDITOR in Stage 2 to generate prediction-flipping edits at lower masking percentages.

Condition (Stage 1 / Stage 2) | IMDB: Flip Rate ↑ / Minim. ↓ / Fluen. ≈1
No Label / No Label           | 0.994 / 0.369 / 0.966
No Label / Label              | 0.997 / 0.362 / 0.967
Label / No Label              | 0.999 / 0.327 / 0.968
Label / Label                 | 1.000 / 0.173 / 0.981

Table 2: Effect of using target end-task labels during the two stages of PRED+GRAD MICE on the IMDB dataset. When end-task labels are provided, they are original PREDICTOR labels during Stage 1 and contrast labels during Stage 2. The best values for each property (column) are bolded. Using end-task labels during both Stage 1 (EDITOR fine-tuning) and Stage 2 (making edits) of MICE outperforms all other conditions.

4 Analysis of Edits

In this section, we compare MICE edits with human contrastive edits. Then, we turn to a key motivation for this work: the potential for contrastive explanations to assist in NLP system development. We show how MICE edits can be used to debug incorrect predictions and uncover dataset artifacts.

4.1 Comparison with Human Edits

We ask whether the contrastive edits produced by MICE are minimal and fluent in a meaningful sense. In particular, we compare these two metrics for MICE edits and human contrastive edits. We work with the IMDB contrast set created by Gardner et al. (2020), which consists of original test inputs and human-edited inputs that cause a change in true label. We report metrics on the subset of this contrast set for which the human-edited inputs result in a change in model prediction for our IMDB PREDICTOR; this subset consists of 76 instances. The flip rate of MICE edits on this subset is 100%. The mean minimality values of human and MICE edits are 0.149 (human) and 0.179 (MICE), and the mean fluency values are 1.01 (human) and 0.949 (MICE). The similarity of these values suggests that MICE edits are comparable to human contrastive edits along these dimensions.

We also ask to what extent human edits overlap with MICE edits. For each input, we compute the overlap between the original tokens changed by humans and the original tokens edited by MICE. The mean number of overlapping tokens, normalized by the number of original tokens edited by humans, is 0.298. Thus, while there is some overlap between MICE and human contrastive edits, they generally change different parts of text.¹¹ This analysis suggests that there may exist multiple informative contrastive edits for a single input. Future work can investigate and compare the different kinds of insight that can be obtained through human and model-driven contrastive edits.

¹¹ MICE edits explain PREDICTORS' behavior and therefore need not be similar to human edits, which are designed to change gold labels.
4.2 Use Case 1: Debugging Incorrect Outputs

Here, we illustrate how MICE edits can be used to debug incorrect model outputs. Consider the RACE input in Table 3, for which the RACE PREDICTOR gives an incorrect prediction. In this case, a model developer may want to understand why the model got the answer wrong. This setting naturally gives rise to a contrastive question, i.e., Why did the model predict the wrong choice ("twice") instead of the correct one ("only once")?

The MICE edit shown offers insight into this question: Firstly, it highlights which part of the paragraph has an influence on the model prediction—the last few sentences. Secondly, it reveals that a source of confusion is Mark's joke about having traveled in George's plane twice, as changing Mark's dialogue from talking about a "first and...last" trip to a single trip results in a correct model prediction.

MICE edits can also be used to debug model capabilities by offering hypotheses about "bugs" present in models: For instance, the edit in Table 3 might prompt a developer to investigate whether this PREDICTOR lacks non-literal language understanding capabilities. In the next section, we show how insight from individual MICE edits can be used to uncover a bug in the form of a dataset-level artifact learned by a model. In Appendix D, we further analyze the debugging utility of MICE edits with a PREDICTOR designed to contain a bug.

Table 3: Examples of edits produced by MICE. Insertions are bolded in red. Deletions are struck through. y_p is the PREDICTOR's original prediction, and y_c the contrast prediction. True labels for original inputs are underlined.

IMDB. Original pred y_p = positive; contrast pred y_c = negative.
An interesting pairing of stories, this little flick manages to bring together seemingly different characters and story lines all in the backdrop of WWII and succeeds in tying them together without losing the audience. I was impressed by the depth portrayed by the different characters and also by how much I really felt I understood them and their motivations, even though the time spent on the development of each character was very limited. The outstanding acting abilities of the individuals involved with this picture are easily noted. A fun, stylized movie with a slew of comic moments and a bunch more head shaking events. 7/10 4/10

RACE. Question: Mark went up in George's plane ___. (a) twice (b) only once (c) several times (d) once or twice. Original pred y_p = (a) twice; contrast pred y_c = (b) only once.
When George was thirty-five, he bought a small plane and learned to fly it. He soon became very good and made his plane do all kinds of tricks. George had a friend, whose name was Mark. One day George offered to take Mark up in his plane. Mark thought, "I've traveled in a big plane several times, but I've never been in a small one, so I'll go." They went up, and George flew around for half an hour and did all kinds of tricks in the air. When they came down again, Mark was glad to be back safely, and he said to his friend in a shaking voice, "Well, George, thank you very much for those two trips tricks in your plane." George was very surprised and said, "Two trips? tricks." "Yes, That's my first and my last time, George." answered said Mark.

Table 4: Top 5 IMDB tokens edited by MICE at a higher rate than expected given their original frequency (§4.3). Results are separated by contrast predictions.

y_c = positive: Removed — 4/10, ridiculous, horrible, 4, predictable; Inserted — excellent, enjoy, amazing, entertaining, 10
y_c = negative: Removed — 10/10, 8/10, 7/10, 9, enjoyable; Inserted — awful, disappointed, 1, 4, annoying

4.3 Use Case 2: Uncovering Dataset Artifacts

Manual inspection of some edits for IMDB suggests that the IMDB PREDICTOR has learned to rely heavily on numerical ratings. For instance, in the IMDB example in Table 3, the MICE edit results in a negative prediction from the PREDICTOR even though the edited text is overwhelmingly positive. We test this hypothesis by investigating whether numerical tokens are more likely to be edited by MICE.

We analyze the edits produced by MICE (GOLD + GRAD) described in §3.1. We limit our analysis to a subset of the 5K instances for which the edit produced by MICE has a minimality value of at most 0.05, as we are interested in finding simple artifacts driving the predictions of the IMDB PREDICTOR; this subset has 902 instances. We compute three metrics for each unique token, i.e., type t:

p(t) = #occurrences(t) / #all_tokens,
p_r(t) = #removals(t) / #all_removals,
p_i(t) = #insertions(t) / #all_insertions,

and report the tokens with the highest values for the ratios p_r(t)/p(t) and p_i(t)/p(t). Intuitively, these tokens are removed/inserted at a higher rate than expected given the frequency with which they appear in the original IMDB inputs. We exclude tokens that occur
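These statistics reduce to straightforward counting over the edits. The following sketch assumes edits are available as (original tokens, removed tokens, inserted tokens) triples; the rare-token exclusion threshold is left as a parameter rather than a value from the paper.

```python
from collections import Counter

def artifact_statistics(edits, min_count=1):
    """Compute the removal and insertion ratios p_r(t)/p(t) and p_i(t)/p(t) from §4.3.

    `edits` is a list of (original_tokens, removed_tokens, inserted_tokens) triples.
    Tokens occurring fewer than `min_count` times in the original inputs are excluded.
    """
    occurrences, removals, insertions = Counter(), Counter(), Counter()
    for original, removed, inserted in edits:
        occurrences.update(original)
        removals.update(removed)
        insertions.update(inserted)
    n_tok = sum(occurrences.values())
    n_rem, n_ins = sum(removals.values()), sum(insertions.values())
    removal_ratio, insertion_ratio = {}, {}
    for t, count in occurrences.items():
        if count < min_count:
            continue                                   # exclude rare tokens
        p = count / n_tok                              # p(t): base frequency in original inputs
        removal_ratio[t] = (removals[t] / n_rem) / p if n_rem else 0.0
        insertion_ratio[t] = (insertions[t] / n_ins) / p if n_ins else 0.0
    return removal_ratio, insertion_ratio

# Tokens with the largest ratios are removed or inserted more often than their base frequency
# predicts, e.g. sorted(removal_ratio, key=removal_ratio.get, reverse=True)[:5].
```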
Concurrent work by Madaan et al. (2021) uses controlled text generation methods to generate targeted counterfactuals and explores their use as test cases and augmented examples in the context of classification. Another concurrent work by Wu et al. (2021) presents POLYJUICE, a general-purpose, untargeted counterfactual generator. Very recent work by Sha et al. (2021), introduced after the submission of MICE, proposes a method for targeted contrastive editing for Q&A that selects answer-related tokens, masks them, and generates new tokens. Our work differs from these works in our novel framework for efficiently finding minimal edits (MICE Stage 2) and our use of edits as explanations.

Connection to Adversarial Examples Adversarial examples are minimally edited inputs that cause models to incorrectly change their predictions despite no change in true label (Jia and Liang, 2017; Ebrahimi et al., 2018; Pal and Tople, 2020). Recent methods for generating adversarial examples also preserve fluency (Zhang et al., 2019; Li et al., 2020b; Song et al., 2020)¹⁵; however, adversarial examples are designed to find erroneous changes in model outputs; contrastive edits place no such constraint on model correctness. Thus, current approaches to generating adversarial examples, which can exploit semantics-preserving operations (Ribeiro et al., 2018) such as paraphrasing (Iyyer et al., 2018) or word replacement (Alzantot et al., 2018; Ren et al., 2019; Garg and Ramakrishnan, 2020), cannot be used to generate contrastive edits.

Connection to Style Transfer The goal of style transfer is to generate minimal edits to inputs that result in a target style (sentiment, formality, etc.) (Fu et al., 2018; Li et al., 2018; Goyal et al., 2020). Most existing approaches train an encoder to learn a style-agnostic latent representation of inputs and train attribute-specific decoders to generate text reflecting the content of inputs but exhibiting a different target attribute (Fu et al., 2018; Li et al., 2018; Goyal et al., 2020). Recent works by Wu et al. (2019) and Malmi et al. (2020) adopt two-stage approaches that first identify where to make edits and then make them using pretrained language models. Such approaches can only be applied to generate contrastive edits for classification tasks with well-defined "styles," which exclude more complex tasks such as question answering.

¹⁵ Song et al. (2020) propose a method to produce fluent semantic collisions, which they call the "inverse" of adversarial examples.

7 Conclusion

We argue that contrastive edits, which change the output of a PREDICTOR to a given contrast prediction, are effective explanations of neural NLP models. We propose MINIMAL CONTRASTIVE EDITING (MICE), a method for generating such edits. We introduce evaluation criteria for contrastive edits that are motivated by human contrastive explanations—minimality and fluency—and show that MICE edits for the IMDB, NEWSGROUPS, and RACE datasets are contrastive, fluent, and minimal. Through qualitative analysis of MICE edits, we show that they have utility for robust and reliable NLP system development.

8 Broader Impact Statement

MICE is intended to aid the interpretation of NLP models. As a model-agnostic explanation method, it has the potential to impact NLP system development across a wide range of models and tasks. In particular, MICE edits can benefit NLP model developers in facilitating debugging and exposing dataset artifacts, as discussed in §4. As a consequence, they can also benefit downstream users of NLP models by facilitating access to less biased and more robust systems.

While the focus of our work is on interpreting NLP models, there are potential misuses of MICE that involve other applications. Firstly, malicious actors might employ MICE to generate adversarial examples; for instance, they may aim to generate hate speech that is minimally edited such that it fools a toxic language classifier. Secondly, naively applying MICE for data augmentation could plausibly lead to less robust and more biased models: Because MICE edits are intended to expose issues in models, straightforwardly using them as additional training examples could reinforce existing artifacts and biases present in data. To mitigate this risk, we encourage researchers exploring data augmentation to carefully think about how to select and label edited instances.

We also encourage researchers to develop more efficient methods of generating minimal contrastive edits. As discussed in §5, a limitation of MICE is its computational demand. Therefore, we recommend that future work focus on creating methods that require less compute.
References 31–36, Melbourne, Australia. Association for Com- putational Linguistics. David Alvarez-Melis, Hal Daumé III, Jennifer Wort- man Vaughan, and Hanna Wallach. 2019. Weight Angela Fan, Mike Lewis, and Yann Dauphin. 2018. Hi- of evidence as a basis for human-oriented expla- erarchical neural story generation. In Proceedings nations. In Workshop on Human-Centric Machine of the 56th Annual Meeting of the Association for Learning at the 33rd Conference on Neural Informa- Computational Linguistics (Volume 1: Long Papers), tion Processing Systems (NeurIPS 2019), Vancouver, pages 889–898, Melbourne, Australia. Association Canada. for Computational Linguistics. Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Amir Feder, Nadav Oved, Uri Shalit, and Roi Re- Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang. ichart. 2021. CausaLM: Causal Model Explanation 2018. Generating natural language adversarial ex- Through Counterfactual Language Models. Compu- amples. In Proceedings of the 2018 Conference on tational Linguistics, pages 1–54. Empirical Methods in Natural Language Processing, pages 2890–2896, Brussels, Belgium. Association Zhenxin Fu, Xiaoye Tan, Nanyun Peng, Dongyan Zhao, for Computational Linguistics. and Rui Yan. 2018. Style transfer in text: Explo- ration and evaluation. In AAAI Conference on Artifi- Jasmijn Bastings, Wilker Aziz, and Ivan Titov. 2019. cial Intelligence. Interpretable neural predictions with differentiable binary variables. In Proceedings of the 57th Annual Matt Gardner, Yoav Artzi, Victoria Basmov, Jonathan Meeting of the Association for Computational Lin- Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, guistics, pages 2963–2977, Florence, Italy. Associa- Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, tion for Computational Linguistics. Nitish Gupta, Hannaneh Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nel- Yonatan Belinkov and James Glass. 2019. Analysis son F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Methods in Neural Language Processing: A Survey. Singh, Noah A. Smith, Sanjay Subramanian, Reut Transactions of the Association for Computational Tsarfaty, Eric Wallace, Ally Zhang, and Ben Zhou. Linguistics, 7:49–72. 2020. Evaluating models’ local decision boundaries Oana-Maria Camburu, Tim Rocktäschel, Thomas via contrast sets. In Findings of the Association Lukasiewicz, and Phil Blunsom. 2018. e-snli: Nat- for Computational Linguistics: EMNLP 2020, pages ural language inference with natural language expla- 1307–1323. Association for Computational Linguis- nations. In Advances in Neural Information Process- tics. ing Systems, volume 31, pages 9539–9549. Curran Matt Gardner, Joel Grus, Mark Neumann, Oyvind Associates, Inc. Tafjord, Pradeep Dasigi, Nelson F. Liu, Matthew Shiyu Chang, Yang Zhang, Mo Yu, and Tommi Peters, Michael Schmitz, and Luke S. Zettlemoyer. Jaakkola. 2019. A game theoretic approach to class- 2017. Allennlp: A deep semantic natural language wise selective rationalization. In Advances in Neural processing platform. Information Processing Systems, volume 32, pages 10055–10065. Curran Associates, Inc. Siddhant Garg and Goutham Ramakrishnan. 2020. BAE: BERT-based adversarial examples for text Seth Chin-Parker and Julie A Cantelon. 2017. Con- classification. In Proceedings of the 2020 Confer- trastive constraints guide explanation-based cate- ence on Empirical Methods in Natural Language gory learning. Cognitive Science, 41 6:1645–1655. Processing (EMNLP), pages 6174–6181. Associa- tion for Computational Linguistics. 
Susanne Dandl, Christoph Molnar, Martin Binder, and Bernd Bischl. 2020. Multi-objective counterfactual Navita Goyal, Balaji Vasan Srinivasan, N. Anand- explanations. In Parallel Problem Solving from Na- havelu, and Abhilasha Sancheti. 2020. Multi- ture – PPSN XVI, pages 448–469, Cham. Springer dimensional style transfer for partially annotated International Publishing. data using language models as discriminators. ArXiv, arXiv:2010.11578. Jay DeYoung, Sarthak Jain, Nazneen Fatema Rajani, Eric Lehman, Caiming Xiong, Richard Socher, and Yash Goyal, Ziyan Wu, Jan Ernst, Dhruv Batra, Devi Byron C. Wallace. 2020. ERASER: A benchmark to Parikh, and Stefan Lee. 2019. Counterfactual vi- evaluate rationalized NLP models. In Proceedings sual explanations. In Proceedings of the 36th In- of the 58th Annual Meeting of the Association for ternational Conference on Machine Learning, vol- Computational Linguistics, pages 4443–4458. Asso- ume 97 of Proceedings of Machine Learning Re- ciation for Computational Linguistics. search, pages 2376–2384, Long Beach, California, USA. PMLR. Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2018. HotFlip: White-box adversarial exam- Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, ples for text classification. In Proceedings of the and Zeynep Akata. 2018. Grounding visual expla- 56th Annual Meeting of the Association for Compu- nations. In Computer Vision – ECCV 2018, pages tational Linguistics (Volume 2: Short Papers), pages 269–286, Cham. Springer International Publishing. 3849
Denis Hilton. 2017. Social Attribution and Explana- Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. tion. Oxford University Press. Rationalizing neural predictions. In Proceedings of the 2016 Conference on Empirical Methods in Nat- Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and ural Language Processing, pages 107–117, Austin, Yejin Choi. 2020. The curious case of neural text de- Texas. Association for Computational Linguistics. generation. In International Conference on Learn- ing Representations. Chuanrong Li, Lin Shengshuo, Zeyu Liu, Xinyi Wu, Xuhui Zhou, and Shane Steinert-Threlkeld. 2020a. Mohit Iyyer, John Wieting, Kevin Gimpel, and Luke Linguistically-informed transformations (LIT): A Zettlemoyer. 2018. Adversarial example generation method for automatically generating contrast sets. with syntactically controlled paraphrase networks. In Proceedings of the Third BlackboxNLP Workshop In Proceedings of the 2018 Conference of the North on Analyzing and Interpreting Neural Networks for American Chapter of the Association for Computa- NLP, pages 126–135, Online. Association for Com- tional Linguistics: Human Language Technologies, putational Linguistics. Volume 1 (Long Papers), pages 1875–1885, New Juncen Li, Robin Jia, He He, and Percy Liang. 2018. Orleans, Louisiana. Association for Computational Delete, retrieve, generate: a simple approach to sen- Linguistics. timent and style transfer. In Proceedings of the 2018 Conference of the North American Chapter of the Alon Jacovi and Y. Goldberg. 2020. Aligning faithful Association for Computational Linguistics: Human interpretations with their social attribution. ArXiv, Language Technologies, Volume 1 (Long Papers), arXiv:2006.01067. pages 1865–1874, New Orleans, Louisiana. Associ- ation for Computational Linguistics. Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yanai Elazar, Yejin Choi, and Yoav Goldberg. 2021. Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue, Contrastive explanations for model interpretability. and Xipeng Qiu. 2020b. BERT-ATTACK: Adver- ArXiv:2103.01378. sarial attack against BERT using BERT. In Proceed- ings of the 2020 Conference on Empirical Methods Sarthak Jain, Sarah Wiegreffe, Yuval Pinter, and By- in Natural Language Processing (EMNLP), pages ron C. Wallace. 2020. Learning to faithfully rational- 6193–6202. Association for Computational Linguis- ize by construction. In Proceedings of the 58th An- tics. nual Meeting of the Association for Computational Linguistics, pages 4459–4473. Association for Com- Weixin Liang, James Zou, and Zhou Yu. 2020. ALICE: putational Linguistics. Active learning with contrastive natural language ex- planations. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Process- Robin Jia and Percy Liang. 2017. Adversarial exam- ing (EMNLP), pages 4380–4391, Online. Associa- ples for evaluating reading comprehension systems. tion for Computational Linguistics. In Proceedings of the 2017 Conference on Empiri- cal Methods in Natural Language Processing, pages Peter Lipton. 1990. Contrastive explanation. Royal 2021–2031, Copenhagen, Denmark. Association for Institute of Philosophy Supplement, 27:247–266. Computational Linguistics. Y. Liu, Myle Ott, Naman Goyal, Jingfei Du, Man- Amir-Hossein Karimi, G. Barthe, B. Balle, and dar Joshi, Danqi Chen, Omer Levy, M. Lewis, I. Valera. 2020. Model-agnostic counterfactual ex- Luke Zettlemoyer, and Veselin Stoyanov. 2019. planations for consequential decisions. 
Proceedings RoBERTa: A robustly optimized BERT pretraining of the 23rd International Conference on Artificial In- approach. ArXiv, arXiv:1907.11692. telligence and Statistics (AISTATS). Arnaud Van Looveren and Janis Klaise. 2019. Inter- Divyansh Kaushik, Eduard Hovy, and Zachary Lipton. pretable counterfactual explanations guided by pro- 2020. Learning the difference that makes a differ- totypes. ArXiv, arXiv:1907.02584. ence with counterfactually-augmented data. In Inter- Andrew L. Maas, Raymond E. Daly, Peter T. Pham, national Conference on Learning Representations. Dan Huang, Andrew Y. Ng, and Christopher Potts. 2011. Learning word vectors for sentiment analy- Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, sis. In Proceedings of the 49th Annual Meeting of and Eduard Hovy. 2017. RACE: Large-scale ReAd- the Association for Computational Linguistics: Hu- ing comprehension dataset from examinations. In man Language Technologies, pages 142–150, Port- Proceedings of the 2017 Conference on Empirical land, Oregon, USA. Association for Computational Methods in Natural Language Processing, pages Linguistics. 785–794, Copenhagen, Denmark. Association for Computational Linguistics. Nishtha Madaan, Inkit Padhi, Naveen Panwar, and Dip- tikalyan Saha. 2021. Generate your counterfactu- Ken Lang. 1995. Newsweeder: Learning to filter net- als: Towards controlled counterfactual generation news. In Proceedings of the Twelfth International for text. In Proceedings of the AAAI Conference on Conference on Machine Learning, pages 331–339. Artificial Intelligence. 3850
Eric Malmi, Aliaksei Severyn, and Sascha Rothe. 2020. Chris Russell. 2019. Efficient search for diverse coher- Unsupervised text style transfer with padded masked ent explanations. In Proceedings of the Conference language models. In Proceedings of the 2020 Con- on Fairness, Accountability, and Transparency. ference on Empirical Methods in Natural Language Processing (EMNLP), pages 8671–8680. Associa- Julian Salazar, Davis Liang, Toan Q. Nguyen, and tion for Computational Linguistics. Katrin Kirchhoff. 2020. Masked language model scoring. In Proceedings of the 58th Annual Meet- Tim Miller. 2019. Explanation in Artificial Intelli- ing of the Association for Computational Linguis- gence: Insights from the social sciences. Artificial tics, pages 2699–2712. Association for Computa- Intelligence, 267:1–38. tional Linguistics. Sharan Narang, Colin Raffel, Katherine Lee, Adam Roberts, Noah Fiedel, and Karishma Malkan. 2020. Lei Sha, Patrick Hohenecker, and Thomas Lukasiewicz. WT5?! training text-to-text models to explain their 2021. Controlling text edition by changing answers predictions. arXiv:2004.14546. of specific questions. ArXiv:2105.11018. B. Pal and S. Tople. 2020. To transfer or not to trans- Karen Simonyan, Andrea Vedaldi, and Andrew Zisser- fer: Misclassification attacks against transfer learned man. 2014. Deep inside convolutional networks: Vi- text classifiers. ArXiv, arXiv:2001.02438. sualising image classification models and saliency maps. In 2nd International Conference on Learn- Judea Pearl. 2009. Causality: Models, Reasoning and ing Representations, ICLR 2014, Banff, AB, Canada, Inference, 2nd edition. Cambridge University Press, April 14-16, 2014, Workshop Track Proceedings. USA. Jacob Sippy, Gagan Bansal, and Daniel S. Weld. 2020. Colin Raffel, Noam Shazeer, Adam Roberts, Kather- Data staining: A method for comparing faithfulness ine Lee, Sharan Narang, Michael Matena, Yanqi of explainers. In 2020 ICML Workshop on Human Zhou, Wei Li, and Peter J. Liu. 2020. Exploring Interpretability in Machine Learning (WHI 2020). the limits of transfer learning with a unified text-to- text transformer. Journal of Machine Learning Re- D. Smilkov, Nikhil Thorat, Been Kim, F. Viégas, and search, 21(140):1–67. M. Wattenberg. 2017. Smoothgrad: removing noise Nazneen Fatema Rajani, Bryan McCann, Caiming by adding noise. In ICML Workshop on Visualiza- Xiong, and Richard Socher. 2019. Explain yourself! tion for Deep Learning. leveraging language models for commonsense rea- soning. In Proceedings of the 57th Annual Meet- Congzheng Song, Alexander Rush, and Vitaly ing of the Association for Computational Linguis- Shmatikov. 2020. Adversarial semantic collisions. tics, pages 4932–4942, Florence, Italy. Association In Proceedings of the 2020 Conference on Empirical for Computational Linguistics. Methods in Natural Language Processing (EMNLP), pages 4198–4210, Online. Association for Computa- D Raj Reddy. 1977. Speech understanding systems: A tional Linguistics. summary of results of the five-year research effort. Kaiser Sun and Ana Marasović. 2021. Effective atten- Shuhuai Ren, Yihe Deng, Kun He, and Wanxiang Che. tion sheds light on interpretability. In Findings of 2019. Generating natural language adversarial ex- ACL. amples through probability weighted word saliency. In Proceedings of the 57th Annual Meeting of the Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Association for Computational Linguistics, pages Axiomatic attribution for deep networks. In Pro- 1085–1097, Florence, Italy. 
Association for Compu- ceedings of the 34th International Conference on tational Linguistics. Machine Learning - Volume 70, page 3319–3328. JMLR.org. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "why should i trust you?": Explain- Alon Talmor, Jonathan Herzig, Nicholas Lourie, and ing the predictions of any classifier. Proceedings of Jonathan Berant. 2019. CommonsenseQA: A ques- the 22nd ACM SIGKDD International Conference tion answering challenge targeting commonsense on Knowledge Discovery and Data Mining. knowledge. In Proceedings of the 2019 Conference Marco Tulio Ribeiro, Sameer Singh, and Carlos of the North American Chapter of the Association Guestrin. 2018. Semantically equivalent adversar- for Computational Linguistics: Human Language ial rules for debugging NLP models. In Proceedings Technologies, Volume 1 (Long and Short Papers), of the 56th Annual Meeting of the Association for pages 4149–4158, Minneapolis, Minnesota. Associ- Computational Linguistics (Volume 1: Long Papers), ation for Computational Linguistics. pages 856–865, Melbourne, Australia. Association for Computational Linguistics. Damien Teney, Ehsan Abbasnejad, and A. V. D. Hen- gel. 2020. Learning what makes a difference from Mireia Ribera and Àgata Lapedriza. 2019. Can We Do counterfactual examples and gradient supervision. Better Explanations? A Proposal of User-Centered In Proceedings of the European Conference on Com- Explainable AI. In ACM IUI Workshop. puter Vision (ECCV). 3851
Bas C Van Fraassen. 1980. The scientific image. Ox- and the 9th International Joint Conference on Natu- ford University Press. ral Language Processing (EMNLP-IJCNLP), pages 4094–4103, Hong Kong, China. Association for Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Computational Linguistics. Sharon Qian, Daniel Nevo, Yaron Singer, and Stu- art Shieber. 2020. Investigating gender bias in lan- Omar Zaidan, Jason Eisner, and Christine Piatko. 2007. guage models using causal mediation analysis. In Using “annotator rationales” to improve machine Advances in Neural Information Processing Systems, learning for text categorization. In Human Lan- volume 33, pages 12388–12401. Curran Associates, guage Technologies 2007: The Conference of the Inc. North American Chapter of the Association for Com- putational Linguistics; Proceedings of the Main S. Wachter, Brent D. Mittelstadt, and Chris Russell. Conference, pages 260–267, Rochester, New York. 2017. Counterfactual explanations without opening Association for Computational Linguistics. the black box: Automated decisions and the gdpr. European Economics: Microeconomics & Industrial Rowan Zellers, Yonatan Bisk, Ali Farhadi, and Yejin Organization eJournal. Choi. 2019. From Recognition to Cognition: Visual Commonsense Reasoning. In CVPR. Sarah Wiegreffe and Yuval Pinter. 2019. Attention is not not explanation. In Proceedings of the 2019 Con- Huangzhao Zhang, Hao Zhou, Ning Miao, and Lei Li. ference on Empirical Methods in Natural Language 2019. Generating fluent adversarial examples for Processing and the 9th International Joint Confer- natural languages. In Proceedings of the 57th An- ence on Natural Language Processing (EMNLP- nual Meeting of the Association for Computational IJCNLP), pages 11–20, Hong Kong, China. Associ- Linguistics, pages 5564–5569, Florence, Italy. Asso- ation for Computational Linguistics. ciation for Computational Linguistics. Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pier- ric Cistac, Tim Rault, Rémi Louf, Morgan Funtow- icz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2020. Transformers: State-of-the-art natural language pro- cessing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Asso- ciation for Computational Linguistics. Tongshuang Wu, Marco Túlio Ribeiro, J. Heer, and Daniel S. Weld. 2021. Polyjuice: Auto- mated, general-purpose counterfactual generation. arXiv:2101.00288. Xing Wu, Tao Zhang, Liangjun Zang, Jizhong Han, and Songlin Hu. 2019. Mask and infill: Apply- ing masked language model for sentiment transfer. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI- 19, pages 5271–5277. International Joint Confer- ences on Artificial Intelligence Organization. Linyi Yang, Eoin Kenny, Tin Lok James Ng, Yi Yang, Barry Smyth, and Ruihai Dong. 2020. Generating plausible counterfactual explanations for deep trans- formers in financial text classification. In Proceed- ings of the 28th International Conference on Com- putational Linguistics, pages 6150–6160, Barcelona, Spain (Online). International Committee on Compu- tational Linguistics. Mo Yu, Shiyu Chang, Yang Zhang, and Tommi Jaakkola. 2019. Rethinking cooperative rationaliza- tion: Introspective extraction and complement con- trol. 
In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing 3852