Sustainable Modular Debiasing of Language Models
Anne Lauscher,[1,*,†] Tobias Lüken,[2,*] Goran Glavaš[2]
[1] MilaNLP, Bocconi University, Via Sarfatti 25, 20136 Milan, Italy
[2] Data and Web Science Group, University of Mannheim, B 6, 26, 68159 Mannheim, Germany
anne.lauscher@unibocconi.it, tlueken@mail.uni-mannheim.de, goran@informatik.uni-mannheim.de

arXiv:2109.03646v1 [cs.CL] 8 Sep 2021

Abstract

Unfair stereotypical biases (e.g., gender, racial, or religious biases) encoded in modern pretrained language models (PLMs) have negative ethical implications for widespread adoption of state-of-the-art language technology. To remedy this, a wide range of debiasing techniques have recently been introduced to remove such stereotypical biases from PLMs. Existing debiasing methods, however, directly modify all of the PLM's parameters, which – besides being computationally expensive – comes with the inherent risk of (catastrophic) forgetting of useful language knowledge acquired in pretraining. In this work, we propose a more sustainable modular debiasing approach based on dedicated debiasing adapters, dubbed ADELE. Concretely, we (1) inject adapter modules into the original PLM layers and (2) update only the adapters (i.e., we keep the original PLM parameters frozen) via language modeling training on a counterfactually augmented corpus. We showcase ADELE in gender debiasing of BERT: our extensive evaluation, encompassing three intrinsic and two extrinsic bias measures, renders ADELE very effective in bias mitigation. We further show that – due to its modular nature – ADELE, coupled with task adapters, retains fairness even after large-scale downstream training. Finally, by means of multilingual BERT, we successfully transfer ADELE to six target languages.

1 Introduction

Recent work has shown that pretrained language models such as ELMo (Peters et al., 2018), BERT (Devlin et al., 2019), or GPT-2 (Radford et al., 2019) tend to exhibit a range of stereotypical societal biases, such as racism and sexism (e.g., Kurita et al., 2019; Dev et al., 2020; Webster et al., 2020; Nangia et al., 2020; Barikeri et al., 2021, inter alia). The reason for this lies in the distributional nature of these models: human-produced corpora on which these models are trained are abundant with stereotypically biased concept co-occurrences (for instance, male terms like man or son appear more often together with certain career terms like doctor or programmer than female terms like woman or daughter), and PLMs, being trained with language modeling objectives, consequently encode these biased associations in their parameters. While this effect can lend itself to diachronic analysis of societal biases (e.g., Garg et al., 2018; Walter et al., 2021), it represents stereotyping, one of the main types of representational harm (Blodgett et al., 2020), and, if unmitigated, may cause severe ethical issues in various sociotechnical deployment scenarios.

To alleviate this problem and ensure fair language technology, previous work introduced a wide range of bias mitigation methods (e.g., Bordia and Bowman, 2019; Dev et al., 2020; Lauscher et al., 2020a, inter alia). All existing debiasing approaches, however, modify all parameters of the PLM, which has two prominent shortcomings: (1) it comes with a high computational cost[1] and (2) it can lead to (catastrophic) forgetting (McCloskey and Cohen, 1989; Kirkpatrick et al., 2017) of the useful distributional knowledge obtained during pretraining. For example, Webster et al. (2020) incorporate counterfactual debiasing already into BERT's pretraining: this implies a debiasing framework in which a separate "debiased BERT" instance needs to be trained from scratch for each individual bias type and specification. In sum, current debiasing procedures designed for pretraining or full fine-tuning of PLMs have a large carbon footprint (Strubell et al., 2019) and consequently jeopardize the sustainability of fair representation learning in NLP (Moosavi et al., 2020).

[*] Equal contribution.
[†] Most of the work was conducted while Anne Lauscher was employed at the University of Mannheim.
[1] While a full fine-tuning approach to PLM debiasing may still be feasible for moderate-sized PLMs like BERT (Devlin et al., 2019), it is prohibitively computationally expensive for giant language models like GPT-3 (Brown et al., 2020) or GShard (Lepikhin et al., 2020).
In this work, we move towards more sustainable removal of stereotypical societal biases from pretrained language models. To this end, we propose ADELE (Adapter-based DEbiasing of LanguagE Models), a debiasing approach based on the recently proposed modular adapter framework (Houlsby et al., 2019; Pfeiffer et al., 2020a). In ADELE, we inject additional parameters, the so-called adapter layers, into the layers of the PLM and incorporate the "debiasing" knowledge only in those parameters, without changing the pretrained knowledge in the PLM. We show that, while being substantially more efficient (i.e., sustainable) than existing state-of-the-art debiasing approaches, ADELE is just as effective in bias attenuation.

Contributions. The contributions of this work are three-fold: (i) We first present ADELE, our novel adapter-based framework for parameter-efficient and knowledge-preserving debiasing of PLMs. We combine ADELE with one of the most effective debiasing strategies, Counterfactual Data Augmentation (CDA; Zhao et al., 2018), and demonstrate its effectiveness in gender-debiasing of BERT (Devlin et al., 2019), the most widely used PLM. (ii) We benchmark ADELE on what is arguably the most comprehensive set of bias measures and data sets for both intrinsic and extrinsic evaluation of biases in representation spaces spanned by PLMs. Additionally, we study a previously neglected effect of fairness forgetting, present when debiased PLMs are subjected to large-scale downstream training for specific tasks (e.g., natural language inference, NLI); we show that ADELE's modular nature allows us to counter this undesirable effect by stacking a dedicated task adapter on top of the debiasing adapter. (iii) Finally, we successfully transfer ADELE's debiasing effects to six other languages in a zero-shot manner, i.e., without relying on any debiasing data in the target languages. We achieve this by training the debiasing adapter, stacked on top of multilingual BERT, on the English counterfactually augmented dataset.

2 ADELE: Adapter-Based Debiasing

In this work, we seek to fulfill the following three desiderata: (1) we want to achieve effective debiasing, comparable to that of existing state-of-the-art debiasing methods, while (2) keeping the training costs of debiasing significantly lower and (3) fully preserving the distributional knowledge acquired in the pretraining. To meet all three criteria, we propose debiasing based on the popular adapter modules (Houlsby et al., 2019; Pfeiffer et al., 2020a). Adapters are lightweight neural components designed for parameter-efficient fine-tuning of PLMs, injected into the PLM layers. In downstream fine-tuning, all original PLM parameters are kept frozen and only the adapters are trained. Because adapters have fewer parameters than the original PLM, adapter-based fine-tuning is more computationally efficient. And since fine-tuning does not update the PLM's original parameters, all distributional knowledge is preserved.

The debiasing adapters could, in principle, be trained using any of the debiasing strategies and training objectives from the literature, e.g., via additional debiasing loss objectives (Qian et al., 2019; Bordia and Bowman, 2019; Lauscher et al., 2020a, inter alia) or data-driven approaches such as Counterfactual Data Augmentation (Zhao et al., 2018). For simplicity, we opt for the data-driven CDA approach: it has been shown to offer reliable debiasing performance (Zhao et al., 2018; Webster et al., 2020) and, unlike other approaches, it does not require any modifications of the model architecture or training procedure.

2.1 Debiasing Adapters

In this work, we employ the simple adapter architecture proposed by Pfeiffer et al. (2021), in which only one adapter module is added to each layer of the pretrained Transformer, after the feed-forward sub-layer. The more widely used architecture of Houlsby et al. (2019) inserts two adapter modules per Transformer layer, with the other adapter injected after the multi-head attention sub-layer. We opt for the "Pfeiffer architecture" because, in comparison with the "Houlsby architecture", it is more parameter-efficient and has been shown to yield slightly better performance on a wide range of downstream NLP tasks (Pfeiffer et al., 2020a, 2021). The output of the adapter, a two-layer feed-forward network, is computed as follows:

  Adapter(h, r) = U \cdot g(D \cdot h) + r,    (1)

with h and r as the hidden state and residual of the respective Transformer layer. D ∈ R^{m×h} and U ∈ R^{h×m} are the linear down- and up-projections, respectively (h being the Transformer's hidden size and m the adapter's bottleneck dimension), and g(·) is a non-linear activation function. The residual r is the output of the Transformer's feed-forward layer, whereas h is the output of the subsequent layer normalization. The down-projection D compresses token representations to the adapter size m < h, and the up-projection U projects the activated down-projections back to the Transformer's hidden size h. The ratio h/m captures the factor by which adapter-based fine-tuning is more parameter-efficient than full fine-tuning of the Transformer.
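The following is a minimal PyTorch sketch of such a bottleneck adapter block (Eq. 1). The class and variable names are illustrative and not taken from the paper or from AdapterHub; ReLU stands in for g(·), and the linear layers include biases, which Eq. (1) omits.

```python
import torch
import torch.nn as nn

class PfeifferStyleAdapter(nn.Module):
    """Bottleneck adapter: Adapter(h, r) = U * g(D * h) + r  (cf. Eq. 1)."""

    def __init__(self, hidden_size: int = 768, bottleneck: int = 48):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)  # D: R^h -> R^m
        self.up = nn.Linear(bottleneck, hidden_size)    # U: R^m -> R^h
        self.activation = nn.ReLU()                     # g(.)

    def forward(self, h: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
        # h: output of the layer normalization following the feed-forward sub-layer
        # r: residual, i.e., the output of the Transformer's feed-forward sub-layer
        return self.up(self.activation(self.down(h))) + r

# Example with BERT Base dimensions (hidden size 768, bottleneck m = 48)
adapter = PfeifferStyleAdapter()
h = torch.randn(2, 128, 768)   # (batch, sequence, hidden)
r = torch.randn(2, 128, 768)
out = adapter(h, r)            # same shape as h
```

With m = 48 and h = 768, each such block adds roughly h·m parameters per projection, which is the source of the parameter-efficiency factor h/m discussed above.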
In our case, we train the adapters for debiasing: we inject adapter layers into BERT (Devlin et al., 2019), freeze the original BERT's parameters, and run a standard debiasing training procedure – language modeling on counterfactual data (§2.2) – during which we only tune the parameters of the debiasing adapters. At the end of the debiasing training, the debiasing functionality is isolated in the adapter parameters. This not only preserves the distributional knowledge in the Transformer's original parameters, but also allows for more flexibility and "on-demand" usage of the debiasing functionality in downstream applications. For example, one could train a separate set of debiasing adapters for each bias dimension of interest (e.g., gender, race, religion, sexual orientation) and selectively combine them in downstream tasks, depending on the constraints and requirements of the concrete sociotechnical environment.

2.2 Counterfactual Augmentation Training

In the context of representation debiasing, counterfactual data augmentation (CDA) refers to the automatic creation of text instances that in some way counter the stereotypical bias present in the representation space. CDA has been successfully used for attenuating a variety of bias types, e.g., gender and race, and in several variants, e.g., with general terms describing dominant and minoritized groups, or with personal names acting as proxies for such groups (Zhao et al., 2018; Lu et al., 2020). Most commonly, CDA modifies the training data by replacing terms describing one of the target groups (dominant or minoritized) with terms describing the other group. Let S be our training corpus, consisting of sentences s, and let T = {(t1, t2)_i}_{i=1}^{N} be a set of N term pairings between the dominant and minoritized group (i.e., t1 is a term representing the dominant group, e.g., man, and t2 is a corresponding term representing the minoritized group, e.g., woman). For each sentence s_i and each pair (t1, t2), we check whether either t1 or t2 occurs in s: if t1 is present, we replace its occurrence with t2 and vice versa. We denote the counterfactual sentence of s obtained this way with s′ and the whole counterfactual corpus with S′. We adopt the so-called two-sided CDA from Webster et al. (2020): the final corpus for debiasing training consists of both the original and the counterfactually created sentences. Finally, we train the debiasing adapter via masked language modeling on the counterfactually augmented corpus S ∪ S′. We train sequentially by first exposing the adapter to the original corpus S and then to the augmented portion S′.

3 Experiments

We showcase ADELE for arguably the most explored societal bias – gender bias – and the most widely used PLM, BERT. We profile its debiasing effects with a comprehensive set of intrinsic and downstream (i.e., extrinsic) evaluations.

3.1 Evaluation Data Sets and Measures

We test ADELE on three intrinsic (BEC-Pro, DisCo, WEAT) and two downstream debiasing benchmarks (Bias-STS-B and Bias-NLI). We now describe each of the benchmarks in more detail.

Bias Evaluation Corpus with Professions (BEC-Pro). We intrinsically evaluate ADELE on the BEC-Pro data set (Bartl et al., 2020), designed to capture gender bias w.r.t. professions. The data set consists of 2,700 sentence pairs in the format ("m [temp] p"; "f [temp] p"), where m is a male term (e.g., boy, groom), f is a female term (e.g., girl, bride), p is a profession term (e.g., mechanic, doctor), and [temp] is one of the predefined connecting templates, e.g., "is a" or "works as a".

We measure the bias on BEC-Pro using the bias measure of Kurita et al. (2019). They compute the association a_{t,p} between a gender term t (male or female) and a profession p as:

  a_{t,p} = \log \frac{P(t)_t}{P(t)_{t,p}},    (2)

where P(t)_t is the probability of the PLM generating the target term t when only t itself is masked, and P(t)_{t,p} is the probability of t being generated when both t and the profession p are masked. The bias score b is then simply the difference in the association score between the male term m and its corresponding female term f: b = a_{m,p} − a_{f,p}.
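As an illustration, the association score from Eq. (2) can be computed with a masked language model via the Hugging Face transformers library, as in the following sketch. The sentence construction and helper names are illustrative, and the position bookkeeping assumes that every word maps to a single WordPiece token.

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def mask_prob(words, target, masked_words):
    """Probability of `target` at its position when all `masked_words` are masked."""
    tokens = [tokenizer.mask_token if w in masked_words else w for w in words]
    target_pos = words.index(target)
    enc = tokenizer(" ".join(tokens), return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits[0]
    # +1 skips [CLS]; assumes each word corresponds to exactly one WordPiece
    probs = logits[target_pos + 1].softmax(dim=-1)
    return probs[tokenizer.convert_tokens_to_ids(target)].item()

def association(words, gender_term, profession):
    # a_{t,p} = log( P(t | only t masked) / P(t | t and p masked) )   (Eq. 2)
    p_t = mask_prob(words, gender_term, {gender_term})
    p_tp = mask_prob(words, gender_term, {gender_term, profession})
    return math.log(p_t / p_tp)

# BEC-Pro-style instance; bias score b = a_{m,p} - a_{f,p}
a_male = association("he works as a mechanic".split(), "he", "mechanic")
a_female = association("she works as a mechanic".split(), "she", "mechanic")
bias = a_male - a_female
```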
We measure the overall bias on the whole dataset in two complementary ways: (a) by averaging the bias scores b across all 2,700 instances (∅ bias) and (b) by measuring the percentage of instances for which b is below some threshold value; we report this score for two different thresholds (0.1 and 0.7). Bartl et al. (2020) additionally published a German version of the BEC-Pro data set, which we use to evaluate ADELE's zero-shot transfer abilities.

Discovery of Correlations (DisCo). The second data set for intrinsic debiasing evaluation, DisCo (Webster et al., 2020), also relies on templates (e.g., "[PERSON] studied [BLANK] at college"). For each template, the [PERSON] slot is filled first with a male and then with a female term (e.g., for the pair (John, Veronica), we get John studied [BLANK] at college and Veronica studied [BLANK] at college). Next, for each of the two instances, the model is asked to fill the [BLANK] slot: the goal is to determine the difference in the probability distribution for the masked token, depending on which term is inserted in the [PERSON] slot. While Webster et al. (2020) retrieve the top three most likely terms for the masked position, we retrieve all terms t with probability p(t) > 0.1.[2]

Let C_m^{(i)} and C_f^{(i)} be the candidate sets obtained for the i-th instance when filled with a male [PERSON] term m and the corresponding female term f, respectively. We then compute two different measures. The first is the average fraction of shared candidates between the two sets (∅frac):

  ∅frac = \frac{1}{N} \sum_{i} \frac{|C_m^{(i)} \cap C_f^{(i)}|}{\min(|C_m^{(i)}|, |C_f^{(i)}|)},    (3)

with N as the total number of test instances. Intuitively, a higher average fraction of shared candidates indicates lower bias.

For the second measure, we retrieve the probabilities p(t) for all candidates t in the union of the two sets, C^{(i)} = C_m^{(i)} ∪ C_f^{(i)}. We then compute the normalized average absolute probability difference:

  ∅diff = \frac{1}{N} \sum_{i} \frac{\sum_{t \in C^{(i)}} |p_m(t) − p_f(t)|}{\big(\sum_{t \in C^{(i)}} p_m(t) + \sum_{t \in C^{(i)}} p_f(t)\big) / 2}.    (4)

We create test instances by collecting the 100 most frequent baby names for each gender from the US Social Security name statistics for 2019.[3] We create pairs (m, f) from names at the same frequency rank in the two lists (e.g., Liam and Olivia). Finally, we remove pairs with ambiguous names that may also be used as general concepts (e.g., violet, a color), resulting in the final 92 pairs.

Word Embedding Association Test (WEAT). As the final intrinsic measure, we use the well-known WEAT test (Caliskan et al., 2017). Developed for detecting biases in static word embedding spaces, it computes the differential association between two target term sets A (e.g., male terms) and B (e.g., female terms) based on the mean (cosine) similarity of their embeddings with embeddings of terms from two attribute sets X (e.g., science terms) and Y (e.g., art terms):

  w(A, B, X, Y) = \sum_{a \in A} s(a, X, Y) − \sum_{b \in B} s(b, X, Y).    (5)

The association s of a term t ∈ A or t ∈ B is computed as:

  s(t, X, Y) = \frac{1}{|X|} \sum_{x \in X} \cos(t, x) − \frac{1}{|Y|} \sum_{y \in Y} \cos(t, y).    (6)

The significance of the statistic is computed with a permutation test in which s(A, B, X, Y) is compared with the scores s(A*, B*, X, Y), where A* and B* are equally sized partitions of A ∪ B. We report the effect size, a normalized measure of separation between the association distributions:

  \frac{\mu(\{s(a, X, Y)\}_{a \in A}) − \mu(\{s(b, X, Y)\}_{b \in B})}{\sigma(\{s(t, X, Y)\}_{t \in A \cup B})},    (7)

where µ is the mean and σ is the standard deviation.

Since WEAT requires word embeddings as input, we first have to extract word-level vectors from a PLM like BERT. To this end, we follow Vulić et al. (2020) and obtain a vector x_i ∈ R^d for each word w_i (e.g., man) from the bias specification as follows: we prepend the word with BERT's sequence start token and append the separator token (e.g., [CLS] man [SEP]). We then feed the input sequence through the Transformer and compute x_i as the average of the term's representations from layers m:n. We experimented with inducing word-level embeddings by averaging representations over all consecutive ranges of layers [m:n], m ≤ n. We measure the gender bias using the test WEAT 7 (see the full specification in the Appendix), which compares male terms (e.g., man, boy) against female terms (e.g., woman, girl) w.r.t. associations to science terms (e.g., math, algebra, numbers) and art terms (e.g., poetry, dance, novel).

[2] We argue that retrieving more terms from the distribution allows for a more accurate estimate of the bias.
[3] https://www.ssa.gov/oact/babynames/limits.html
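A minimal NumPy sketch of the WEAT statistic and effect size (Eqs. 5–7) is shown below; the input vectors are assumed to be the layer-averaged word embeddings described above, and the function names are illustrative.

```python
import numpy as np

def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def s(t, X, Y):
    # Eq. (6): mean cosine similarity to attribute set X minus mean similarity to Y
    return np.mean([cos(t, x) for x in X]) - np.mean([cos(t, y) for y in Y])

def weat_statistic(A, B, X, Y):
    # Eq. (5): differential association of target sets A and B with attributes X and Y
    return sum(s(a, X, Y) for a in A) - sum(s(b, X, Y) for b in B)

def effect_size(A, B, X, Y):
    # Eq. (7): normalized separation between the two association distributions
    s_A = [s(a, X, Y) for a in A]
    s_B = [s(b, X, Y) for b in B]
    return (np.mean(s_A) - np.mean(s_B)) / np.std(s_A + s_B)

# A, B: male/female target term vectors; X, Y: science/art attribute term vectors,
# e.g., obtained from BERT by averaging representations over layers m:n.
```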
Lauscher and Glavaš (2019) created XWEAT by translating some of the original WEAT bias specifications to six target languages: German (DE), Spanish (ES), Italian (IT), Croatian (HR), Russian (RU), and Turkish (TR). We use their translations of the WEAT 7 gender test in the zero-shot debiasing transfer evaluation of ADELE.

Bias-STS-B. The first extrinsic measure we use is Bias-STS-B, introduced by Webster et al. (2020), based on the well-known Semantic Textual Similarity Benchmark (STS-B; Cer et al., 2017), a regression task where models need to predict semantic similarity for pairs of sentences. Webster et al. (2020) adapt STS-B for discovering gender-biased correlations. They start from neutral STS templates and fill them with a gendered term (man, woman) and a profession term from Rudinger et al. (2018) (e.g., A man is walking vs. A nurse is walking, and A woman is walking vs. A nurse is walking). The dataset consists of 16,980 such pairs. As a measure of bias, we compute the average absolute difference between the similarity scores of male and female sentence pairs, with a lower value corresponding to less bias. We couple the bias score with the actual STS task performance score (Pearson correlation with human similarity scores), measured on the STS-B development set.

Bias-NLI. We select the task of understanding biased natural language inferences (NLI) as the second extrinsic evaluation. To this end, we fine-tune the original BERT as well as our adapter-debiased BERT on the MNLI data set (Williams et al., 2018). For evaluation, we follow Dev et al. (2020) and create a synthetic NLI data set that tests for the gender-occupation bias: it comprises NLI instances for which an unbiased model should not be able to infer anything, i.e., it should predict the NEUTRAL class. We use the code of Dev et al. (2020) and, starting from the generic template "The ⟨subject⟩ ⟨verb⟩ a/an ⟨object⟩", fill the slots with term sets provided with the code. First, we fill the verb and object slots with common activities, e.g., "bought a car". We then create neutral entailment pairs by filling the subject slot with an occupation term, e.g., "physician", for the hypothesis and a gendered term, e.g., "woman", for the premise, resulting in the final instance: (woman bought a car, physician bought a car, NEUTRAL). Using the code and terms released by Dev et al. (2020), we produce the total of N = 1,936,512 Bias-NLI instances. Following the original work, we compute two bias scores: (1) the fraction neutral (FN) score is the percentage of instances for which the model predicts the NEUTRAL class; (2) the net neutral (NN) score is the average probability that the model assigns to the NEUTRAL class across all instances. In both cases, a higher score corresponds to lower bias. We couple FN and NN on Bias-NLI with the actual NLI accuracy on the MNLI matched development set (Williams et al., 2018).

3.2 Experimental Setup

Data. Aligned with BERT's pretraining, we carry out the debiasing MLM training on the concatenation of the English Wikipedia and the BookCorpus (Zhu et al., 2015). Since we are only training the parameters of the debiasing adapters, we uniformly subsample the corpus to one third of its original size. We adopt the set of gender term pairs T for CDA from Zhao et al. (2018) (e.g., actor-actress, bride-groom)[4] and augment it with three additional pairs: his-her, himself-herself, and male-female, resulting in a total of 193 term pairs. Our final debiasing CDA corpus consists of 105,306,803 sentences.

Models and Baselines. In all experiments we inject ADELE adapters of bottleneck size m = 48 into the pretrained BERT Base Transformer (12 layers, 12 attention heads, 768 hidden size).[5] We compare ADELE with the debiased BERT Large models released by Webster et al. (2020): (1) ZariCDA is counterfactually pretrained (from scratch), whereas (2) ZariDO was post-hoc MLM-fine-tuned on regular corpora, but with more aggressive dropout rates. In cross-lingual zero-shot transfer experiments, we train ADELE on top of multilingual BERT (Devlin et al., 2019) in its base configuration (uncased, 12 layers, 768 hidden size).

For debiasing training, we follow the standard MLM procedure for BERT training and mask 15% of the tokens. We then train ADELE's debiasing adapters on our CDA data set for 2 epochs, with a batch size of 16. We optimize the adapter parameters using the Adam algorithm (Kingma and Ba, 2015), with a constant learning rate of 3 · 10⁻⁵.

[4] https://github.com/uclanlp/corefBias/tree/master/WinoBias/wino
[5] We implement ADELE using the Huggingface transformers library (Wolf et al., 2020) in combination with the AdapterHub framework (Pfeiffer et al., 2020a).
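The following is a plain-Python sketch of the two-sided CDA step described in §2.2, using a handful of illustrative term pairs rather than the full list of 193 pairs; tokenization by whitespace is a simplification.

```python
# Two-sided CDA (§2.2): keep each original sentence and add its counterfactual,
# obtained by swapping every gender term with its counterpart.
TERM_PAIRS = [("he", "she"), ("his", "her"), ("man", "woman"),
              ("actor", "actress"), ("groom", "bride")]  # illustrative subset

SWAP = {}
for t1, t2 in TERM_PAIRS:
    SWAP[t1], SWAP[t2] = t2, t1

def counterfactual(sentence: str) -> str:
    """Replace t1 with t2 and vice versa for every term pair occurring in the sentence."""
    return " ".join(SWAP.get(tok, tok) for tok in sentence.split())

def two_sided_cda(corpus):
    """Return S followed by S': original sentences first, then their counterfactuals,
    matching the sequential training order (first S, then S')."""
    return list(corpus) + [counterfactual(s) for s in corpus]

corpus = ["he works as a doctor", "the bride thanked her father"]
augmented = two_sided_cda(corpus)
# ['he works as a doctor', 'the bride thanked her father',
#  'she works as a doctor', 'the groom thanked his father']
```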
Downstream Fine-tuning. Our two extrinsic evaluations require task-specific fine-tuning on the STS-B and MNLI training datasets, respectively. We couple BERT (with and without ADELE adapters) with the standard single-layer feed-forward softmax classifier and fine-tune all parameters in task-specific training.[6] We optimize the hyperparameters on the respective STS-B and MNLI (matched) development sets. To this end, we search for the optimal number of training epochs in {2, 3, 4} and fix the learning rate to 2 · 10⁻⁵, the maximum sequence length to 128, and the batch size to 32. Like in debiasing training, we use Adam (Kingma and Ba, 2015) for optimization.

4 Results and Discussion

Monolingual Evaluation. Our main monolingual English debiasing results on three intrinsic and two extrinsic benchmarks are summarized in Table 1. The results show that (1) ADELE successfully attenuates BERT's gender bias across the board, and (2) it is, in many cases, more effective in attenuating gender biases than the computationally much more intensive Zari models (Webster et al., 2020). In fact, on BEC-Pro and DisCo ADELE substantially outperforms both Zari variants.

The results from the two extrinsic evaluations – STS and NLI – demonstrate that ADELE successfully attenuates the bias while retaining the high task performance. Zari variants yield slightly better task performance for both STS-B and MNLI: this is expected, as they are instances of the BERT Large Transformer with 336M parameters; in comparison, ADELE has only the 110M parameters of BERT Base and approx. 885K adapter parameters.[7]

According to the WEAT evaluation on static embeddings extracted from BERT (§3.1), the original BERT Transformer is only slightly and insignificantly biased. Consequently, ADELE inverts the bias in the opposite direction. In Figure 1, we further analyze the WEAT bias effects w.r.t. the subset of BERT layers from which we aggregate the word embeddings. For the original BERT (Figure 1a), we obtain gender-unbiased embeddings if we aggregate representations from higher layers (e.g., [5:12], [6:9], or by taking final layer vectors, [12:12]). For ADELE, we get the most gender-neutral embeddings by aggregating representations from lower layers (e.g., [0:3] or [1:3]); representations from higher layers (e.g., [6:12]) flip the bias into the opposite direction (blue color). Both Zari models produce embeddings which are relatively unbiased, but ZariCDA still exhibits slight gender bias in higher-layer representations. The dropout-based debiasing of ZariDO results in an interesting per-layer-region oscillating gender bias.

[6] The only exception is the fairness forgetting experiment in §4, in which we freeze both the Transformer and the debiasing adapters and train the dedicated task adapter on top.
[7] ADELE adds 884,736 parameters to BERT Base: 12 (layers) × 2 (down-projection and up-projection matrix) × 768 (hidden size h of BERT Base) × 48 (bottleneck size m).
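A sketch of the layer-averaged word-vector extraction used for these WEAT/XWEAT analyses (following the description in §3.1) is given below; the model name and the averaging ranges are just examples, and the transformers API calls shown are standard.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True).eval()

def word_vector(word: str, m: int, n: int) -> torch.Tensor:
    """Encode '[CLS] word [SEP]' and average the word's representations over layers m..n.
    Layer 0 is the (sub)word embedding layer; layers 1..12 are the Transformer layers."""
    enc = tokenizer(word, return_tensors="pt")  # adds [CLS] and [SEP] automatically
    with torch.no_grad():
        hidden_states = model(**enc).hidden_states  # tuple of 13 tensors (layers 0..12)
    # drop [CLS]/[SEP], average over the word's subtokens, then over the layer range
    per_layer = [h[0, 1:-1].mean(dim=0) for h in hidden_states[m:n + 1]]
    return torch.stack(per_layer).mean(dim=0)

vec_low = word_vector("man", 0, 3)    # aggregation over layers [0:3]
vec_high = word_vector("man", 6, 12)  # aggregation over layers [6:12]
```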
| Model   | WEAT T7 e[0:12]↓ | BEC-Pro ∅bias↓ | BEC-Pro t(0.1)↑ | BEC-Pro t(0.7)↑ | DisCo (names) ∅frac↑ | DisCo (names) ∅diff↓ | STS ∅diff↓ | STS Pear↑ | NLI FN↑ | NLI NN↑ | NLI Acc↑ |
| BERT    | 0.79*  | 1.33 | 0.05 | 0.37 | 0.8112 | 0.5146 | 0.313 | 88.78 | 0.0102 | 0.0816 | 84.77 |
| ZariCDA | 0.43*  | 1.11 | 0.07 | 0.45 | 0.7527 | 0.6988 | 0.087 | 89.37 | 0.1202 | 0.1628 | 85.52 |
| ZariDO  | 0.23*  | 1.20 | 0.07 | 0.38 | 0.6422 | 0.9352 | 0.118 | 88.22 | 0.1058 | 0.1147 | 86.06 |
| ADELE   | -0.98  | 0.39 | 0.17 | 0.85 | 0.8862 | 0.3118 | 0.121 | 88.93 | 0.1273 | 0.1726 | 84.13 |

Table 1: Results of our monolingual gender bias evaluation. We report WEAT effect size (e), BEC-Pro average bias (∅ bias) and fraction of biased instances at thresholds 0.1 and 0.7, DisCo average fraction (∅ frac) and average difference (∅ diff), STS average similarity difference (∅ diff) and Pearson correlation (Pear), and Bias-NLI fraction neutral (FN) and net neutral (NN) scores as well as MNLI-m accuracy (Acc), for four models: original BERT, ZariCDA and ZariDO (Webster et al., 2020), and ADELE. ↑: higher is better (lower bias); ↓: lower is better.

Figure 1 (heatmaps not reproduced in this transcription): WEAT bias effect heatmaps for (a) original BERT Base and the debiased BERTs, (b) BERT_ADELE, (c) ZariCDA (Webster et al., 2020), and (d) ZariDO, for word embeddings averaged over different subsets of layers [m:n]. E.g., [0:0] points to word embeddings directly obtained from BERT's (sub)word embeddings (layer 0); [1:7] indicates word vectors obtained by averaging word representations after Transformer layers 1 through 7.

Zero-Shot Cross-Lingual Transfer. We show the results of zero-shot transfer of gender debiasing with ADELE (on top of mBERT) on German BEC-Pro in Table 2. On the EN BEC-Pro portion, ADELE is as effective on top of mBERT as it is on top of the EN BERT (see Table 1): it reduces mBERT's bias from 0.81 to 0.3. More importantly, the positive debiasing effect successfully transfers to German: the bias effect on the DE portion is reduced from 1.1 to 0.67, despite not using any German data in the training of the debiasing adapters. We also see an improvement with respect to the fraction of unbiased instances for both thresholds, expectedly with larger improvements for the more lenient threshold of 0.7.

| Model   | EN ∅bias | EN t(0.1) | EN t(0.7) | DE ∅bias | DE t(0.1) | DE t(0.7) |
| mBERT   | 0.81 | 0.08 | 0.55 | 1.10 | 0.08 | 0.39 |
| mBERT_A | 0.30 | 0.23 | 0.93 | 0.67 | 0.11 | 0.62 |

Table 2: Results for mBERT and mBERT debiased on EN data with ADELE (mBERT_A) on BEC-Pro English and German. We report the average bias (∅ bias) and the fraction of biased instances for thresholds t(0.1) and t(0.7).

| Layers | Model   | EN    | DE     | ES     | IT     | HR     | RU    | TR     |
| 0:12   | mBERT   | 1.42  | 0.59*  | -0.47* | 1.02   | -0.57* | 1.49  | -0.55* |
| 0:12   | mBERT_A | 0.20* | -0.04* | -0.49* | -0.25* | 0.72*  | 1.24  | -0.33* |
| 1:12   | mBERT   | 1.36  | 0.62*  | -0.55* | -0.55* | 1.08   | 0.62  | -0.61* |
| 1:12   | mBERT_A | -0.08 | -0.05* | -0.63* | -0.63* | 0.79*  | -0.05 | -0.34* |

Table 3: XWEAT effect sizes for original mBERT and zero-shot cross-lingual debiasing transfer of ADELE (mBERT_A) from EN to six target languages. Results for two variants of embedding aggregation over Transformer layers: [1:12] – all Transformer layers; [0:12] – all layers plus mBERT's (sub)word embeddings ("layer 0"). Asterisks: insignificant bias effects at α < 0.05.

In Table 3, we show the bias effects of static word embeddings, aggregated from layers of mBERT and ADELE-debiased mBERT, on the XWEAT gender-bias test 7 for six different target languages. We show the results for two aggregation strategies, including ([0:12]) and excluding ([1:12]) mBERT's (sub)word embedding layer. Like BEC-Pro, WEAT confirms that ADELE also attenuates the bias in EN representations coming from mBERT. The results across the six target languages are somewhat mixed, but overall encouraging: for all significantly biased combinations of languages and layer aggregations from original mBERT ([0:12] – IT, RU; [1:12] – HR, RU), ADELE successfully reduces the bias. E.g., for IT embeddings extracted from all layers ([0:12]), the bias effect size drops from a significant 1.02 to an insignificant −0.25. In case of already insignificant biases in original mBERT, ADELE often further reduces the bias effect size (DE, TR), and if not, the bias effects remain insignificant. We additionally visualize all XWEAT bias effect sizes in the produced embeddings via heatmaps in Figure 2. The intuition we can get from the plots supports our conclusion: for all languages, especially for the source language EN and the target language DE, the bias gets reduced, which is indicated by the lighter colors throughout all plots.

Fairness Forgetting. Finally, we investigate whether the debiasing effects persist even after large-scale fine-tuning in downstream tasks. Webster et al. (2020) report the presence of debiasing effects after STS-B training. With merely 5,749 training instances, however, STS-B is two orders of magnitude smaller than MNLI (392,702 training instances). Here we conduct a study on MNLI, testing for the presence of the gender bias in Bias-NLI after ADELE's exposure to varying amounts of MNLI training data. We fully fine-tune BERT Base and BERT_ADELE (i.e., BERT augmented with debiasing adapters) on MNLI datasets of varying sizes (10K, 25K, 75K, 100K, 150K, and 200K) and measure, for each model, the Bias-NLI net neutral (NN) score as well as the NLI accuracy on the MNLI (matched) development set. For each model and each training set size, we carry out five training runs and report the average scores.
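The two Bias-NLI scores tracked in this experiment (FN and NN, §3.1) can be computed from per-instance class predictions and NEUTRAL-class probabilities as in the following sketch; the label strings and variable names are illustrative.

```python
import numpy as np

def bias_nli_scores(neutral_probs, predicted_labels):
    """Fraction Neutral (FN) and Net Neutral (NN) over Bias-NLI instances.
    neutral_probs: probability assigned to NEUTRAL for each instance.
    predicted_labels: predicted class for each instance ('neutral', 'entailment', ...)."""
    fn = np.mean([label == "neutral" for label in predicted_labels])
    nn = np.mean(neutral_probs)
    return fn, nn  # for both scores, higher corresponds to lower bias

fn, nn = bias_nli_scores([0.2, 0.9, 0.6], ["entailment", "neutral", "neutral"])
```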
Figure 2 (heatmaps not reproduced in this transcription): XWEAT effect size heat maps for (a) original mBERT and the debiased (b) mBERT_ADELE in seven languages (source language EN, and transfer languages DE, ES, IT, HR, RU, TR), for word embeddings averaged over different subsets of layers [m:n]. E.g., [0:0] points to word embeddings directly obtained from BERT's (sub)word embeddings (layer 0); [1:7] indicates word vectors obtained by averaging word representations after Transformer layers 1 through 7. Lighter colors indicate less bias.

Figure 3 (plot not reproduced in this transcription): Bias and performance over time for different sizes of downstream (MNLI) training sets (#instances). We report the mean and the 95% confidence interval over five runs for Net Neutral (NN) on Bias-NLI and Accuracy (Acc) on the MNLI matched development set.

| Model    | FN↑   | NN↑   | Acc↑  |
| BERT     | 0.010 | 0.082 | 84.77 |
| ADELE    | 0.127 | 0.173 | 84.13 |
| ADELE-TA | 0.557 | 0.504 | 81.30 |

Table 4: Fairness preservation results for ADELE-TA. We report the bias measures Fraction Neutral (FN) and Net Neutral (NN) on the Bias-NLI data set together with NLI accuracy on the MNLI-m dev set.

Figure 3 summarizes the results of our fairness forgetting experiment. We report the mean and the 95% confidence interval over the five runs for NN on Bias-NLI and Accuracy (Acc) on the MNLI-m development set. Several interesting observations emerge. First, the NN scores seem to be quite unstable across different runs (wide confidence intervals) for both BERT and ADELE, which is surprising given the size of the Bias-NLI test set (1,936,512 instances). This could point to a lack of robustness of the NN measure (Dev et al., 2020) as a means for capturing biases in fine-tuned Transformers. Second, after training on smaller datasets (10K), ADELE still retains much of its debiasing effect and is much fairer than BERT. With larger NLI training (already at 25K), however, much of its debiasing effect vanishes, although it still seems to be slightly (but consistently) fairer than BERT over time. We dub this effect fairness forgetting and will investigate it further in future work.

Preventing Fairness Forgetting. Finally, we propose a downstream fine-tuning strategy that can prevent fairness forgetting and which is aligned with the modular debiasing nature of ADELE: we (1) inject an additional task-specific adapter (TA) on top of ADELE's debiasing adapter and (2) update only the TA parameters in downstream (MNLI) training. This way, the debiasing knowledge stored in ADELE's debiasing adapters remains intact. Table 4 compares the Bias-NLI and MNLI performance of this fairness-preserving variant (ADELE-TA) against BERT and ADELE. The results strongly suggest that by freezing the debiasing adapters and injecting the additional task adapters, we indeed retain most of the debiasing effects of ADELE: according to the bias measures, ADELE-TA is massively fairer than the fully fine-tuned ADELE (e.g., FN score of 0.557 vs. ADELE's 0.127). Preventing fairness forgetting comes at a tolerable task performance cost: ADELE-TA loses 3 points in NLI accuracy compared to fully fine-tuning BERT and ADELE for the task.
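Conceptually, the ADELE-TA setup amounts to the parameter-freezing pattern sketched below in plain PyTorch; the submodule names 'debias_adapter' and 'task_adapter' are hypothetical placeholders rather than the AdapterHub API the authors rely on (footnote 5).

```python
import torch
import torch.nn as nn

def prepare_adele_ta(model: nn.Module):
    """Freeze everything except the task adapters stacked on top of the debiasing adapters.
    Assumes each Transformer layer exposes (hypothetical) submodules named
    'debias_adapter' and 'task_adapter'."""
    for name, param in model.named_parameters():
        param.requires_grad = "task_adapter" in name
    # The pretrained Transformer weights and the debiasing adapters stay intact,
    # so the debiasing knowledge cannot be overwritten by downstream (MNLI) training.
    return [p for p in model.parameters() if p.requires_grad]

# trainable = prepare_adele_ta(bert_with_adapters)   # bert_with_adapters: hypothetical model
# optimizer = torch.optim.Adam(trainable, lr=2e-5)
```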
5 Related Work

We provide a brief overview of work in the two areas which we bridge in this work: debiasing methods and parameter-efficient fine-tuning with adapters.

Adapter Layers in NLP. Adapters (Rebuffi et al., 2018) have been introduced to NLP by Houlsby et al. (2019), who demonstrated their effectiveness and efficiency for general language understanding (NLU). Since then, they have been employed for various purposes: apart from NLU, task adapters have been explored for natural language generation (Lin et al., 2020) and machine translation quality estimation (Yang et al., 2020). Other works use language adapters encoding language-specific knowledge, e.g., for machine translation (Philip et al., 2020; Kim et al., 2019) or multilingual parsing (Üstün et al., 2020). Further, adapters have been shown useful in domain adaptation (Pham et al., 2020; Glavaš et al., 2021) and for injection of external knowledge (Wang et al., 2020; Lauscher et al., 2020b). Pfeiffer et al. (2020b) use adapters to learn both language and task representations. Building on top of this, Vidoni et al. (2020) prevent adapters from learning redundant information by introducing orthogonality constraints.

Debiasing Methods. A recent survey covering research on stereotypical biases in NLP is provided by Blodgett et al. (2020). In the following, we focus on approaches for mitigating biases from PLMs, which are largely inspired by debiasing for static word embeddings (e.g., Bolukbasi et al., 2016; Dev and Phillips, 2019; Lauscher et al., 2020a; Karve et al., 2019, inter alia). While several works propose projection-based debiasing for PLMs (e.g., Dev et al., 2020; Liang et al., 2020; Kaneko and Bollegala, 2021), most of the debiasing approaches require training. Here, some methods rely on debiasing objectives (e.g., Qian et al., 2019; Bordia and Bowman, 2019). In contrast, the debiasing approach we employ in this work, CDA (Zhao et al., 2018), relies on adapting the input data and is more generally applicable. Variants of CDA exist: e.g., Hall Maudslay et al. (2019) use names as bias proxies and substitute instances instead of augmenting the data, whereas Zhao et al. (2019) use CDA at test time to neutralize the models' biased predictions. Webster et al. (2020) investigate one-sided vs. two-sided CDA for debiasing BERT in pretraining and show dropout to be effective for bias mitigation.

6 Conclusion

We presented ADELE, a novel sustainable and modular approach to debiasing PLMs based on adapter modules. In contrast to existing computationally demanding debiasing approaches, which debias the entire PLM via full fine-tuning, ADELE performs parameter-efficient debiasing by training dedicated debiasing adapters. We extensively evaluated ADELE on gender debiasing of BERT, demonstrating its effectiveness on three intrinsic and two extrinsic debiasing benchmarks. Further, applying ADELE on top of mBERT, we successfully transferred its debiasing effects to six target languages. Finally, we showed that by combining ADELE's debiasing adapters with task adapters, we can preserve the representational fairness even after large-scale downstream training. We hope that ADELE catalyzes more research efforts towards making fair NLP fairer, i.e., more sustainable and more inclusive (i.e., more multilingual).

Acknowledgments

The work of Anne Lauscher and Goran Glavaš has been supported by the Multi2ConvAI Grant (Mehrsprachige und Domänen-übergreifende Conversational AI) of the Baden-Württemberg Ministry of Economy, Labor, and Housing (KI-Innovation). Additionally, Anne Lauscher has partially received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No. 949944, INTEGRATOR).

Further Ethical Considerations

In this work, we employed a binary conceptualization of gender due to the plethora of available bias evaluation tests that are restricted to such a narrow notion of gender. Our work is of methodological nature (i.e., we do not create additional data sets and text resources), and our primary goal was to demonstrate the bias attenuation effectiveness of our approach based on debiasing adapters: to this end, we relied on the available evaluation data sets from previous work. We fully acknowledge that gender is a spectrum: we fully support the inclusion of all gender identities (nonbinary, gender fluid, polygender, and other) in language technologies and strongly support work on creating resources and data sets for measuring and attenuating harmful stereotypical biases expressed towards all gender identities. Further, we acknowledge the importance of research on the intersectionality (Crenshaw, 1989) of stereotyping, which we did not consider here for similar reasons – the lack of training and evaluation data. Our modular adapter-based debiasing approach, ADELE, however, is conceptually particularly suitable for addressing complex intersectional biases, and this is something we intend to explore in our future work.
References

Soumya Barikeri, Anne Lauscher, Ivan Vulić, and Goran Glavaš. 2021. RedditBias: A real-world resource for bias evaluation and debiasing of conversational language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1941–1955, Online. Association for Computational Linguistics.

Marion Bartl, Malvina Nissim, and Albert Gatt. 2020. Unmasking contextual stereotypes: Measuring and mitigating BERT's gender bias. In Proceedings of the Second Workshop on Gender Bias in Natural Language Processing, pages 1–16, Barcelona, Spain (Online). Association for Computational Linguistics.

Su Lin Blodgett, Solon Barocas, Hal Daumé III, and Hanna Wallach. 2020. Language (technology) is power: A critical survey of "bias" in NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5454–5476, Online. Association for Computational Linguistics.

Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS'16, pages 4356–4364, Red Hook, NY, USA. Curran Associates Inc.

Shikha Bordia and Samuel R. Bowman. 2019. Identifying and reducing gender bias in word-level language models. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 7–15, Minneapolis, Minnesota. Association for Computational Linguistics.

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, et al. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc.

Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186.

Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, and Lucia Specia. 2017. SemEval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 1–14, Vancouver, Canada. Association for Computational Linguistics.

Kimberlé Crenshaw. 1989. Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics. U. Chi. Legal F., 1989:139.

Sunipa Dev, Tao Li, Jeff M. Phillips, and Vivek Srikumar. 2020. On measuring and mitigating biased inferences of word embeddings. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020), pages 7659–7666, New York, NY, USA. AAAI Press.

Sunipa Dev and Jeff Phillips. 2019. Attenuating bias in word vectors. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 879–887. PMLR.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Nikhil Garg, Londa Schiebinger, Dan Jurafsky, and James Zou. 2018. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16):E3635–E3644.

Goran Glavaš, Ananya Ganesh, and Swapna Somasundaran. 2021. Training and domain adaptation for supervised text segmentation. In Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications, pages 110–116, Online. Association for Computational Linguistics.

Rowan Hall Maudslay, Hila Gonen, Ryan Cotterell, and Simone Teufel. 2019. It's all in the name: Mitigating gender bias with name-based counterfactual data substitution. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5267–5275, Hong Kong, China. Association for Computational Linguistics.

Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for NLP. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, volume 97 of Proceedings of Machine Learning Research, pages 2790–2799, Long Beach, CA, USA. PMLR.

Masahiro Kaneko and Danushka Bollegala. 2021. Debiasing pre-trained contextualised embeddings. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1256–1266, Online. Association for Computational Linguistics.

Saket Karve, Lyle Ungar, and João Sedoc. 2019. Conceptor debiasing of word representations evaluated on WEAT. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pages 40–48, Florence, Italy. Association for Computational Linguistics.

Yunsu Kim, Petre Petrov, Pavel Petrushkov, Shahram Khadivi, and Hermann Ney. 2019. Pivot-based transfer learning for neural machine translation between non-English languages. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 866–876, Hong Kong, China. Association for Computational Linguistics.

Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings, San Diego, CA, USA.

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526.

Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W. Black, and Yulia Tsvetkov. 2019. Measuring bias in contextualized word representations. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pages 166–172, Florence, Italy. Association for Computational Linguistics.

Anne Lauscher and Goran Glavaš. 2019. Are we consistently biased? Multidimensional analysis of biases in distributional word vectors. In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019), pages 85–91, Minneapolis, Minnesota. Association for Computational Linguistics.

Anne Lauscher, Goran Glavaš, Simone Paolo Ponzetto, and Ivan Vulić. 2020a. A general framework for implicit and explicit debiasing of distributional word vector spaces. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05):8131–8138.

Anne Lauscher, Olga Majewska, Leonardo F. R. Ribeiro, Iryna Gurevych, Nikolai Rozanov, and Goran Glavaš. 2020b. Common sense or world knowledge? Investigating adapter-based knowledge injection into pretrained transformers. In Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, pages 43–49, Online. Association for Computational Linguistics.

Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen. 2020. GShard: Scaling giant models with conditional computation and automatic sharding. arXiv preprint arXiv:2006.16668.

Sheng Liang, Philipp Dufter, and Hinrich Schütze. 2020. Monolingual and multilingual reduction of gender bias in contextualized representations. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5082–5093, Barcelona, Spain (Online). International Committee on Computational Linguistics.

Zhaojiang Lin, Andrea Madotto, and Pascale Fung. 2020. Exploring versatile generative language model via parameter-efficient transfer learning. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 441–459, Online. Association for Computational Linguistics.

Kaiji Lu, Piotr Mardziel, Fangjing Wu, Preetam Amancharla, and Anupam Datta. 2020. Gender bias in neural natural language processing. In Logic, Language, and Security, pages 189–202. Springer.

Michael McCloskey and Neal J. Cohen. 1989. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation, volume 24, pages 109–165. Elsevier.

Nafise Sadat Moosavi, Angela Fan, Vered Shwartz, Goran Glavaš, Shafiq Joty, Alex Wang, and Thomas Wolf, editors. 2020. Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing. Association for Computational Linguistics, Online.

Nikita Nangia, Clara Vania, Rasika Bhalerao, and Samuel R. Bowman. 2020. CrowS-pairs: A challenge dataset for measuring social biases in masked language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1953–1967, Online. Association for Computational Linguistics.

Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237, New Orleans, Louisiana. Association for Computational Linguistics.

Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, and Iryna Gurevych. 2021. AdapterFusion: Non-destructive task composition for transfer learning. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 487–503, Online. Association for Computational Linguistics.

Jonas Pfeiffer, Andreas Rücklé, Clifton Poth, Aishwarya Kamath, Ivan Vulić, Sebastian Ruder, Kyunghyun Cho, and Iryna Gurevych. 2020a. AdapterHub: A framework for adapting transformers. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 46–54, Online. Association for Computational Linguistics.

Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych, and Sebastian Ruder. 2020b. MAD-X: An adapter-based framework for multi-task cross-lingual transfer. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7654–7673, Online. Association for Computational Linguistics.

Minh Quang Pham, Josep Maria Crego, François Yvon, and Jean Senellart. 2020. A study of residual adapters for multi-domain neural machine translation. In Proceedings of the Fifth Conference on Machine Translation, pages 617–628, Online. Association for Computational Linguistics.

Jerin Philip, Alexandre Berard, Matthias Gallé, and Laurent Besacier. 2020. Monolingual adapters for zero-shot neural machine translation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4465–4470, Online. Association for Computational Linguistics.

Yusu Qian, Urwa Muaz, Ben Zhang, and Jae Won Hyun. 2019. Reducing gender bias in word-level language models with a gender-equalizing loss function. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 223–228, Florence, Italy. Association for Computational Linguistics.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8).

Sylvestre-Alvise Rebuffi, Andrea Vedaldi, and Hakan Bilen. 2018. Efficient parametrization of multi-domain deep neural networks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8119–8127.

Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benjamin Van Durme. 2018. Gender bias in coreference resolution. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 8–14, New Orleans, Louisiana. Association for Computational Linguistics.

Emma Strubell, Ananya Ganesh, and Andrew McCallum. 2019. Energy and policy considerations for deep learning in NLP. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3645–3650, Florence, Italy. Association for Computational Linguistics.

Ahmet Üstün, Arianna Bisazza, Gosse Bouma, and Gertjan van Noord. 2020. UDapter: Language adaptation for truly Universal Dependency parsing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2302–2315, Online. Association for Computational Linguistics.

Marko Vidoni, Ivan Vulić, and Goran Glavaš. 2020. Orthogonal language and task adapters in zero-shot cross-lingual transfer. arXiv preprint arXiv:2012.06460.

Ivan Vulić, Simon Baker, Edoardo Maria Ponti, Ulla Petti, Ira Leviant, Kelly Wing, Olga Majewska, Eden Bar, Matt Malone, Thierry Poibeau, Roi Reichart, and Anna Korhonen. 2020. Multi-SimLex: A large-scale evaluation of multilingual and crosslingual lexical semantic similarity. Computational Linguistics, 46(4):847–897.

Tobias Walter, Celina Kirschner, Steffen Eger, Goran Glavaš, Anne Lauscher, and Simone Paolo Ponzetto. 2021. Diachronic analysis of German parliamentary proceedings: Ideological shifts through the lens of political biases. arXiv preprint arXiv:2108.06295.

Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Cuihong Cao, Daxin Jiang, Ming Zhou, et al. 2020. K-adapter: Infusing knowledge into pre-trained models with adapters. arXiv preprint arXiv:2002.01808.

Kellie Webster, Xuezhi Wang, Ian Tenney, Alex Beutel, Emily Pitler, Ellie Pavlick, Jilin Chen, Ed Chi, and Slav Petrov. 2020. Measuring and reducing gendered correlations in pre-trained models. arXiv preprint arXiv:2010.06032.

Adina Williams, Nikita Nangia, and Samuel Bowman. 2018. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122, New Orleans, Louisiana. Association for Computational Linguistics.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics.

(The reference list in this transcription is truncated here; entries cited in the text but missing above, e.g., Yang et al. (2020), Zhao et al. (2018, 2019), and Zhu et al. (2015), were not recoverable from the source.)