Attenuating Bias in Word Vectors
Sunipa Dev, Jeff Phillips (University of Utah)

arXiv:1901.07656v1 [cs.CL] 23 Jan 2019

Abstract

Word vector representations are well developed tools for various NLP and Machine Learning tasks and are known to retain significant semantic and syntactic structure of languages. But they are prone to carrying and amplifying bias which can perpetrate discrimination in various applications. In this work, we explore new simple ways to detect the most stereotypically gendered words in an embedding and remove the bias from them. We verify how names are masked carriers of gender bias and then use that as a tool to attenuate bias in embeddings. Further, we extend this property of names to show how names can be used to detect other types of bias in the embeddings such as bias based on race, ethnicity, and age.

1 BIAS IN WORD VECTORS

Word embeddings are an increasingly popular application of neural networks wherein enormous text corpora are taken as input and words therein are mapped to a vector in some high dimensional space. Two commonly used approaches to implement this are WordToVec [15,16] and GloVe [17]. These word vector representations estimate similarity between words based on the context of their nearby text, or predict the likelihood of seeing words in the context of another. Richer properties were discovered, such as synonym similarity, linear word relationships, and analogies such as man : woman :: king : queen. Their use is now standard in training complex language models.

However, it has been observed that word embeddings are prone to express the bias inherent in the data they are extracted from [3,4,7]. Further, Zhao et al. (2017) [18] and Hendricks et al. (2018) [6] show that machine learning algorithms and their output show more bias than the data they are generated from.

Word vector embeddings are used in machine learning applications which significantly affect people's lives, such as assessing credit [11] and predicting crime [5], and in other emerging domains such as judging loan applications and resumes for jobs or college applications. So it is paramount that efforts are made to identify and, if possible, remove bias inherent in them, or at least to attempt to minimize the propagation of bias within them. For instance, in using existing word embeddings, Bolukbasi et al. (2016) [3] demonstrated that women and men are associated with different professions, with men associated with leadership roles and professions like doctor or programmer, and women closer to professions like receptionist or nurse. Caliskan et al. (2017) [7] similarly noted how word embeddings show that women are more closely associated with arts than math, while it is the opposite for men. They also showed how positive and negative connotations are associated with European-American versus African-American names.

Our work simplifies, quantifies, and fine-tunes these approaches: we show that a very simple linear projection of all words based on vectors captured by common names is an effective and general way to significantly reduce bias in word embeddings. More specifically:

1a. We demonstrate that simple linear projection of all word vectors along a bias direction is more effective than the Hard Debiasing of Bolukbasi et al. (2016) [3], which is more complex and also partially relies on crowd sourcing.

1b. We show that these results can be slightly improved by dampening the projection of words which lie far from the bias direction in the orthogonal distance.
2. We examine the bias inherent in the standard word pairs used for debiasing based on gender, by randomly flipping or swapping these words in the raw text before creating the embeddings. We show that this alone does not eliminate bias in word embeddings, corroborating that simple language modification is not as effective as repairing the word embeddings themselves.
3a. We show that common names with gender association (e.g., john, amy) often provide a more effective gender subspace to debias along than using gendered words (e.g., he, she).

3b. We demonstrate that names carry other inherent, and sometimes unfavorable, biases associated with race, nationality, and age, which also correspond with bias subspaces in word embeddings; and that it is effective to use common names to establish these bias directions and remove this bias from word embeddings.

2 DATA AND NOTATIONS

We set as default the text corpus of a Wikipedia dump (dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2) with 4.57 billion tokens, and we extract a GloVe embedding from it in D = 300 dimensions per word. We restrict the word vocabulary to the most frequent 100,000 words. We also modify the text corpus and extract embeddings from it as described later.

So, for each word in the vocabulary W, we represent the word by the vector w_i ∈ R^D in the embedding. The bias (e.g., gender) subspace is denoted by a set of vectors B. It is typically considered in this work to be a single unit vector, v_B (explained in detail later). As we will revisit, a single vector is typically sufficient, and will simplify descriptions. However, these approaches can be generalized to a set of vectors defining a multi-dimensional subspace.

3 HOW TO ATTENUATE BIAS

Given a word embedding, debiasing typically takes as input a set Ȩ = {E_1, E_2, ..., E_m} of equality sets. An equality set E_j can for instance be a single pair (e.g., {man, woman}), but could be more words (e.g., {latina, latino, latinx}) such that, if the bias connotation (e.g., gender) is removed, it would objectively make sense for all of them to be equal. Our data sets will only use word pairs (as a default the ones in Table 1), and we will describe them as such hereafter for simpler descriptions. In particular, we will represent each E_j as a set of two vectors e_j^+, e_j^- ∈ R^D.

Table 1: Gendered Word Pairs
{man, woman}, {son, daughter}, {he, she}, {his, her}, {male, female}, {boy, girl}, {himself, herself}, {guy, gal}, {father, mother}, {john, mary}

Given such a set Ȩ of equality sets, the bias vector v_B can be formed as follows [3]. For each E_j = {e_j^+, e_j^-} create a vector ~e_j = e_j^+ − e_j^- between the pairs. Stack these to form a matrix Q = [~e_1 ~e_2 ... ~e_m], and let v_B be the top singular vector of Q. We revisit how to create such a bias direction in Section 4. Now given a word vector w ∈ W, we can project it to its component along this bias direction v_B as

π_B(w) = ⟨w, v_B⟩ v_B.
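To make this construction concrete, here is a minimal numpy sketch (ours, not the authors' released code). It assumes the embedding has been loaded into a Python dict `emb` mapping each word to its 300-dimensional vector, and uses the word pairs of Table 1 as the equality sets.

```python
import numpy as np

def bias_direction(emb, pairs):
    """Top singular vector of the stacked difference vectors of the equality pairs."""
    # One row per pair: e+ - e-.
    Q = np.stack([emb[a] - emb[b] for a, b in pairs])
    # The top right singular vector of Q lives in R^D and plays the role of v_B.
    _, _, Vt = np.linalg.svd(Q, full_matrices=False)
    v_B = Vt[0]
    return v_B / np.linalg.norm(v_B)

def project_onto_bias(w, v_B):
    """pi_B(w) = <w, v_B> v_B, the component of w along the bias direction."""
    return np.dot(w, v_B) * v_B

# Example usage with the gendered pairs of Table 1 (assuming `emb` exists):
# pairs = [("man", "woman"), ("son", "daughter"), ("he", "she"), ("his", "her"),
#          ("male", "female"), ("boy", "girl"), ("himself", "herself"),
#          ("guy", "gal"), ("father", "mother"), ("john", "mary")]
# v_B = bias_direction(emb, pairs)
```

With a single pair this reduces to the normalized difference of the two vectors; with several pairs the SVD picks out the direction the pair differences share.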
3.1 Existing Method: Hard Debiasing

The most notable advance towards debiasing embeddings along the gender direction has been by Bolukbasi et al. (2016) [3] in their algorithm called Hard Debiasing (HD). It takes a set of words desired to be neutralized, {w_1, w_2, ..., w_n} = W_N ⊂ W, a unit bias subspace vector v_B, and a set of equality sets E_1, E_2, ..., E_m.

First, the words {w_1, w_2, ..., w_n} ∈ W_N are projected orthogonal to the bias direction and normalized:

w_i' = (w_i − w_B) / ||w_i − w_B||,

where w_B = π_B(w_i) denotes the component of w_i along the bias direction.

Second, it corrects the locations of the vectors in the equality sets. Let µ_j = (1/|E_j|) Σ_{e∈E_j} e be the mean of an equality set, and µ = (1/m) Σ_{j=1}^m µ_j be the mean of the equality set means. Let ν_j = µ − µ_j be the offset of a particular equality set from the mean. Now each e ∈ E_j in each equality set E_j is first centered using their average and then neutralized as

e' = ν_j + sqrt(1 − ||ν_j||²) · (π_B(e) − v_B) / ||π_B(e) − v_B||.

Intuitively, ν_j quantifies the amount by which the words in each equality set E_j differ from each other in directions apart from the gender direction. This is used to center the words in each of these sets. This renders word pairs such as man and woman equidistant from the neutralized words w_i', with each word of the pair being centralized and moved to a position opposite the other in the space. This can filter out properties either word gained by being used in some other context, like mankind or humans for the word man.

The word set W_N = {w_1, w_2, ..., w_n} ⊂ W which is debiased is obtained in two steps. First, it seeds some words as definitionally gendered via crowd sourcing and using dictionary definitions; the complement – ones not selected in this step – are set as neutral.
Next, using this seeding, an SVM is trained and used to predict among all of W the set of other biased words W_B or neutral words W_N. This set W_N is taken as desired to be neutral and is debiased. Thus not all words W in the vocabulary are debiased in this procedure, only a select set chosen via crowd-sourcing and definitions, and its extrapolation. Also, the word vectors in the equality sets are handled separately. This makes the approach not a fully automatic way to debias the vector embedding.

3.2 Alternate and Simple Methods

We next present some alternatives to HD which are simple and fully automatic. These all assume a bias direction v_B.

Subtraction. As a simple baseline, for all word vectors w subtract the gender direction v_B from w:

w' = w − v_B.

Linear Projection. A better baseline is to project all words w ∈ W orthogonally to the bias vector v_B:

w' = w − π_B(w) = w − ⟨w, v_B⟩ v_B.

This enforces that the updated set W' = {w' | w ∈ W} has no component along v_B, and hence the resulting span is only D − 1 dimensions. Reducing the total dimension from, say, 300 to 299 should have minimal effect on the expressiveness or generalizability of the word vector embeddings.

Bolukbasi et al. [3] apply this same step to a dictionary-definition-based extrapolation and crowd-source-chosen set of word pairs W_N ⊂ W. We quantify in Section 5 that this single universal projection step debiases better than HD.
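Both baselines are one-line transformations over the vocabulary. The sketch below (again assuming the hypothetical `emb` dict and a precomputed `v_B`) shows one way they could be written; it is not the authors' implementation.

```python
import numpy as np

def debias_subtract(emb, v_B):
    """Baseline: w' = w - v_B for every word vector."""
    return {w: vec - v_B for w, vec in emb.items()}

def debias_project(emb, v_B):
    """Linear projection: w' = w - <w, v_B> v_B, removing the component along
    v_B from every word so the debiased embedding spans only D-1 dimensions."""
    return {w: vec - np.dot(vec, v_B) * v_B for w, vec in emb.items()}
```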
For example, consider the bias as gender, and the equality set with words man and woman. Linear projection will subtract from their word embeddings the proportion that was along the gender direction v_B learned from a larger set of equality pairs. It will make them close by but not exactly equal. The word man is used in many more senses than the word woman; it is used to refer to humankind, to a person in general, and in expressions like "oh man". In contrast, for a simpler word pair with fewer word senses, like (he - she) and (him - her), we can expect the words to be at almost identical positions in the vector space after debiasing, implying their synonymity. Thus, this approach uniformly reduces the component of a word along the bias direction without compromising the differences that words (and word pairs) have.

3.3 Partial Projection

A potential issue with the simple approaches is that they can significantly change some embedded words which are definitionally biased (e.g., the words W_B described by Bolukbasi et al. [3]). [[We note that this may not *actually* be a problem (see Section 5); the change may only be associated with the bias, so removing it would then not change the meaning of those words in any way except the ones we want to avoid.]] However, these intuitively should be words which have correlation with the bias vector, but also are far in the orthogonal direction. In this section we explore how to automatically attenuate the effect of the projection on these words.

This stems from the observation that, given a bias direction, the words which are most extreme in this direction (have the largest dot product) sometimes have a reasonable biased context, but some do not. These "false positives" may be large-normed vectors which also happen to have a component in the bias direction.

We start with a bias direction v_B and mean µ derived from equality pairs (defined the same way as in the context of HD). Now given a word vector w, we decompose it into key values along two components, illustrated in Figure 1. First, we write its bias component as

β(w) = ⟨w, v_B⟩ − ⟨µ, v_B⟩.

This is the difference of w from µ when both are projected onto the bias direction v_B.

Second, we write a (residual) orthogonal component

r(w) = w − ⟨w, v_B⟩ v_B.

Let η(w) = ||r(w)|| be its value. It is the orthogonal distance from the bias vector v_B; recall we chose v_B to pass through the origin, so the choice of µ does not affect this distance.

Figure 1: Illustration of η and β for word vector w.
Now we will maintain the orthogonal component r(w) (which is in a subspace spanned by D − 1 out of D dimensions) but adjust the bias component β(w) to make it closer to µ. The adjustment will depend on the magnitude η(w). As a default we set

w' = µ + r(w),

so all word vectors retain their orthogonal component, but have a fixed and constant bias term. This is functionally equivalent to the Linear Projection approach; the only difference is that instead of having a 0 magnitude along v_B (and the orthogonal part unchanged), it instead has a constant magnitude ⟨µ, v_B⟩ along v_B (and the orthogonal part still unchanged). This adds a constant to every inner product, and a constant offset to any linear projection or classifier. If we are required to work with normalized vectors (we do not recommend this, as the vector length captures veracity information about its embedding), we can simply set w' = r(w)/||r(w)||.

Given this set-up, we now propose three modifications. In each we set

w' = µ + r(w) + β(w) · f_i(η(w)) · v_B,

where f_i for i ∈ {1, 2, 3} is a function of only the orthogonal value η(w). For the default case f(η) = 0, and

f_1(η) = σ² / (η + 1)²
f_2(η) = exp(−η² / σ²)
f_3(η) = max(0, σ / (2η)).

Here σ is a hyperparameter that controls the importance of η; in Section 3.4 we show that we can just set σ = 1.

Figure 2: The gendered region as per the three variations of projection. Both points P1 and P2 have a dot product of 1.0 initially with the gender subspace, but their orthogonal distance to it differs, as expressed by their dot product with the other 299 dimensions.

In Figure 2 we see the regions of the (η, β)-space which the functions f, f_1, and f_2 consider gendered. f projects all points onto the y = µ line, but the variants f_1, f_2, and f_3 are represented by curves that dampen the bias reduction to different degrees as η increases. Points P1 and P2 have the same dot products with the bias direction but different dot products along the other D − 1 dimensions. We can observe the effects of each dampening function as η increases from P1 to P2.
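A sketch of this damped projection under the same assumptions as before (`emb`, `v_B`, and `mu` taken as the mean of the equality-set means). The damping functions follow the formulas exactly as stated above, including f_3 = max(0, σ/(2η)); the guard at η = 0 is our own choice, not specified in the text.

```python
import numpy as np

def damped_debias(emb, v_B, mu, kind="f1", sigma=1.0):
    """Partial projection: w' = mu + r(w) + beta(w) * f_i(eta(w)) * v_B."""
    def f(eta):
        if kind == "f":     # plain case: bias term fully replaced by mu's
            return 0.0
        if kind == "f1":
            return sigma**2 / (eta + 1.0)**2
        if kind == "f2":
            return float(np.exp(-eta**2 / sigma**2))
        if kind == "f3":    # max(0, sigma / (2*eta)), as stated in the text
            return max(0.0, sigma / (2.0 * eta)) if eta > 0 else 1.0
        raise ValueError(kind)

    mu_b = np.dot(mu, v_B)
    out = {}
    for w, vec in emb.items():
        beta = np.dot(vec, v_B) - mu_b           # signed bias offset from mu
        r = vec - np.dot(vec, v_B) * v_B         # residual orthogonal part
        eta = np.linalg.norm(r)                  # orthogonal distance
        out[w] = mu + r + beta * f(eta) * v_B
    return out
```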
3.4 SETTING σ = 1

To complete the damping functions f_1, f_2, and f_3, we need a value σ. If σ is larger, then more word vectors have bias completely removed; if σ is smaller, then more words are unaffected by the projection. The goal is for words in S which are likely to carry a bias connotation to have little damping (small f_i values), and words in T which are unlikely to carry a bias connotation to have more damping (large f_i values – they are not moved much). Given sets S and T, we can define a gain function

γ_{i,ρ}(σ) = Σ_{s∈S} β(s) (1 − f_i(η(s))) − ρ Σ_{t∈T} β(t) (1 − f_i(η(t))),

with a regularization term ρ. The gain γ is large when most biased words in S have very little damping (small f_i, large 1 − f_i), and the opposite is true for the neutral words in T. We want the neutral words to have large f_i and hence small 1 − f_i, so they do not change much.

To define the gain function, we need sets S and T; we do so with the bias of interest as gender. The biased set S is chosen among a set of 1000 popular names in W (based on babynamewizard.com and SSN databases [1,2]) which are strongly associated with a gender. The neutral set T is chosen as the most frequent 1000 words from W, after filtering out obviously gendered words such as names and words like man and he. We also omit occupation words like doctor and others which may carry unintentional gender bias (these are words we would like to automatically de-bias). The neutral set may not be perfectly non-gendered, but it provides a reasonable approximation of all non-gendered words.

We find that for an array of choices for ρ (we tried ρ = 1, ρ = 10, and ρ = 100), the value σ = 1 approximately maximizes the gain function γ_{i,ρ}(σ) for each i ∈ {1, 2, 3}. So hereafter we fix σ = 1.

Although these sets S and T play a role somewhat similar to the crowd-sourced sets W_B and W_N from HD that we hoped to avoid, the role here is much reduced: they are used only to verify that the choice σ = 1 is reasonable, and otherwise they are not used.
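For completeness, a sketch of the gain computation written directly from the formula above; the sets `S` (gendered names) and `T` (frequent, filtered words), the damping function handle `f`, and the sweep over candidate σ values are inputs or choices supplied by the caller.

```python
import numpy as np

def gain(S, T, emb, v_B, mu, f, rho=10.0):
    """gamma(sigma) = sum_{s in S} beta(s)(1 - f(eta(s)))
                      - rho * sum_{t in T} beta(t)(1 - f(eta(t)))."""
    mu_b = np.dot(mu, v_B)
    def beta(w):
        return np.dot(emb[w], v_B) - mu_b
    def eta(w):
        r = emb[w] - np.dot(emb[w], v_B) * v_B
        return np.linalg.norm(r)
    biased  = sum(beta(s) * (1.0 - f(eta(s))) for s in S if s in emb)
    neutral = sum(beta(t) * (1.0 - f(eta(t))) for t in T if t in emb)
    return biased - rho * neutral

# e.g., sweep sigma for f1 and keep the maximizer:
# best_sigma = max([0.25, 0.5, 1.0, 2.0, 4.0],
#                  key=lambda s: gain(S, T, emb, v_B, mu,
#                                     lambda e: s**2 / (e + 1.0)**2, rho=10.0))
```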
Figure 3: Fractional singular values for the avg male - female word pairs (as per Table 1) after flipping with probability (from left to right) 0.0 (the original data set), 0.5, 0.75, and 1.0.

3.5 Flipping the Raw Text

Since the embeddings preserve inner products of the data from which they are drawn, we explore whether we can make the data itself gender unbiased and then observe how that change shows up in the embedding. Unbiasing a textual corpus completely can be very intricate and complicated since there are many (sometimes implicit) gender indicators in text. Nonetheless, we propose a simple way of neutralizing bias in textual data by using word pairs E_1, E_2, ..., E_m; in particular, when we observe one part of a word pair in the raw text, we randomly flip it to the other part. For instance, for gendered word pairs (e.g., (he - she)) in a string "he was a doctor", we may flip to "she was a doctor."

We implement this procedure over the entire input raw text, and try various probabilities of flipping each observed word, focusing on probabilities 0.5, 0.75 and 1.00. The 0.5-flip probability makes each element of a word pair equally likely. The 1.00-flip probability reverses the roles of those word pairs, and the 0.75-flip probability does something in between. We perform this set of experiments on the default Wikipedia data set and switch between word pairs (say man → woman, she → he, etc.) from a list larger than Table 1, consisting of 75 word pairs; see Supplementary Material D.1.

We observe how the proportion along the principal components changes with this flipping in Figure 3. We see that flipping with probability 0.5 somewhat dampens the difference between the different principal components. On the other hand, flipping with probability 1.0 (and to a lesser extent 0.75) exacerbates the gender components rather than dampening them: now there are two components significantly larger than the others. This indicates that this flipping is only addressing part of the explicit bias, but missing some implicit bias, and these effects are now muddled.

We list some gender biased analogies in the default embedding, and how they change with each of the methods described in this section, in Table 2.

Figure 4: Proportion of singular values along principal directions (left) using names as indicators, and (right) using word pairs from Table 1 as indicators.

4 THE BIAS SUBSPACE

We explore ways of detecting and defining the bias subspace v_B and recovering the most gendered words in the embedding. Recall that as default, we use v_B as the top singular vector of the matrix defined by stacking the vectors ~e_j = e_j^+ − e_j^- of biased word pairs. We primarily focus on gendered bias, using words in Table 1, and show later how to effectively extend to other biases. We discuss this in detail in Supplementary Material C.

Most gendered words. The dot product ⟨v_B, w⟩ of a word vector w with the gender subspace v_B is a good indicator of how gendered a word is. The magnitude of the dot product tells us the length along the gender subspace, and the sign tells us whether it is more female or male. Some of the words denoted as most gendered are listed in Table 3.
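Ranking by this dot product is straightforward. A small sketch (assuming the same hypothetical `emb` dict and a precomputed `v_B`) that returns the top-k most gendered words together with the signed score:

```python
import numpy as np

def most_gendered(emb, v_B, k=20):
    """Rank words by |<w, v_B>|; the sign of the dot product indicates which
    side of the bias direction (e.g., female vs. male) a word leans towards."""
    scored = sorted(emb.items(),
                    key=lambda kv: abs(np.dot(kv[1], v_B)),
                    reverse=True)
    return [(w, float(np.dot(vec, v_B))) for w, vec in scored[:k]]
```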
4.1 Bias Direction using Names

When listing gendered words by |⟨v_B, w⟩|, we observe that many gendered words are names. This indicates the potential to use names as an alternative (and potentially more general) way to bootstrap finding the gender direction.

From the top 100K words, we extract the 10 most common male names {m_1, m_2, ..., m_10} and female names {s_1, s_2, ..., s_10} which are not used in ambiguous ways (e.g., not the name hope, which could also refer to the sentiment). We pair these 10 names from each category (male, female) randomly and compute the SVD as before. We observe in Figure 4 that the fractional singular values show a similar pattern as with the list of correctly gendered word pairs like (man - woman), (he - she), etc. But this way of pairing names is quite imprecise: these names are not 'opposites' of each other in the sense that word pairs are. So, we modify how we compute v_B so that we can better use names to detect the bias in the embedding. The following method gives us this advantage, where we do not necessarily need word pairs or equality sets as in Bolukbasi et al. [3].
Table 2: What analogies look like before and after damping gender by the different methods discussed: Hard Debiasing, flipping words in the text corpus, subtraction, and projection.
analogy head | original | HD | flip 0.5 | flip 0.75 | flip 1.0 | subtraction | projection
man : woman :: doctor : | nurse | surgeon | dr | dr | medicine | physician | physician
man : woman :: footballer : | politician | striker | midfielder | goalkeeper | striker | politician | midfielder
he : she :: strong : | weak | stronger | weak | strongly | many | well | stronger
he : she :: captain : | mrs | lieutenant | lieutenant | colonel | colonel | lieutenant | lieutenant
john : mary :: doctor : | nurse | physician | medicine | surgeon | nurse | father | physician

Table 3: Some of the most gendered words in the default embedding; and the most gendered adjectives and occupation words.
Gendered words: miss, herself, forefather, himself, maid, heroine, nephew, congressman, motherhood, jessica, zahir, suceeded, adriana, seductive, him, sir
Female adjectives: glamorous, diva, shimmery, beautiful
Male adjectives: strong, muscular, powerful, fast
Female occupations: nurse, maid, housewife, prostitute
Male occupations: soldier, captain, officer, footballer

Our gender direction is calculated as

v_{B,names} = (s − m) / ||s − m||,

where s = (1/10) Σ_i s_i and m = (1/10) Σ_i m_i. Using the default Wikipedia dataset, we found that this is a good approximator of the gender subspace defined by the first right singular vector calculated using gendered words from Table 1; their dot product is 0.809. We find similarly large dot product scores for other datasets too.

Here too we collect the most gendered words as per the gender direction v_{B,names} determined by these names. The most gendered words returned are similar to those using the default v_B: occupational words, adjectives, and synonyms for each gender. We find names to express a similar classification of words along the male-female vectors, with homemaker more female and policeman more male. We illustrate this in more detail in Table 4.

Table 4: Gendered occupations as observed in word embeddings using names as the gender direction indicator.
Female Occ | Male Occ | Female* Occ | Male* Occ
nurse | captain | policeman | policeman
maid | cop | detective | cop
actress | boss | character | character
housewife | officer | cop | assassin
dancer | actor | assassin | bodyguard
nun | scientist | actor | waiter
waitress | gangster | waiter | actor
scientist | trucker | butler | detective
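A sketch of this names-based direction under the same assumptions (`emb` as a word-to-vector dict); the name lists below are placeholders standing in for those of Supplementary Material D.3.

```python
import numpy as np

def names_direction(emb, female_names, male_names):
    """v_{B,names} = (s - m) / ||s - m||, where s and m are the means of the
    female and male name vectors respectively."""
    s = np.mean([emb[n] for n in female_names], axis=0)
    m = np.mean([emb[n] for n in male_names], axis=0)
    d = s - m
    return d / np.linalg.norm(d)

# e.g., with the 10 most frequent unambiguous names of each gender:
# v_B_names = names_direction(emb,
#     ["mary", "victoria", "carolina", "maria", "anne",
#      "kelly", "marie", "anna", "sarah", "jane"],
#     ["john", "william", "george", "liam", "andrew",
#      "michael", "louis", "tony", "scott", "jackson"])
# np.dot(v_B_names, v_B)   # the text reports about 0.809 on the Wikipedia GloVe
```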
Using that direction, we debias by linear projection. There is a similar shift in analogy results; we see a few examples in Table 5.

Table 5: What analogies look like before and after removing the gender direction using names.
analogy head | original | subtraction | projection
man : woman :: doctor : | nurse | physician | physician
man : woman :: footballer : | politician | politician | midfielder
he : she :: strong : | weak | very | stronger
he : she :: captain : | mrs | lieutenant | lieutenant
john : mary :: doctor : | nurse | dr | dr

5 QUANTIFYING BIAS

In this section we develop new measures to quantify how much bias has been removed from an embedding, and evaluate the various techniques we have developed for doing so.

As one measure, we use the Word Embedding Association Test (WEAT) developed by Caliskan et al. (2017) [7], analogous to the IAT tests, to evaluate the association of male and female gendered words with two categories of target words: career oriented words versus family oriented words. We detail WEAT and list the exact words used (as in [7]) in Supplementary Material B; smaller values are better.

Bolukbasi et al. [3] evaluated embedding bias using a crowdsourced judgement of whether an analogy produced by an embedding is biased or not. Our goal was to avoid crowd sourcing, so we propose two more automatic tests to qualitatively and uniformly evaluate an embedding for the presence of gender bias.
Embedding Coherence Test (ECT). A way to evaluate how the neutralization technique affects the embedding is to evaluate how the nearest neighbors change for (a) gendered pairs of words Ȩ and (b) indirect-bias-affected words such as those associated with sports or occupations (e.g., football, captain, doctor). We use the gendered word pairs in Table 1 for Ȩ and the professions list P = {p_1, p_2, ..., p_k} as proposed and used by Bolukbasi et al. (https://github.com/tolga-b/debiaswe; see also Supplementary Material D.2) to represent (b).

S1: For all word pairs {e_j^+, e_j^-} = E_j ∈ Ȩ we compute two means m = (1/|Ȩ|) Σ_{E_j∈Ȩ} e_j^+ and s = (1/|Ȩ|) Σ_{E_j∈Ȩ} e_j^-. We find the cosine similarity of both m and s to all words p_i ∈ P. This creates two vectors u_m, u_s ∈ R^k.

S2: We transform these similarity vectors to replace each coordinate by its rank order, and compute the Spearman coefficient (in [−1, 1], larger is better) between the rank orders of the similarities to words in P.

Thus, here, we care about the order in which the words in P occur as neighbors to each word pair rather than the exact distance. The exact distance between each word pair would depend on the usage of each word, and thus on all the dimensions other than the gender subspace too. But the order being relatively the same, as determined using the Spearman coefficient, would indicate the dampening of bias in the gender direction (i.e., if doctor by profession is the 2nd closest of all professions to both man and woman, then the embedding has a dampened bias for the word doctor in the gender direction). Neutralization should ideally bring the Spearman coefficient towards 1.
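A compact sketch of ECT as described, using scipy's Spearman rank correlation; `emb`, the pair list, and the profession list are assumed inputs, and the helper names are ours rather than the authors'.

```python
import numpy as np
from scipy.stats import spearmanr

def ect(emb, pairs, professions):
    """Embedding Coherence Test: Spearman correlation between how the two
    sides of the equality pairs rank the list of profession words."""
    m = np.mean([emb[a] for a, _ in pairs], axis=0)   # mean of the e+ vectors
    s = np.mean([emb[b] for _, b in pairs], axis=0)   # mean of the e- vectors
    def cos(u, v):
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    u_m = [cos(m, emb[p]) for p in professions]
    u_s = [cos(s, emb[p]) for p in professions]
    # spearmanr ranks the similarity vectors internally; result is in [-1, 1].
    return spearmanr(u_m, u_s).correlation
```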
for all methods in comparison to the non-debiased (the However, for each profession pi ∈ P we create a list original) word embedding; the exception is flipping Si of their plurals and synonyms from WordNet on with 1.0 probability score for ECT (word pairs) and NLTK [14]. all flipping variants for ECT (names). Flipping does nothing to affect the names, so it is not surprising that S1: For each word pair {e+ − j , ej } = Ej ∈ Ȩ, and each it does not improve this score; further indicating that occupation word pi ∈ P , we test if the analogy it is challenging to directly fix bias in raw text before e+ − j : ej :: pi returns a word from Si . If yes, we set creating embeddings. Moreover, HD has the lowest Q(Ej , pi ) = 1, and Q(Ej , pi ) = 0 otherwise. score (of 0.917) whereas projection obtains scores of S2: Return 0.996 (with vB ) and 0.943 (with vB,names ). 1 1 P the average P value across all combinations |Ȩ| k E j ∈ Ȩ p i ∈P Q(Ej , pi ). EQT is a more challenging test, and the original em-
Evaluating embeddings. We mainly run 4 tests to evaluate our methods: WEAT, EQT, and two variants of ECT. ECT (word pairs) uses Ȩ defined by the words in Table 1, and ECT (names) uses the vectors m and s derived from gendered names.

Table 6: Performance on ECT, EQT and WEAT by the different debiasing methods; and performance on standard similarity and analogy tests.
test | original | HD | flip 0.5 | flip 0.75 | flip 1.0 | subtraction (word pairs) | subtraction (names) | projection (word pairs) | projection (names)
ECT (word pairs) | 0.798 | 0.917 | 0.983 | 0.984 | 0.683 | 0.963 | 0.936 | 0.996 | 0.943
ECT (names) | 0.832 | 0.968 | 0.714 | 0.662 | 0.587 | 0.923 | 0.966 | 0.935 | 0.999
EQT | 0.128 | 0.145 | 0.131 | 0.098 | 0.085 | 0.268 | 0.236 | 0.283 | 0.291
WEAT | 1.623 | 1.221 | 1.164 | 1.09 | 1.03 | 1.427 | 1.440 | 1.233 | 1.219
WSim | 0.637 | 0.537 | 0.567 | 0.537 | 0.536 | 0.627 | 0.636 | 0.627 | 0.629
Simlex | 0.324 | 0.314 | 0.317 | 0.314 | 0.264 | 0.302 | 0.312 | 0.321 | 0.321
Google Analogy | 0.623 | 0.561 | 0.565 | 0.561 | 0.321 | 0.538 | 0.565 | 0.565 | 0.584

We observe in Table 6 that the ECT score increases for all methods in comparison to the non-debiased (the original) word embedding; the exceptions are the flipping-with-probability-1.0 score for ECT (word pairs) and all flipping variants for ECT (names). Flipping does nothing to affect the names, so it is not surprising that it does not improve this score, further indicating that it is challenging to directly fix bias in raw text before creating embeddings. Moreover, HD has the lowest score (of 0.917) whereas projection obtains scores of 0.996 (with v_B) and 0.943 (with v_{B,names}).

EQT is a more challenging test: the original embedding only achieves a score of 0.128, and HD only obtains 0.145 (that is, 12 to 15% of occupation words have their related word as nearest neighbor). On the other hand, projection increases this percentage to 28.3% (using v_B) and 29.1% (using v_{B,names}). Even subtraction does nearly as well, at between 23 and 27%. Generally, subtraction always performs slightly worse than projection.

For the WEAT test, the original data has a score of 1.623, and this is decreased the most by all forms of flipping, down to about 1.1. HD and projection do about the same, with HD obtaining a score of 1.221 and projection obtaining 1.219 (with v_{B,names}) and 1.234 (with v_B); values closer to 0 are better (see Supplementary Material B).

In the bottom of Table 6 we also run these approaches on standard similarity and analogy tests for evaluating the quality of embeddings. We use cosine similarity [13] on WordSimilarity-353 (WSim, 353 word pairs) [9] and SimLex-999 (Simlex, 999 word pairs) [10], each of which evaluates a Spearman coefficient (larger is better). We also use the Google Analogy Dataset with the function 3COSADD [12], which takes in the three words which form part of the analogy and returns the 4th word which fits the analogy best.

We observe (as expected) that all the debiasing approaches reduce these scores. The largest decrease in scores (between 1% and 10%) is almost always from HD. Flipping at the 0.5 rate is comparable to HD. And simple linear projection decreases the least (usually only about 1%, except on analogies where it is 7% (with v_B) or 5% (with v_{B,names})).

Table 7: Performance of damped linear projection using word pairs.
Tests | f | f1 | f2 | f3
ECT | 0.996 | 0.994 | 0.995 | 0.997
EQT | 0.283 | 0.280 | 0.292 | 0.287
WEAT | 1.233 | 1.253 | 1.245 | 1.241
WSim | 0.627 | 0.628 | 0.627 | 0.627
Simlex | 0.321 | 0.324 | 0.324 | 0.324
Google Analogy | 0.565 | 0.571 | 0.569 | 0.569
example, there is a divide between names that differ- ent racial groups tend to use more. Caliskan et al. [7] In Table 7 we also evaluate the damping mechanisms use a list of names that are more African-American defined by f1 , f2 , and f3 , using vB . These are very (AA) versus names that are more European-American comparable to simple linear projection (represented by (EA) for their analysis of bias. There are similar lists f ). The scores for ECT, EQT, and WEAT are all of names that are distinctly and commonly used by about the same as simple linear projection, usually different ethnic, racial (e.g., Asian, African-American) slightly worse. and even religious (for e.g., Islamic) groups. While ECT, EQT and WEAT scores are in a similar We first try this with two common demographic group range for all of f , f1 , f2 , and f3 ; the dampened ap- divides : Hispanic / European-American and African- proaches f1 , f2 , and f3 performs better on the Google American / European-American.
We first try this with two common demographic group divides: Hispanic / European-American and African-American / European-American.

Hispanic and European-American names. Even though we begin with the most commonly used Hispanic (H) names (Supplementary Material D), this is tricky, as not all of these names occur as often as European-American names and are thus not as well embedded. We use the frequencies from the dataset to guide us in selecting commonly used names that are also most frequent in the Wikipedia dataset. Using the same method as Section 4.1, we determine the direction, v_{B,names}, which encodes this racial difference, and find the words most commonly aligned with it. Other Hispanic and European-American names are the closest words. But other words like latino or hispanic also appear to be close, which affirms that we are capturing the right subspace.

African-American and European-American names. We see a similar trend when we use African-American names and European-American names (Figure 5). We use the African-American names used by Caliskan et al. (2017) [7]. We determine the bias direction by using the method in Section 4.1.

Figure 5: Gender and racial bias in the embedding.

We plot in Figure 5 a few occupation words along the axes defined by the H-EA and AA-EA bias directions, and compare them with those along the male-female axis. The embedding is different among the groups, and likely still generally more subordinate-biased towards Hispanic and African-American names, as it was for female, although footballer is more Hispanic than European-American, while maid is more neutral in the racial bias setting than in the gender setting. We see this pattern repeated across embeddings and datasets (see Supplementary Material A).

When we switch the type of bias, we also end up finding different patterns in the embeddings. In the case of both of these racial directions, there is a split in not just occupation words but other words that are detected as highly associated with the bias subspace. It shows up foremost among the closest words of the subspace of the bias. Here, we find words like drugs and illegal close to the H-EA direction, while close to the AA-EA direction we retrieve several slang words used to refer to African-Americans. These word associations with each racial group can be detected by the WEAT tests (lower means less bias) using positive and negative words, as demonstrated by Caliskan et al. (2017) [7]. We evaluate using the WEAT test before and after linear projection debiasing in Table 8. For each of these tests, we use half of the names in each category for finding the bias direction and the other half for WEAT testing. This selection is done arbitrarily, and the scores are averaged over 3 such selections.

Table 8: WEAT positive-negative test scores before and after debiasing.
 | Before Debiasing | After Debiasing
EA-AA | 1.803 | 0.425
EA-H | 1.461 | 0.480
Youth-Aged | 0.915 | 0.704
More qualitatively, as a result of the dampening of bias, we see that biased words like other names belonging to these specific demographic groups, slang words, and colloquial terms like latinos are removed from the closest 10% of words. This is beneficial, since the distinguishability of demographic characteristics based on names is what shows up in these different ways, like in occupational or financial bias.

Age-associated names. We observed that names can be masked carriers of age too. Using the database of names through time [1] and extracting the most common names from the early 1900s as compared to the late 1900s and early 2000s, we find a correlation between these names (see Supplementary Material) and age related words. In Figure 6, we see a clear correlation between age and names. Bias in this case does not show up in professions as clearly as it does for gender, but rather in terms of association with positive and negative words [7]. We again evaluate, using a WEAT test in Table 8, the bias before and after debiasing the embedding.

Figure 6: Detecting Age with Names: a plot of age related terms (e.g., youth, maturity, adolescence, young, senile, venerable, old, elderly, aged) along names from different centuries.
7 DISCUSSION

Different types of bias exist in textual data. Some are easier to detect and evaluate. Some are harder to find suitable and frequent indicators for, and thus to dampen. Gendered word pairs and gendered names are frequent enough in textual data to allow us to successfully measure gender bias in different ways and project the word embeddings away from the subspace occupied by gender. Other types of bias do not always have a list of word pairs to fall back on to do the same. But using names, as we see here, we can measure and detect the different biases anyway, and then project the embedding away from them. In this work we also see how a weighted variant of projection removes bias while best retaining the inherent structure of the word embedding.

Thanks to NSF CCF-1350888, ACI-1443046, CNS-1514520, CNS-1564287, IIS-1816149, and NVidia Corporation. Part of the work by JP was done while visiting the Simons Institute for Theory of Computing.

References

[1] https://www.ssa.gov/oact/babynames/.
[2] http://www.babynamewizard.com.
[3] T. Bolukbasi, K. W. Chang, J. Zou, V. Saligrama, and A. Kalai. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In ACM Transactions of Information Systems, 2016.
[4] T. Bolukbasi, K. W. Chang, J. Zou, V. Saligrama, and A. Kalai. Quantifying and reducing bias in word embeddings. 2016.
[5] Tim Brennan, William Dieterich, and Beate Ehret. Evaluating the predictive validity of the COMPAS risk and needs assessment system. Criminal Justice and Behavior, 36(1):21–40, 2009.
[6] Kaylee Burns, Lisa Anne Hendricks, Trevor Darrell, and Anna Rohrbach. Women also snowboard: Overcoming bias in captioning models. CoRR, abs/1803.09797, 2018.
[7] Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186, 2017.
[8] Christiane Fellbaum. WordNet: An Electronic Lexical Database. Bradford Books, 1998.
[9] L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, et al. Placing search in context: The concept revisited. In ACM Transactions of Information Systems, volume 20, pages 116–131, 2002.
[10] F. Hill, R. Reichart, and A. Korhonen. SimLex-999: Evaluating semantic models with (genuine) similarity estimation. In Computational Linguistics, volume 41, pages 665–695, 2015.
[11] Amir E. Khandani, Adlar J. Kim, and Andrew Lo. Consumer credit-risk models via machine-learning algorithms. Journal of Banking & Finance, 34(11):2767–2787, 2010.
[12] Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization. In NIPS, 2013.
[13] Omer Levy and Yoav Goldberg. Linguistic regularities of sparse and explicit word representations. In CoNLL, 2014.
[14] Edward Loper and Steven Bird. NLTK: The Natural Language Toolkit. In Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics - Volume 1, ETMTNLP '02, pages 63–70, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics.
[15] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. Technical report, arXiv:1301.3781, 2013.
[16] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111–3119, 2013.
[17] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. 2014.
[18] Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. CoRR, abs/1707.09457, 2017.
Supplementary material for: Attenuating Bias in Word Embeddings

A Bias in different embeddings

We explore here how gender bias is expressed across different embeddings, datasets, and embedding mechanisms. Similar patterns are reflected across all, as seen in Figure 7.

For this verification of the permeative nature of bias across datasets and embeddings, we use the GloVe embeddings of a Wikipedia dump (dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2, 4.7B tokens), Common Crawl (840B tokens, 2.2M vocab), and Twitter (27B tokens, 1.2M vocab) from https://nlp.stanford.edu/projects/glove/, and the WordToVec embedding of Google News (100B tokens, 3M vocab) from https://code.google.com/archive/p/word2vec/.

B WORD EMBEDDING ASSOCIATION TEST

The Word Embedding Association Test (WEAT) was defined as an analogue to the Implicit Association Test (IAT) by Caliskan et al. [7]. It checks for human-like bias associated with words in word embeddings. For example, it found career oriented words (executive, career, etc.) more associated with male names and male gendered words (man, boy, etc.) than with female names and gendered words, and family oriented words (family, home, etc.) more associated with female names and words than male. We list the set of words used for WEAT by Caliskan et al., and that we used in our work, below.

For two sets of target words X and Y and attribute words A and B, the WEAT test statistic is

s(X, Y, A, B) = Σ_{x∈X} s(x, A, B) − Σ_{y∈Y} s(y, A, B),

where s(w, A, B) = mean_{a∈A} cos(a, w) − mean_{b∈B} cos(b, w), and cos(a, b) is the cosine similarity between vectors a and b. This score is normalized by std-dev_{w∈X∪Y} s(w, A, B). So, the closer to 0 this value is, the less bias or preferential association the target word groups have to the attribute word groups.

Here the target words are occupation words or career/family oriented words, and the attributes are male/female words or names.

Career: { executive, management, professional, corporation, salary, office, business, career }
Family: { home, parents, children, family, cousins, marriage, wedding, relatives }
Male names: { john, paul, mike, kevin, steve, greg, jeff, bill }
Female names: { amy, joan, lisa, sarah, diana, kate, ann, donna }
Male words: { male, man, boy, brother, he, him, his, son }
Female words: { female, woman, girl, she, her, hers, daughter }
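A sketch of the WEAT statistic following the formula as stated here (sums over X and Y, normalized by the standard deviation of s(w, A, B) over X ∪ Y); `emb` and the four word lists are assumed inputs.

```python
import numpy as np

def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def weat(emb, X, Y, A, B):
    """WEAT statistic [7]: association of target lists X, Y with attribute
    lists A, B; values closer to 0 indicate less preferential association."""
    def s(w):
        return (np.mean([cos(emb[w], emb[a]) for a in A])
                - np.mean([cos(emb[w], emb[b]) for b in B]))
    raw = sum(s(x) for x in X) - sum(s(y) for y in Y)
    return raw / np.std([s(w) for w in X + Y])

# e.g., weat(emb, career_words, family_words, male_names, female_names)
```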
C DETECTING THE GENDER DIRECTION

For this, we take the set of gendered word pairs listed in Table 1. From our default Wikipedia dataset, using the embedded vectors for these word pairs (i.e., (woman - man), (she - he), etc.), we create a basis for a subspace F of dimension 10. We then try to understand the distribution of variance in this subspace. To do so, we project the entire dataset onto this subspace F and take the SVD. The top chart in Figure 10 shows the singular values of the entire data in this subspace F. We observe that there is a dominant first singular vector/value which is almost twice the size of the second value. After this drop, the decay is significantly more gradual. This suggests using only the top singular vector of F as the gender subspace, not 2 or more of these vectors.

To grasp how much of this variation is from the correlation along the gender direction, and how much is just random variation, we repeat this experiment, again in Figure 10, with different ways of creating the subspace F. First, in chart (b), we generate 10 vectors with one word chosen randomly and one chosen from the gendered set (e.g., chair-woman). Second, in chart (c), we generate 10 vectors between two random words from our set of the 100,000 most frequent words; these are averaged over 100 random iterations due to higher variance. Finally, in chart (d), we generate 10 random unit vectors in R^300. We observe that the pairs with one gendered vector in each pair still exhibit a significant drop in singular values, but not as drastic as with both words gendered. The other two approaches have no significant drop, since they do not in general contain a gendered word with an interesting subspace. All remaining singular values, and their decay, appear similar to the non-leading ones from charts (a) and (b). This further indicates that there is roughly one important gender direction, and any related subspace is not significantly different than a random one in the word-vector embedding.
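A sketch of the chart (a) experiment: build the 10-dimensional subspace F from the pair differences, project a sample of word vectors onto it, and inspect the fractional singular values. The variable names and the sampling choice are ours.

```python
import numpy as np

def pair_subspace_singular_values(emb, pairs, sample_words):
    """Project word vectors onto the span F of the pair-difference vectors and
    return the fractional singular values of the projected data."""
    D = np.stack([emb[a] - emb[b] for a, b in pairs])   # 10 x 300
    # Orthonormal basis for F: the right singular vectors of D span its row space.
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    F = Vt                                              # 10 x 300
    W = np.stack([emb[w] for w in sample_words])        # n x 300
    proj = W @ F.T                                      # coordinates within F
    sing = np.linalg.svd(proj, compute_uv=False)
    return sing / sing.sum()
```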
Figure 7: Gender Bias in Different Embeddings: (a) GloVe on Wikipedia, (b) GloVe on Twitter, (c) GloVe on Common Crawl, and (d) WordToVec on Google News datasets.
Figure 8: Bias in adjectives along the gender direction: (a) GloVe on the default Wikipedia dataset, (b) GloVe on Common Crawl (840B token dataset), (c) GloVe on the Twitter dataset, and (d) WordToVec on the Google News dataset.

Figure 9: Racial bias in different embeddings: occupation words along the European American - Hispanic axis in GloVe embeddings of (a) Common Crawl and (b) the Twitter dataset, and along the European American - African American axis in GloVe embeddings of (c) Common Crawl and (d) the Twitter dataset.

Figure 10: Fractional singular values for (a) male-female word pairs, (b) one gendered word - one random word, (c) random word pairs, and (d) random unit vectors.

Now, for any word w in the vocabulary W of the embedding, we can define w_B as the part of w along the gender direction.

Based on the experiments shown in Figure 10, it is justified to take the gender direction as the (normalized) first right singular vector, v_B, of the full data set projected onto the subspace F. Then, the component of a word vector w along v_B is simply ⟨w, v_B⟩ v_B. Calculating this component when the gender subspace is defined by two or more of the top right singular vectors of V can be done similarly.

We should note here that the gender subspace defined here passes through the origin. Centering the data and using PCA to define the gender subspace lets the gender subspace not pass through the origin. We see a comparison of the two methods in Section 5, as HD uses PCA and we use SVD to define the gender direction.
D Word Lists

D.1 Word Pairs used for Flipping

actor - actress, author - authoress, bachelor - spinster, boy - girl, brave - squaw, bridegroom - bride, brother - sister, conductor - conductress, count - countess, czar - czarina, dad - mum, daddy - mummy, duke - duchess, emperor - empress, father - mother, father-in-law - mother-in-law, fiance - fiancee, gentleman - lady, giant - giantess, god - goddess, governor - matron, grandfather - grandmother, grandson - granddaughter, he - she, headmaster - headmistress, heir - heiress, hero - heroine, him - her, himself - herself, host - hostess, hunter - huntress, husband - wife, king - queen, lad - lass, landlord - landlady, lord - lady, male - female, man - woman, manager - manageress, manservant - maidservant, masseur - masseuse, master - mistress, mayor - mayoress, milkman - milkmaid, millionaire - millionairess, monitor - monitress, monk - nun, mr - mrs, murderer - murderess, nephew - niece, papa - mama, poet - poetess, policeman - policewoman, postman - postwoman, postmaster - postmistress, priest - priestess, prince - princess, prophet - prophetess, proprietor - proprietress, shepherd - shepherdess, sir - madam, son - daughter, son-in-law - daughter-in-law, step-father - step-mother, step-son - step-daughter, steward - stewardess, sultan - sultana, tailor - tailoress, uncle - aunt, usher - usherette, waiter - waitress, washerman - washerwoman, widower - widow, wizard - witch

D.2 Occupation Words

detective, ambassador, coach, officer, epidemiologist, rabbi, ballplayer, secretary, actress, manager, scientist, cardiologist, actor, industrialist, welder, biologist, undersecretary, captain, economist, politician, baron, pollster, environmentalist, photographer, mediator, character, housewife, jeweler, physicist, hitman,
geologist, novelist, painter, senator, employee, collector, stockbroker, goalkeeper, footballer, singer, tycoon, acquaintance, dad, preacher, patrolman, trumpeter, chancellor, colonel, advocate, trooper, bureaucrat, understudy, strategist, paralegal, pathologist, philosopher, psychologist, councilor, campaigner, violinist, magistrate, priest, judge, cellist, illustrator, hooker, surgeon, jurist, nurse, commentator, missionary, gardener, stylist, journalist, solicitor, warrior, scholar, cameraman, naturalist, wrestler, artist, hairdresser, mathematician, lawmaker, businesswoman, psychiatrist, investigator, clerk, curator, writer, soloist, handyman, servant, broker, broadcaster, boss, fisherman, lieutenant, landlord, neurosurgeon, housekeeper, protagonist, crooner, sculptor, archaeologist, nanny, teenager, teacher, councilman, homemaker, attorney, cop, choreographer, planner, principal, laborer, parishioner, programmer, therapist, philanthropist, administrator, waiter, skipper, barrister, aide, trader, chef, swimmer, gangster, adventurer, astronomer, monk, educator, bookkeeper, lawyer, radiologist, midfielder, columnist, evangelist, banker,
neurologist, technician, barber, nun, policeman, instructor, assassin, alderman, marshal, analyst, waitress, chaplain, artiste, inventor, playwright, lifeguard, electrician, bodyguard, student, bartender, deputy, surveyor, researcher, consultant, caretaker, athlete, ranger, cartoonist, lyricist, negotiator, entrepreneur, promoter, sailor, socialite, dancer, architect, composer, mechanic, president, entertainer, dean, counselor, comic, janitor, medic, firebrand, legislator, sportsman, salesman, anthropologist, observer, performer, pundit, crusader, maid, envoy, archbishop, trucker, firefighter, publicist, vocalist, commander, tutor, professor, proprietor, critic, restaurateur, comedian, editor, receptionist, saint, financier, butler, valedictorian, prosecutor, inspector, sergeant, steward, realtor, confesses, commissioner, bishop, narrator, shopkeeper, conductor, ballerina, historian, diplomat, citizen, parliamentarian, worker, author, pastor, sociologist, serviceman, photojournalist, filmmaker, guitarist, sportswriter, butcher, poet, mobster, dentist, drummer, statesman, astronaut, minister, protester, dermatologist, custodian,
maestro, pianist, pharmacist, chemist, pediatrician, lecturer, foreman, cleric, musician, cabbie, fireman, farmer, headmaster, soldier, carpenter, substitute, director, cinematographer, warden, marksman, congressman, prisoner, librarian, magician, screenwriter, provost, saxophonist, plumber, correspondent, organist, baker, doctor, constable, treasurer, superintendent, boxer, physician, infielder, businessman, protege

D.3 Names used for Gender Bias Detection

Male: { john, william, george, liam, andrew, michael, louis, tony, scott, jackson }
Female: { mary, victoria, carolina, maria, anne, kelly, marie, anna, sarah, jane }

D.4 Names used for Racial Bias Detection and Dampening

European American: { brad, brendan, geoffrey, greg, brett, matthew, neil, todd, nancy, amanda, emily, rachel }
African American: { darnell, hakim, jermaine, kareem, jamal, leroy, tyrone, rasheed, yvette, malika, latonya, jasmine }
Hispanic: { alejandro, pancho, bernardo, pedro, octavio, rodrigo, ricardo, augusto, carmen, katia, marcella, sofia }

D.5 Names used for Age related Bias Detection and Dampening

Aged: { ruth, william, horace, mary, susie, amy, john, henry, edward, elizabeth }
Youth: { taylor, jamie, daniel, aubrey, alison, miranda, jacob, arthur, aaron, ethan }