Analogies and Feature Attributions for Model Agnostic Explanation of Similarity Learners
Karthikeyan Natesan Ramamurthy*, Amit Dhurandhar*, Dennis Wei
IBM Research, Yorktown Heights, NY USA 10598
{knatesa, adhuran, dwei}@us.ibm.com

Zaid Bin Tariq†
Rensselaer Polytechnic Institute, Troy, NY USA 12180
zaidtariq4@gmail.com

Abstract

Post-hoc explanations for black box models have been studied extensively in classification and regression settings. However, explanations for models that output similarity between two inputs have received comparatively less attention. In this paper, we provide model agnostic local explanations for similarity learners applicable to tabular and text data. We first propose a method that provides feature attributions to explain the similarity between a pair of inputs as determined by a black box similarity learner. We then propose analogies as a new form of explanation in machine learning. Here the goal is to identify diverse analogous pairs of examples that share the same level of similarity as the input pair and provide insight into (latent) factors underlying the model's prediction. The selection of analogies can optionally leverage feature attributions, thus connecting the two forms of explanation while still maintaining complementarity. We prove that our analogy objective function is submodular, making the search for good-quality analogies efficient. We apply the proposed approaches to explain similarities between sentences as predicted by a state-of-the-art sentence encoder, and between patients in a healthcare utilization application. Efficacy is measured through quantitative evaluations, a careful user study, and examples of explanations.

1 Introduction

The goal of a similarity function is to quantify the similarity between two objects. The learning of similarity functions, or equivalently distance functions, from labeled examples has traditionally been studied within the area of similarity or metric learning [31]. With the advent of deep learning, learning complex similarity functions has found its way into additional important applications such as health care informatics, face recognition, handwriting analysis/signature verification, and search engine query matching. For example, learning pairwise similarity between patients in Electronic Health Records (EHR) helps doctors in diagnosing and treating future patients [57]. Although deep similarity models may better quantify similarity, the complexity of these models could make them harder to trust. For decision-critical systems like patient diagnosis and treatment, it would be helpful for users to understand why a black box model assigns a certain level of similarity to two objects. Providing explanations for similarity models is therefore an important problem.

ML model explainability has been studied extensively in classification and regression settings. Local explanations in particular have received a lot of attention [40, 34, 43] given that entities (viz. individuals) are primarily interested in understanding why a certain decision was made for them, and building globally interpretable surrogates for a black box model is much more challenging. Local explanations can uncover potential issues such as reliance on unimportant or unfair factors in a region of the input space, hence aiding in model debugging. Appropriately aggregating local explanations can also provide reasonable global understanding [50, 39].

* The first two authors contributed equally.
† Work done while interning at IBM Research.
Input: (black box distance: 0.19)
a) As well as the dolphin scheme, the chaos has allowed foreign companies to engage in damaging logging and fishing operations without proper monitoring or export controls.
b) Internal chaos has allowed foreign companies to set up damaging commercial logging and fishing operations without proper monitoring or export controls.

Analogy: (black box distance: 0.21)
a) Singapore is already the United States' 12th-largest trading partner, with two-way trade totaling more than $34 billion.
b) Although a small city-state, Singapore is the 12th-largest trading partner of the United States, with trade volume of $33.4 billion last year.

Figure 1: Our feature-based and analogy-based explanations of similarity between two input sentences. The former is represented by shading (darker is more important) derived from row sums of the matrix in Figure 2b. The latter suggests that the presence of more context in one of the sentences – dolphin scheme in the input, Singapore being a small city-state in the analogy – is important in explaining the black box's similarity score, whereas details such as the particular scheme or words such as "foreign companies" or "fishing operations" may not be required.

In this paper, we develop model-agnostic local explanation methods for similarity learners, which is a relatively under-explored area. Given a black box similarity learner and a pair of inputs, our first method produces feature attributions for the output of the black box. We discuss why the direct application of LIME [40] and other first-order methods is less satisfactory for similarity models. We then propose a quadratic approximation using the Mahalanobis distance. A simplified example of the output is shown as shading in Figure 1. Our second contribution is to propose a novel type of explanation in the form of analogies for a given input pair. The importance of analogy-based explanations was recently advocated by Hullermeier [25]. The proposed feature- and analogy-based explanations complement each other well; humans may prefer either one, or both together for a more complete explanation, as alluded to in cognitive science [26]. We formalize analogy-based explanations with an objective that captures the intuitive desiderata of (1) closeness in degree of similarity to the input pair, (2) diversity among the analogous pairs, and (3) a notion of analogy, i.e., members of each analogous pair have a similar relationship to each other as members of the input pair. We prove that this objective is submodular, making it efficient to find good analogies within a large dataset. An example analogy is shown in Figure 1. The analogy is understandable since, in the input, one of the sentences provides more context (i.e. presence of a fraudulent scheme called "dolphin scheme"), similar to the analogy where Singapore being a "small city-state" is the additional context. This suggests that analogies can uncover appropriate (latent) factors to explain predictions, which may not be apparent from explicit features such as words/phrases. The proposed feature- and analogy-based methods are applied to text and tabular data, to explain similarities between i) sentences from the Semantic Textual Similarity (STS) dataset [9], ii) patients in terms of their healthcare utilization using Medical Expenditure Panel Survey (MEPS) data, and iii) iris species (IRIS).
The proposed methods outperform feature- and exemplar-based baselines in both quantitative evaluation and a user study, showing high fidelity to the black box similarity learner and providing reasons that users find sensible. We also present examples of explanations and illustrate specific insights.

2 Problem Description

Given a pair of examples x = (x1, x2) ∈ Rᵐ ⊗ Rᵐ, where m is the dimension of the space, and a black box model δBB(·) : Rᵐ ⊗ Rᵐ → R, our goal is to "explain" the prediction δBB(x) of the black box model.
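For concreteness, the sketch below shows one black box of this form, built from the cosine distance between universal sentence encoder embeddings (the black box used for the STS experiments in Section 5). The wrapper interface and names are illustrative assumptions; any pairwise model that exposes a two-argument distance could be substituted.

```python
# Sketch of a black-box distance delta_BB over sentence pairs, assuming
# tensorflow_hub and the universal sentence encoder referenced in Section 5.
# Smaller values mean the two inputs are more similar.
import numpy as np
import tensorflow_hub as hub

_encoder = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

def delta_bb(x: str, y: str) -> float:
    """Cosine distance between the two sentence embeddings."""
    ex, ey = np.asarray(_encoder([x, y]))
    cos = float(ex @ ey / (np.linalg.norm(ex) * np.linalg.norm(ey)))
    return 1.0 - cos

print(delta_bb("A man is playing a harp.", "A man is playing a keyboard."))
```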
One type of explanation takes the form of a sparse set of features (i.e., much smaller than m if m is large) that are most important in determining the output, together possibly with weights to quantify their importance. An alternative form of explanation consists of other example pairs that have the same (or similar) output from the black box model as the input pair. The latter constitutes a new form of (local) explanation which we term analogy-based explanation. Although these might seem similar to exemplar-based explanations [19], which are commonly used to locally explain classification models, there is an important difference: exemplars are typically close to the inputs they explain, whereas analogies do not have to be. What is desired is for the relationship between members of each analogous pair to be close to the relationship of the input pair (x1, x2).

3 Related Work

A brief survey of local and global explanation methods is available in Appendix A. We are aware of only a few works that explain similarity models [55, 38, 56], all of which primarily apply to image data. Further, they either require white-box access or are based on differences between saliency maps. Our methods, on the other hand, are model agnostic and apply to tabular and text data, as showcased in the experiments. The Joint Search LIME (JSLIME) method, proposed in [20] for model-agnostic explanations of image similarity, has parallels to our feature-based explanations. JSLIME is geared toward finding corresponding regions between a query and a retrieved image, whereas our method explains the distance predicted by a similarity model, uses a simpler distance function, and is more natural for tabular data (see Appendix E.1 for more details). There is a rich literature on similarity/metric learning methods; see e.g. [31] for a survey. However, the goal in these works is to learn a global metric from labeled examples. The labels may take the form of real-valued similarities or distances (regression similarity learning) [52]; binary similar/dissimilar labels [49], which may come from set membership or consist of pairwise "must-link" or "cannot-link" constraints [54]; or triplets (x, y, z) where y is more similar to x than z (contrastive learning) [42, 23]. Importantly, the learned metric does not have to be interpretable, as is the case in recent deep learning models. In our setting, we are given a similarity function as a black box and we seek local explanations. Hence, the two problems are distinct. Mathematically, our feature-based method belongs to the regression similarity learning category [28], but the supervision comes from the given black box. Note that our notion of analogies is different from analogy mining [24], where representations are learnt from datasets to retrieve information with a certain intent.

4 Explanation Methods

We propose two methods to explain similarity learners. The first is a feature-based explanation, while the second is a new type of explanation termed analogy-based explanation. The two explanations complement each other and are also related, since the analogy-based explanation can optionally use the output of the feature-based method as input, pointing to synergies between the two.

4.1 Feature-Based Similarity Explanations

We assume that the black box model δBB(x, y) is a distance function between two points x and y, i.e., smaller δBB(x, y) implies greater similarity.
We do not assume that δBB satisfies all four axioms of a metric, although the proposed local approximation is a metric and may be more suitable if δBB satisfies some of the axioms. Following post-hoc explanations of classifiers and regressors, a natural way to obtain a feature-based explanation of δBB(x, y) is to regard it as a function of a single input – the concatenation of (x, y). Then LIME [40] or other first-order gradient-based methods [44] can produce a local linear approximation of δBB(x, y) at (x, y) of the form gxT Δx + gyT Δy. This approach cannot capture interactions and thus cannot provide explanations in terms of distances between elements of x and y, e.g. (xj − yj)² or |xj − yj|, which are necessarily nonlinear. We thus propose to locally approximate δBB(x, y) with a quadratic model, the Mahalanobis distance δI(x, y) = (x̄ − ȳ)T A(x̄ − ȳ), where A ⪰ 0 is a positive semidefinite matrix and x̄, ȳ are interpretable representations of x, y (see [40] and note that x̄ = x, ȳ = y if the features in x, y are already interpretable). This simple, interpretable approximation is itself a distance between x and y. In Appendix E.2, we discuss the equivalence between explaining distances and similarities. In Section 5.1, we show qualitative examples of how elements of the learned A can explain similarities. We learn A by minimizing the following loss over a set of perturbations (xi, yi) in the neighborhood Nxy
of the input pair (x, y):

min_{A ⪰ 0} Σ_{(xi, yi) ∈ Nxy} wxi,yi [δBB(xi, yi) − (x̄i − ȳi)T A(x̄i − ȳi)]².    (1)
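The sketch below shows one way a problem of the form (1) could be set up and solved with CVXPY (the package mentioned in Appendix D). The variable names, the formulation of the quadratic form, and the small ℓ1 penalty (see Appendix G) are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch: fit the PSD matrix A of the local Mahalanobis approximation in (1).
# `diffs` holds the interpretable-representation differences (x_bar_i - y_bar_i),
# `bb` the black-box distances for the perturbations, and `w` the (nonnegative)
# neighborhood weights. Assumes cvxpy and numpy are available.
import cvxpy as cp
import numpy as np

def fit_full_A(diffs, bb, w, l1_reg=1e-4):
    d = diffs.shape[1]
    A = cp.Variable((d, d), PSD=True)              # enforce A >= 0 (PSD)
    # (x_bar_i - y_bar_i)^T A (x_bar_i - y_bar_i) = sum(A * outer(diff, diff))
    preds = cp.hstack([cp.sum(cp.multiply(A, np.outer(v, v))) for v in diffs])
    loss = cp.sum(cp.multiply(w, cp.square(bb - preds)))
    loss = loss + l1_reg * cp.norm1(cp.vec(A))     # small l1 penalty on A
    cp.Problem(cp.Minimize(loss)).solve()
    return A.value
```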
term G(zi, x), we use

G(zi, x) = D(zi, x) + α (δI(zi) − δI(x))²,    (3)

D(zi, x) = 1 − [(φ(zi2) − φ(zi1))T (φ(x2) − φ(x1))] / [‖φ(zi2) − φ(zi1)‖ ‖φ(x2) − φ(x1)‖].    (4)

In (3), δI(x) = (x̄1 − x̄2)T A(x̄1 − x̄2) is the distance predicted by the feature-based explanation of Section 4.1. The inclusion of this term with weight α > 0 may be helpful if the feature-based explanation is faithful and we wish to directly interpret the analogies. The term D(zi, x) is the cosine distance between the directions φ(zi2) − φ(zi1) and φ(x2) − φ(x1) in an embedding space. Here φ is an embedding function that can be the identity or chosen independently of the black box, hence preserving the model-agnostic nature of the interpretations. The intuition is that these directions capture aspects of the relationships between zi1, zi2 and between x1, x2. We will hence refer to this as direction similarity. In summary, the terms in (2)–(4) together are aimed at producing faithful, intuitive and diverse analogies as explanations. Let f({z1, . . . , zk}) denote the objective in (2). We prove the following in Appendix B.

Lemma 4.1. The objective function f in (2) is submodular.

Given that our function is submodular, we can use well-known minimization methods to find a k-sparse solution with approximation guarantees [48].

5 Experiments

We present first in Section 5.1 examples of explanations obtained with our proposed methods, to illustrate insights that may be derived. Our formal experimental study consists of both a human evaluation to investigate the utility of different explanations (Section 5.2) as well as a quantitative analysis (Section 5.3). The experiments were run in an embarrassingly parallel fashion on a 32 core/64 GB RAM Linux machine, or on a 56 core/242 GB RAM machine for larger experiments.

5.1 Qualitative Examples

We discuss examples of the proposed feature-based explanations with full A matrix (FbFull) and analogy-based explanations (AbE), using the Semantic Textual Similarity (STS) benchmark dataset [9] (https://ixa2.si.ehu.eus/stswiki/index.php/STSbenchmark) described below.

STS dataset: The dataset has 8628 sentence pairs, divided into training, validation, and test sets. Each pair has a ground truth semantic similarity score that we convert to a distance. For the black box similarity model δBB(x, y), we use the cosine distance between the embeddings of x and y produced by the universal sentence encoder [10] (https://tfhub.dev/google/universal-sentence-encoder/4). It is possible to learn a distance on top of these embeddings, but we find that the Pearson correlation of 0.787 between the cosine distances and true distances is already competitive with the STS benchmarks [51]. The corresponding mean absolute error is 0.177. In any case, our methods are agnostic to the black box model.

AbE hyperparameters: In all experiments, we set α = 0 to assess the value of AbE independent of feature-based explanations. λ1 and λ2 were selected once per dataset (not tuned per example) by evaluating the average fidelity of the analogies to the input pairs in terms of the black box model's predictions, along with manually inspecting a random subset of analogies to see how intuitive they were. With STS, we get λ1 = 0.5, λ2 = 0.01 (Appendix G has more details). Analogies from baseline methods are in Appendix K, ablation studies in which terms are removed from (2) are provided in Appendix L, and analogies with the tabular MEPS dataset are in Appendix M.
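Before turning to the examples, the following sketch makes the analogy-closeness terms in (3) and (4) concrete for a candidate pair z = (z1, z2) and input pair x = (x1, x2). The embedding function phi and the interpretable mapping are placeholders supplied by the caller; names are illustrative assumptions.

```python
# Sketch of the direction-similarity term D in (4) and the closeness term G in (3).
# `phi` is any embedding function (it may, but need not, coincide with the black
# box's encoder); `A` is an optional matrix from the feature-based explanation.
import numpy as np

def direction_distance(z, x, phi):
    """Cosine distance between phi(z2)-phi(z1) and phi(x2)-phi(x1), eq. (4)."""
    dz = phi(z[1]) - phi(z[0])
    dx = phi(x[1]) - phi(x[0])
    return 1.0 - float(dz @ dx) / (np.linalg.norm(dz) * np.linalg.norm(dx))

def delta_I(pair, A, to_interp):
    """Mahalanobis distance of Section 4.1 on interpretable representations."""
    v = to_interp(pair[0]) - to_interp(pair[1])
    return float(v @ A @ v)

def G(z, x, phi, alpha=0.0, A=None, to_interp=None):
    """Analogy-closeness term, eq. (3); the alpha term is optional."""
    g = direction_distance(z, x, phi)
    if alpha > 0.0:
        g += alpha * (delta_I(z, A, to_interp) - delta_I(x, A, to_interp)) ** 2
    return g
```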
Example 1: We start with a simple pair of sentences. (a) A man is playing a harp. (b) A man is playing a keyboard. δBB(x, y) = 0.38.

This pair was assigned a distance of 0.38 by the black box (BB) similarity model. FbFull approximates the above distance by the Mahalanobis distance (x̄ − ȳ)T A(x̄ − ȳ). For STS, the interpretable representation x̄ is a binary vector with each component x̄j indicating whether a word is present in the sentence. We define the distance contribution matrix C whose elements Cjk := (x̄j − ȳj)Ajk(x̄k − ȳk) sum up to the Mahalanobis distance. The distance contributions Cjk for Example 1 are shown in Figure 2a. Since the substitution of "keyboard" for "harp" is the only difference between the sentences, these are the only rows/columns with non-zero entries. A diagonal element Cjj is the contribution due to one sentence having word j and the other lacking it (e.g. x̄j = 1, ȳj = 0).
Figure 2: Contributions to distance based on the feature-based explanation with full A matrix (FbFull), for (a) Example 1, (b) Example 2, and (c) Example 3. The words are ordered in decreasing order of their contributions to the distance. Note that the contribution of a word is the sum of all columns (rows) corresponding to that row (column). Also, in Figure 2c, the first word is 're, a shortened form of the word "are"; see Example 3, sentence (b) in Section 5.1 for details.

The diagonal elements are partially offset by negative off-diagonal elements Cjk, which represent a contribution due to substituting word j (x̄j = 1, ȳj = 0) for word k (x̄k = 0, ȳk = 1). Presumably this offset occurs because harp and keyboard are both musical instruments and thus somewhat similar. AbE gives the following top three analogies:

1. (a) A guy is playing hackysack. (b) A man is playing a keyboard. δBB(x, y) = 0.40.
2. (a) Women are running. (b) Two women are running. δBB(x, y) = 0.19.
3. (a) There's no rule that decides which players can be picked for bowling/batting in the Super Over. (b) Yes a team can use the same player for both bowling and batting in a super over. δBB(x, y) = 0.59.

The first analogy is very similar except that hackysack is a sport rather than a musical instrument. The sentences in the second pair are more similar than the input pair, as reflected in the corresponding BB distance. The third analogy is less related (both sentences are about cricket player selection) with a larger BB distance.

Example 2: Next we consider the pair of longer sentences from Figure 1. The BB distance between this pair is 0.19, so they are closer than in Example 1. The two sentences are mostly the same but the first one adds context about an additional dolphin scheme. In addition to the analogy shown in Figure 1, the other two top analogies from AbE are:

1. (a) The American Anglican Council, which represents Episcopalian conservatives, said it will seek authorization to create a separate province in North America because of last week's actions. (b) The American Anglican Council, which represents Episcopalian conservatives, said it will seek authorization to create a separate group. δBB(x, y) = 0.18.
2. (a) A Stage 1 episode is declared when ozone levels reach 0.20 parts per million. (b) The federal standard for ozone is 0.12 parts per million. δBB(x, y) = 0.44.

The analogy in Figure 1 and the first analogy above are good matches because, like the input pair, each analogous pair makes the same statement but one of the sentences gives more context (a group in North America and because of last week's actions, Singapore being a small city-state). The second analogy is more distant (about two different ozone thresholds) but its BB distance is also higher. The distance contribution matrix given by FbFull is plotted in Figure 2b. For clarity, only rows/columns with absolute sum greater than 0.01 are shown. Several words with the largest contributions come from the additional phrase about the dolphin scheme. The substitution of the verb "set up" for "engage" is also highlighted.
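A short sketch of how the distance contribution matrices in Figure 2, and the word shadings in Figure 1, can be computed from a fitted A is given below. The binary bag-of-words representation and the use of row sums follow the descriptions above; the function names are illustrative.

```python
# Sketch: distance-contribution matrix C_jk = (xbar_j - ybar_j) A_jk (xbar_k - ybar_k)
# and word-level importances, assuming binary bag-of-words vectors xbar, ybar
# and a fitted matrix A (both numpy arrays over the same vocabulary).
import numpy as np

def contribution_matrix(xbar, ybar, A):
    diff = xbar - ybar
    return np.outer(diff, diff) * A      # entries sum to the Mahalanobis distance

def word_importance(C):
    # Figure 1 shades words by the row sums of C (C is symmetric when A is).
    return C.sum(axis=1)
```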
Example 3: The third pair is both more complex than Example 1 and less similar than Example 2: (a) It depends on what you want to do next, and where you want to do it. (b) I guess it depends on what you're going to do. δBB(x, y) = 0.44.

Figure 2c shows the distance contribution matrix produced by FbFull, again restricted to significant rows/columns. The most important contributions identified are the substitution of "[a]re going" for "want" and the addition of "I guess" in sentence b). Of minor importance but interesting to note is that the word "next" in sentence a) would have a larger contribution but it is offset by negative contributions from the ("next", "going") and ("next", "guess") entries. Both "next" and "going" are indicative of future action.
Figure 3: % accuracies from our user study (higher values are better). The left, center and right figures are overall results, results for short sentences, and results for long sentences, respectively. The error bars are one standard error. We observe that overall our proposed AbE and FbFull methods are (statistically) significantly better than other approaches. P-values based on paired t-tests for our approaches being (statistically) equal to the second best approaches confirm this and are as follows: AbE-PDash is 1.66 × 10^-9, FbFull-PDash is 2.92 × 10^-5, AbE-FbDiag is 3.28 × 10^-9 and FbFull-FbDiag is 3.36 × 10^-5. Looking at specific cases we see that for short sentences DirSim was the most competitive (center figure), while for long sentences FbDiag was (right figure). However, our methods (AbE and FbFull) remain consistently better in both of these scenarios.

Below is the top analogy for Example 3: (a) I prefer to run the second half 1-2 minutes faster then the first. (b) I would definitely go for a slightly slower first half. δBB(x, y) = 0.45. Both sentences express the same idea (second half faster than first half) but in different ways, similar to the input pair. Two more analogies are discussed in Appendix K.

5.2 User Study

We designed and conducted a human based evaluation to investigate five local explainability methods.

Methods: Besides the proposed FbFull and AbE methods, three other approaches evaluated in the user study are: feature-based explanation with diagonal A (FbDiag); ProtoDash (PDash) [19], a state-of-the-art exemplar explanation method (we created analogies for PDash by selecting prototypes for each instance and then pairing them in order); and Direction Similarity (DirSim), which finds analogies like AbE but using only the direction similarity term D(zi, x) in (4). Further comments on the choice of methods are provided in Appendix O.

Setup: For each pair of sentences in the STS dataset, users were instructed to use the provided explanations to estimate the similarity of the pair per a black box similarity model. As mentioned in Section 5.1, the black box model produces cosine distances in [0, 1] based on a universal sentence encoder [11]. To be more consumable to humans, the outputs of the black box model were discretized into three categories: Similar (0 − 0.25 distance), Somewhat similar (0.25 − 0.75 distance) and Dissimilar (> 0.75 distance). Users were asked to predict one of these categories or "can't say" if they were unable to do so. Screenshots illustrating this are in Appendix Q and the full user study is attached in Appendix T. Predicting black box outputs is a standard procedure to measure efficacy of explanations [39, 40, 33]. In the survey, 10 pairs of sentences were selected randomly in stratified fashion from the test set such that four were similar, four were somewhat similar, and the remaining two were dissimilar as per the black box. This was done to be consistent with the distribution of sentence pairs in the dataset with respect to these categories. Also, half the pairs selected were short sentence pairs, where the number of words in each sentence was typically ≤ 10, while for the remaining pairs (i.e. long sentence pairs) the numbers of words were typically closer to 20. This was done to test the explanation methods for different levels of complexity in the input, thus making our conclusions more robust. The users were blinded to which explanation method produced a particular explanation. The survey had 30 questions where each question corresponded to an explanation for a sentence pair.
Given that there were 10 sentence pairs, we randomly chose three methods per pair, which mapped to three different questions. By randomizing the order in which the explanation methods were presented, we were able to mitigate order bias. For feature-based explanations, the output from the explanation model was provided along with a set of important words, corresponding to rows in the A matrix with the largest sums in absolute value. For analogy-based explanations, black box outputs were provided for the analogies only (not for the input sentence pair), selected from the STS dev set.
Table 1: Generalized infidelity (mean absolute error) of the outputs produced by the feature-based explanation methods with respect to the black box models. We show mean ± standard error of the mean (SEM) for Iris and MEPS, where 5-fold CV was performed. See Appendix I for more quantitative results. Lower values are better.

Measure                  Dataset   FbFull          FbDiag          LIME            JSLIME
Generalized Infidelity   Iris      0.676 ± 0.090   0.922 ± 0.116   1.093 ± 0.108   1.208 ± 0.146
                         MEPS      0.178 ± 0.005   0.140 ± 0.002   0.192 ± 0.002   0.150 ± 0.002
                         STS       0.245           0.257           0.462           0.321

We did this to allow the users to calibrate the black box relative to the explanations, without which it would be impossible to estimate the similarity of the sentence pair in question. More importantly though, all this information would be available to the user in a real scenario where they are given explanations. We leveraged Google Forms for our study. 41 participants took it, with most of them having backgrounds in data science, engineering, and business analytics. We chose such participants as recent work shows that most consumers of such explanations have these backgrounds [6]. To ensure good statistical power with 41 subjects, our study follows the alternating treatment design paradigm [4], commonly used in psychology, where treatments (explanation methods here) are alternated randomly even within a single subject (see also Appendix N).

Observations: Figure 3 presents a summary of our user study results. In the left figure (all sentences), we observe that AbE and FbFull significantly outperform both exemplar-based and feature-based baselines. AbE seems to be slightly better than FbFull; however, the difference is not statistically significant. While the results in Section 5.3 show that both of these methods have high fidelity, this was not known to the participants, who instead had to use the provided reasons (analogies or important words) to decide whether to accept the outputs of the explanation methods. The good performance of AbE and FbFull suggests that the provided reasons are sensible to users. For analogy-based explanations, using additional evidence in Appendix P, we demonstrate that the participants indeed used their judgement guided by the explanations to estimate the BB similarity. In the center figure (short sentences), DirSim is the closest competitor, which suggests that the black box model is outputting distances that accord with intuition. FbDiag does worst here, signaling the importance of looking at interactions between words. However, in the right figure (long sentences), FbDiag is the closest competitor and DirSim is the worst, hinting that predicting the black box similarity becomes harder based on intuition and that certain key words are important to focus on independent of everything else. We also solicited (optional) user feedback (provided in Appendix R). From the comments, it appeared that there were two main groups. One preferred analogies as they felt they gave them more information to make the decision. This is seen from comments such as "The examples [analogies] seem to be more reliable than the verbal reason [words]." There was support for having multiple diverse analogies to increase confidence in a prediction, as seen in "The range of examples may be useful, as some questions have all three examples in the same class." While one would expect this benefit to diminish without diversity in the multiple analogies, this aspect was not explicitly tested.
The second group felt the feature-based explanations were better given their precision and brevity. An example comment here was "I find the explanation with the difference between the sentences easier to reason about." A couple of people also said that providing both the feature-based and analogy-based explanations would be useful as they somewhat complement each other and can help cross-verify one's assessment.

5.3 Quantitative Experiments

This section presents evaluations of the fidelity of various explanation methods with respect to the BB model's outputs.

Methods: In addition to the five local methods considered in Section 5.2, we evaluate a globally interpretable model, global feature-based full-matrix explanations (GFbFull), as well as LIME [40] and Joint Search LIME (JSLIME) [20]. GFbFull uses a Mahalanobis model as in Section 4.1 but fit on the entire dataset instead of a perturbation neighborhood Nxy. To run GFbFull on the STS dataset, we chose only the top 500 words in the test set vocabulary according to tf-idf scores to limit the computational complexity. For all methods, explanations were generated using the test set, and for AbE, DirSim, and PDash, we use the validation set to select the analogies.

Data and Black Box Models: In addition to the STS dataset, we use two other datasets along with attendant black box models: UCI Iris [13] and the Medical Expenditure Panel Survey (MEPS) [1]. The supplement has more details on datasets, black box models, and neighborhoods for feature-based explanations.
Figure 4: Mean absolute errors (infidelity) of the analogy explanation methods' predictions with respect to the black box predictions for a varying number of analogies, for (a) Iris, (b) MEPS, and (c) STS. The solid lines are the mean over 5 CV folds, and the shaded areas show 1 standard error of the mean. Lower values are better.

For Iris and MEPS, 5-fold cross-validation was performed. For Iris, pairs of examples were exhaustively enumerated and labeled as similar or dissimilar based on the species labels. A Siamese network was trained on these labeled pairs as the black box model δBB, achieving a mean absolute error (MAE) of 0.400 ± 0.044 with respect to the ground truth distances and a Pearson's r of 0.370 ± 0.164. For MEPS, we found that tree-based models worked better for this largely categorical dataset. So, we first trained a Random Forest regressor to predict healthcare utilization, achieving a test R² value of 0.381 ± 0.017. The BB function δBB(x, y) was then obtained as the distance between the leaf embeddings [57] of x, y from the random forest. Note that δBB(x, y) is a distance function of two inputs, not a regression function of one input. Pairs of examples to be explained were generated by stratified random sampling based on δBB(., .) values. For feature-based explanations we chose 10000 pairs each from the validation and test set of each fold. For AbE, DirSim, and PDash, we chose 1000 pairs to limit the computational complexity. For AbE, we used λ1 = 1.0 and λ2 = 0.01 for both MEPS and Iris.

For feature-based explanations, we present comparisons of generalized infidelity [39]. This tests generalization by computing the MAE between the black-box distance for an input instance pair and the explanation of the closest neighboring test instance pair. Table 1 shows the generalized infidelity for FbFull, FbDiag, LIME, and JSLIME with respect to the black box predictions. Since GFbFull computes global explanations, we cannot obtain this measure for it. Generalized fidelity computed using Pearson's r and non-generalized MAE/Pearson's r for all methods including GFbFull are presented in Appendix I. Appendix H also presents more descriptions of the metrics used. From Table 1, FbFull/FbDiag have superior performance. This suggests that they provide better generalization in the neighborhood by virtue of the Mahalanobis distance being a metric. We do not expect LIME to perform well, as discussed in Section 4.1, but JSLIME also has poor performance since it likely overfits because of the lack of constraints on A. Since all the black box predictions are between 0 and 1, it is possible to compare these three datasets. The methods seem to perform best with MEPS, followed by STS and Iris. The MEPS dataset, even though the largest of the three, has two advantages. The variables are binary (dummy coded from categorical), which possibly leads to better fitting explanations, and the search space for computing the generalized metric is large, which means that the likelihood of finding a neighboring test instance pair with a good explanation is high.
For STS, the black box universal sentence encoder seems to agree with the local Mahalanobis distance approximation and, to some extent, even with the diagonal approximation. Iris has the worst performance, possibly because the dataset is so small that a Siamese neural network cannot approximate the underlying similarity function well, and also because the search space for computing the generalized metric is quite small. The infidelity (MAE) of the analogy explanation methods (AbE, DirSim, and PDash) is illustrated in Figure 4. Given a set of analogies z1, . . . , zk, the prediction of the explainer is computed as the average of the black box predictions δBB(z1), . . . , δBB(zk) for the analogies. The AbE method dominates the other two baselines because of the explicit inclusion of the black box fidelity term in the objective. For Iris and STS, the MAE of AbE steadily increases with the number of analogies. This is expected because of the trade-off between the fidelity term and the diversity term in (2) as k increases. For MEPS, the MAE of AbE decreases very slowly and flattens out. This could be due to the greater availability of high-fidelity analogous pairs in MEPS.
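A sketch of the infidelity computation for the analogy-based explainers described above is shown below: the explainer's prediction for an input pair is the mean black-box distance of its selected analogies, and infidelity is the MAE against the black-box distance of the input pair itself. Names and data layout are illustrative assumptions.

```python
# Sketch: infidelity (MAE) of an analogy-based explainer. Each element of
# `inputs` is an input pair, and `analogies_per_input` holds the k analogous
# pairs selected for it; `delta_bb` is the black-box distance function.
import numpy as np

def analogy_infidelity(inputs, analogies_per_input, delta_bb):
    errors = []
    for x, analogies in zip(inputs, analogies_per_input):
        pred = np.mean([delta_bb(*z) for z in analogies])   # explainer's prediction
        errors.append(abs(pred - delta_bb(*x)))
    return float(np.mean(errors))
```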
6 Discussion

We have provided (model agnostic) local explanations for similarity learners, in both the more familiar form of feature attributions as well as the more novel form of analogies. Experimental results indicate that the resulting explanations have high fidelity, appear useful to humans in judging the black box's behavior, and offer qualitative insights. For the analogy-based method, the selection of analogies is significantly influenced by the analogy closeness term G(zi, x) in (2). Herein we have used direction similarity (4), which is convenient to compute given an embedding and appears to capture word and phrasing relations well in the STS dataset. It would be interesting to devise more sophisticated analogy closeness functions, tailored to the notion of analogy in a given context. It is also of interest to extend this work from explaining pairwise relationships to tasks such as ranking. We thus hope that the approaches developed here could become meta-approaches for handling multiple types of relationships.

References

[1] Medical Expenditure Panel Survey (MEPS). https://www.ahrq.gov/data/meps.html. Content last reviewed August 2018. Agency for Healthcare Research and Quality, Rockville, MD.
[2] Akshay Agrawal, Robin Verschueren, Steven Diamond, and Stephen Boyd. A rewriting system for convex optimization problems. Journal of Control and Decision, 5(1):42–60, 2018.
[3] Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one, 10(7):e0130140, 2015.
[4] David H Barlow and Steven C Hayes. Alternating treatments design: One strategy for comparing the effects of two treatments in a single subject. Journal of Applied Behavior Analysis, 12(2):199–210, 1979.
[5] Osbert Bastani, Carolyn Kim, and Hamsa Bastani. Interpreting blackbox models via model extraction. arXiv preprint arXiv:1705.08504, 2017.
[6] Umang Bhatt, Alice Xiang, Shubham Sharma, Adrian Weller, Yunhan Jia, Ankur Taly, Joydeep Ghosh, Ruchir Puri, José M. F. Moura, and Peter Eckersley. Explainable machine learning in deployment. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 2020.
[7] Cristian Buciluǎ, Rich Caruana, and Alexandru Niculescu-Mizil. Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.
[8] Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, pages 1721–1730, New York, NY, USA, 2015. ACM.
[9] Daniel Cer, Mona Diab, Eneko Agirre, Inigo Lopez-Gazpio, and Lucia Specia. SemEval-2017 task 1: Semantic textual similarity multilingual and cross-lingual focused evaluation. arXiv preprint arXiv:1708.00055, 2017.
[10] Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Brian Strope, and Ray Kurzweil. Universal sentence encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 169–174, Brussels, Belgium, November 2018. Association for Computational Linguistics.
[11] Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, and Ray Kurzweil. Universal sentence encoder. arXiv preprint arXiv:1803.11175, 2018.
[12] Sanjeeb Dash, Oktay Günlük, and Dennis Wei. Boolean decision rules via column generation. Advances in Neural Information Processing Systems, 2018.
[13] Dua Dheeru and Efi Karra Taniskidou. UCI machine learning repository, 2017.
[14] Amit Dhurandhar, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Paishun Ting, Karthikeyan Shanmugam, and Payel Das. Explanations based on the missing: Towards contrastive explanations with pertinent negatives. In Advances in Neural Information Processing Systems 31, 2018.
[15] Amit Dhurandhar, Karthikeyan Shanmugam, and Ronny Luss. Enhancing simple models by exploiting what they already know. Intl. Conference on Machine Learning (ICML), 2020.
[16] Amit Dhurandhar, Karthikeyan Shanmugam, Ronny Luss, and Peder Olsen. Improving simple models with confidence profiles. Advances of Neural Inf. Processing Systems (NeurIPS), 2018.
[17] Steven Diamond and Stephen Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
[18] R. Guidotti, A. Monreale, S. Matwin, and D. Pedreschi. Black box explanation by learning image exemplars in the latent feature space. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2019.
[19] Karthik Gurumoorthy, Amit Dhurandhar, Guillermo Cecchi, and Charu Aggarwal. ProtoDash: Fast interpretable prototype selection. IEEE ICDM, 2019.
[20] Mark Hamilton, Scott Lundberg, Lei Zhang, Stephanie Fu, and William T Freeman. Model-agnostic explainability for visual search. arXiv preprint arXiv:2103.00370, 2021.
[21] Lisa Anne Hendricks, Zeynep Akata, Marcus Rohrbach, Jeff Donahue, Bernt Schiele, and Trevor Darrell. Generating visual explanations. In European Conference on Computer Vision, 2016.
[22] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
[23] Elad Hoffer and Nir Ailon. Deep metric learning using triplet network. In Aasa Feragen, Marcello Pelillo, and Marco Loog, editors, Similarity-Based Pattern Recognition, pages 84–92, Cham, 2015. Springer International Publishing.
[24] Tom Hope, Joel Chan, Aniket Kittur, and Dafna Shahaf. Accelerating innovation through analogy mining. In Proceedings of Knowledge Discovery and Data Mining, 2017.
[25] E. Hullermeier. Towards analogy-based explanations in machine learning. arXiv preprint arXiv:2005.12800, 2020.
[26] John E Hummel, John Licato, and Selmer Bringsjord. Analogy, explanation, and proof. Frontiers in Human Neuroscience, 8:867, 2014.
[27] Tsuyoshi Idé and Amit Dhurandhar. Supervised item response models for informative prediction. Knowl. Inf. Syst., 51(1):235–257, April 2017.
[28] Purushottam Kar and Prateek Jain. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25, pages 215–223, 2012.
[29] Been Kim, Rajiv Khanna, and Oluwasanmi Koyejo. Examples are not enough, learn to criticize! Criticism for interpretability. In Advances of Neural Inf. Proc. Systems, 2016.
[30] Andreas Krause. SFO: A toolbox for submodular function optimization. Journal of Machine Learning Research, 11:1141–1144, 2010.
[31] B. Kulis. Metric learning: A survey. Foundations and Trends® in Machine Learning, 2013.
[32] O. Lampridis, R. Guidotti, and S. Ruggieri. Explaining sentiment classification with synthetic exemplars and counter-exemplars. In International Conference on Discovery Science, 2020.
[33] Zachary C Lipton. The mythos of model interpretability. arXiv preprint arXiv:1606.03490, 2016.
[34] Scott Lundberg and Su-In Lee. Unified framework for interpretable methods. In Advances of Neural Inf. Proc. Systems, 2017.
[35] Ronny Luss, Pin-Yu Chen, Amit Dhurandhar, Prasanna Sattigeri, Yunfeng Zhang, Karthikeyan Shanmugam, and Chun-Chen Tu. Leveraging latent features for local explanations. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 1139–1149, 2021.
[36] Nishtha Madaan, Inkit Padhi, Naveen Panwar, and Diptikalyan Saha. Generate your counterfactuals: Towards controlled counterfactual generation for text. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 13516–13524, 2021.
[37] Ramaravind Kommiya Mothilal, Amit Sharma, and Chenhao Tan. Explaining machine learning classifiers through diverse counterfactual explanations. arXiv preprint arXiv:1905.07697, 2019.
[38] Bryan A. Plummer, Mariya I. Vasileva, Vitali Petsiuk, Kate Saenko, and David Forsyth. Why do these match? Explaining the behavior of image similarity models. In ECCV, 2020.
[39] Karthikeyan Ramamurthy, Bhanu Vinzamuri, Yunfeng Zhang, and Amit Dhurandhar. Model agnostic multilevel explanations. In Advances in Neural Information Processing Systems, 2020.
[40] Marco Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?" Explaining the predictions of any classifier. In ACM SIGKDD Intl. Conference on Knowledge Discovery and Data Mining, 2016.
[41] Cynthia Rudin. Please stop explaining black box models for high stakes decisions. NIPS Workshop on Critiquing and Correcting Trends in Machine Learning, 2018.
[42] Matthew Schultz and Thorsten Joachims. Learning a distance metric from relative comparisons. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems, volume 16, pages 41–48. MIT Press, 2004.
[43] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 128:336–359, February 2020.
[44] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. CoRR, abs/1312.6034, 2013.
[45] M. Sipser. Introduction to the Theory of Computation, 3rd edition. Cengage Learning, 2013.
[46] Dylan Slack, Sophie Hilgard, Emily Jia, Sameer Singh, and Himabindu Lakkaraju. Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), 2020.
[47] Guolong Su, Dennis Wei, Kush Varshney, and Dmitry Malioutov. Interpretable two-level boolean rule learning for classification. arXiv preprint arXiv:1606.05798, 2016.
[48] Zoya Svitkina and Lisa Karen Fleischer. Submodular approximation: Sampling-based algorithms and lower bounds. SIAM Journal on Computing, 2011.
[49] Zaid Bin Tariq, Arun Iyengar, Lara Marcuse, Hui Su, and Bülent Yener. Patient-specific seizure prediction using single seizure electroencephalography recording. arXiv preprint arXiv:2011.08982, 2020.
[50] Ilse van der Linden, Hinda Haned, and Evangelos Kanoulas. Global aggregations of local explanations for black box models. In Fairness, Accountability, Confidentiality, Transparency, and Safety - SIGIR Workshop, 2019.
[51] Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In International Conference on Learning Representations, 2019.
[52] Kilian Q Weinberger and Lawrence K Saul. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10(2), 2009.
[53] T. Wu, M. T. Ribeiro, J. Heer, and D. S. Weld. Polyjuice: Generating counterfactuals for explaining, evaluating, and improving models. In ACL, 2021.
[54] Hongjing Zhang, Sugato Basu, and Ian Davidson. A framework for deep constrained clustering – algorithms and advances. In Proceedings European Conference on Machine Learning, 2019.
[55] Meng Zheng, Srikrishna Karanam, Terrence Chen, Richard J. Radke, and Ziyan Wu. Towards visually explaining similarity models. arXiv preprint arXiv:2008.06035, 2020.
[56] Sijie Zhu, Taojiannan Yang, and Chen Chen. Visual explanation for deep metric learning. IEEE Transactions on Image Processing, 2021.
[57] Z. Zhu, C. Yin, B. Qian, Y. Cheng, J. Wei, and F. Wang. Measuring patient similarities via a deep architecture with medical concept embedding. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 749–758, 2016.
A Other Explainability Methods

A large body of work on XAI can be said to belong to either local explanations [40, 34, 19], global explanations [22, 5, 7, 16, 15], directly interpretable models [8, 41, 47, 12, 45], or visualization-based methods [21]. Among these categories, local explainability methods are the most relevant to our current endeavor. Local explanation methods generate explanations per example for a given black box. Methods in this category are either feature-based [40, 34, 3, 14, 37] or exemplar-based [19, 29]. There are also a number of methods in this category specifically designed for images [44, 3, 18, 32, 43]. However, all of the above methods are predominantly applicable to the classification setting and, in a smaller number of cases, to regression. Global explainability methods try to build an interpretable model on the entire dataset using information from the black-box model, with the intention of approaching the black-box model's performance. Methods in this category either use predictions (soft or hard) of the black-box model to train simpler interpretable models [22, 5, 7] or extract weights based on the prediction confidences to reweight the dataset [16, 15]. Directly interpretable methods include some of the traditional models such as decision trees or logistic regression. There has been a lot of effort recently to efficiently and accurately learn rule lists [41], two-level boolean rules [47], or decision sets [45]. There has also been work inspired by other fields such as psychometrics [27] and healthcare [8]. Visualization-based methods try to visualize the inner neurons or sets of neurons in a layer of a neural network [21]. The idea is that by exposing such representations one may be able to gauge if the neural network is in fact capturing semantically meaningful high level features.

B Proof of Submodularity (Lemma 4.1)

Proof. Consider two sets S and T consisting of elements z (i.e. analogous pairs) as defined before, where S ⊆ T. Let w be a pair ∉ T and x be an input pair that we want to explain. Then for any valid S and T, we have

f(S ∪ {w}) − f(S) = (δBB(w) − δBB(x))² + λ1 G(w, x) − λ2 Σ_{z∈S} δmin(w, z)².    (5)

Similarly,

f(T ∪ {w}) − f(T) = (δBB(w) − δBB(x))² + λ1 G(w, x) − λ2 Σ_{z∈T} δmin(w, z)².    (6)

Subtracting equation (6) from (5) and ignoring λ2, as it just scales the difference without changing the sign, gives us

Σ_{z∈T} δmin(w, z)² − Σ_{z∈S} δmin(w, z)² = Σ_{z∈S} δmin(w, z)² + Σ_{z∈T\S} δmin(w, z)² − Σ_{z∈S} δmin(w, z)² = Σ_{z∈T\S} δmin(w, z)² ≥ 0.    (7)

Thus, the function f(.) has the diminishing returns property.

C Greedy Approximate Algorithm for Solving (2)

The only reliable software that was available for solving the submodular minimization was the SFO MATLAB package [30]. However, we faced the following challenges: (a) it was quite slow to run the exact optimization, and since we had to compute thousands of local explanations, it would have taken an unreasonably long time; (b) we wanted k-sparse solutions (not unconstrained outputs); (c) the optimization in the exact setting was quite sensitive to the hyperparameters λ1 and λ2; (d) attempts to speed up the execution by parallelizing it would require algorithmic innovations; and (e) MATLAB needed paid licenses. Hence, for the purposes of having better control, speed, and efficiency, we implemented a greedy approximate version of the objective in (2). The greedy approach chooses one analogous pair (zi) to minimize the current objective value and keeps repeating this until k pairs are chosen.
The greedy algorithm is provided in Algorithm 2.
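Algorithm 2 itself is not reproduced in this extract; the sketch below illustrates the greedy idea using the marginal cost implied by (5) in Appendix B. The diversity distance delta_min, the closeness term G, and the candidate pool are supplied by the caller, and the exact bookkeeping in the authors' implementation may differ.

```python
# Sketch of greedy analogy selection: repeatedly add the candidate pair w that
# minimizes the marginal cost implied by eq. (5), until k pairs are chosen.
def greedy_analogies(candidates, x, delta_bb, G, delta_min, k, lam1, lam2):
    selected = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        best, best_cost = None, None
        for w in remaining:
            cost = ((delta_bb(w) - delta_bb(x)) ** 2          # fidelity to the input pair
                    + lam1 * G(w, x)                          # analogy closeness
                    - lam2 * sum(delta_min(w, z) ** 2 for z in selected))  # diversity
            if best_cost is None or cost < best_cost:
                best, best_cost = w, cost
        selected.append(best)
        remaining.remove(best)
    return selected
```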
D Computational Complexities

The FbFull method involves solving an SDP, which has a time complexity of at least O(d³), where d is the number of features, since each iteration usually involves solving a linear system or inverting a matrix of that size. However, it is not apparent how the CVXPY package we use sets up and solves this problem, which could alter this complexity. We implement a non-negative sparse minimization for FbDiag with k non-zeros, and for this case the computational complexity is O(Nk²), where N is the number of perturbations used. For the proposed AbE method, since the objective function is submodular, the k-sparse minimization algorithm has an approximation guarantee of √(k/ln k) and runs in O(k^4.5) time. Typically, k is small. However, there is no software implementation available for this method to the best of our knowledge. Hence we use the greedy method proposed above, which has a time complexity of O(Nk), where N is the dataset size.

E Additional Remarks on Methods

E.1 Joint Search LIME

Joint Search LIME (JSLIME) [20] is a bilinear model akin to x̄T Aȳ in our notation, where A is unconstrained and may not even be square, as opposed to FbFull/FbDiag which use a Mahalanobis distance (x̄ − ȳ)T A(x̄ − ȳ), where A is positive semidefinite and necessarily square. Both the FbFull/FbDiag and JSLIME approaches have their merits. The Mahalanobis distance is a metric, and interpretations can exploit this by decomposing the Mahalanobis distance into distance contributions due to differences in individual features. On the other hand, JSLIME can be more flexible because of the unconstrained A. [20] show that it can be used to identify correspondences between parts of two inputs, specifically a region of a retrieved image that corresponds to a given region of a query image. This is a different task from explaining a predicted distance by a decomposition. Also, for tabular data, it is not clear how meaningful the correspondences from JSLIME will be.

E.2 Explaining Similarities Versus Distances

The feature-based explanation methods (FbFull and FbDiag) explain the distance between the points x and y given by δBB(x, y) using the Mahalanobis distance (x̄ − ȳ)T A(x̄ − ȳ). To understand how this is equivalent to explaining similarities, consider without loss of generality that the maximum value of δBB(x, y) is 1 for any x and y. Now, the explanation model for the simple similarity function 1 − δBB(x, y) is Σ_{j=1}^d Σ_{k=1}^d (1/d² − Cjk), where Cjk = (x̄j − ȳj)Ajk(x̄k − ȳk) is the distance contribution discussed in Example 1 (Section 5.1). Clearly, a low distance contribution Cjk results in a high similarity contribution 1/d² − Cjk and vice versa.

F Data and Black-Box Models

For the Iris dataset we created 5 folds of the data with a 20% non-overlapping test set in each fold, and the rest of the data in each fold is divided into 80% training and 20% validation samples. For each partition, we create similar and dissimilar pairs exhaustively based on the agreement or disagreement of labels. This resulted in an average of 4560 training pairs, 276 validation pairs, and 435 testing pairs per fold. The black-box model used in this case is a paired (conjoined) neural network where each candidate network in the pair has a single dense layer whose parameters are tied to the other candidate, and it is trained using contrastive loss. The mean absolute error (MAE) between the black box predictions, δBB(., .), and the ground truth distances between the pairs was 0.400 ± 0.044, and the Pearson's r was 0.370 ± 0.164.
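A minimal sketch, assuming PyTorch, of a conjoined (Siamese) network of the kind described above: a single shared dense layer trained with a contrastive loss, with the learned black box δBB given by the distance between the shared embeddings. The embedding size, margin, and other training details are illustrative assumptions.

```python
# Sketch (PyTorch assumed): conjoined network with one shared dense layer,
# trained on (x1, x2, same) pairs with a contrastive loss; delta_BB(x, y) is
# the Euclidean distance between the shared embeddings.
import torch
import torch.nn as nn

class SiameseEncoder(nn.Module):
    def __init__(self, in_dim=4, emb_dim=8):        # 4 features for Iris; emb_dim is an assumption
        super().__init__()
        self.dense = nn.Linear(in_dim, emb_dim)      # weights shared across the pair

    def forward(self, x1, x2):
        return self.dense(x1), self.dense(x2)

def contrastive_loss(e1, e2, same, margin=1.0):
    d = torch.norm(e1 - e2, dim=1)
    # similar pairs (same=1) are pulled together, dissimilar pairs pushed apart
    return torch.mean(same * d ** 2 + (1 - same) * torch.clamp(margin - d, min=0) ** 2)

def delta_bb(model, x, y):
    with torch.no_grad():
        e1, e2 = model(x.unsqueeze(0), y.unsqueeze(0))
        return torch.norm(e1 - e2).item()
```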
For GFbFull, we chose only the top 500 words in the test set vocabulary according to tf-idf scores to limit the computational complexity. The Medical Expenditure Panel Survey (MEPS) dataset is produced by the US Department of Health and Human Services. It is a collection of surveys of families of individuals, medical providers, and employers across the country. We chose Panel 19 of the survey, which consists of a cohort that started in 2014, with data collected over 5 rounds of interviews during 2014−2015. The outcome variable was a composite utilization feature that quantified the total number of healthcare visits of a patient. The features used included demographic features, perceived health status, various diagnoses, limitations, and socioeconomic factors. We filtered out records that had a utilization (outcome) of 0, and log-transformed the outcome for modeling. These pre-processing steps resulted in a dataset with 11136 examples and 32 categorical features.
We used 5-fold CV using the same approach as for the Iris data. When selecting pairs of examples for explanations, we performed stratified random sampling based on δBB(., .). For FbFull, FbDiag, and GFbFull, we chose 10000 pairs each from the validation and test set for each fold. For AbE, DirSim, and PDash, we chose 1000 pairs to limit the computational complexity. The regression black-box model used for predicting the utilization outcome was a Random Forest with 500 trees and 50 leaf nodes per tree. The function δBB(x, y) for a pair (x, y) was obtained as the distance between the leaf embeddings [57] from the random forests. The R² performance measure of the regressor on the test set was 0.381 ± 0.017.

The STS benchmark dataset comprises 8628 sentence pairs, of which 5749 correspond to the training partition, 1500 to the validation partition, and 1379 to the test partition. Each pair has a ground truth semantic similarity score between 0 (no meaning overlap) and 5 (meaning equivalence). This can be re-interpreted as a distance measure by subtracting it from 5 and dividing the result by 5. The black box model used here was the universal sentence encoder [10] (https://tfhub.dev/google/universal-sentence-encoder/4), which creates a 512 dimensional embedding for each sentence. δBB(x, y) is the cosine distance between the embeddings of x and y. The Pearson's r performance measure of these black-box predictions with respect to the distances between the sentences is 0.787 and the mean absolute error is 0.177.

G Hyperparameters

In all datasets, FbFull and GFbFull were computed with a very small ℓ1 penalty on A (10^-4 ‖A‖1) added to the objective function (1). For FbDiag, we request a maximum of 4 non-zero coefficients for Iris, 10 non-zero coefficients for MEPS, and 5 non-zero coefficients for STS. As discussed in Section 5, we set the hyperparameters for analogy-based explanations to reasonable values guided by the following procedure. First, we set α = 0 because we wanted to evaluate independently the benefit of analogy-based explanations without any influence of feature-based explanations. For setting λ1 and λ2, we first note that too high a value of λ2 may result in analogous pairs that do not have similarities close to the input. So we set it to a small value (0.01) in all cases and search around that range. Next, when we set λ1 we want to give somewhat equal priority to the first and second terms in (2). Hence, we search between 0.1 and 1.0. Again, we would like to have good fidelity between the input and the analogous pairs, and this guided our decision. Finally, we also consider how intuitive the analogies are for a randomly chosen set of inputs. At least for the STS dataset, this consideration also guided our choice when setting these two hyperparameters. Such a human-in-the-loop process to tune explanations is also seen in prior works [35, 36, 53].

Perturbations for Local Explanations (FbDiag, FbFull): The input instances x, y are individually perturbed to get the data points (xi, yi) ∈ Nxy (see (1)). To obtain the weights wxi,yi, we first compute weights wx,xi and wy,yi for each generated instance xi and yi respectively. We use the exponential kernel to compute the weight wx,xi = exp(−F(x, xi)/σ²) as a function of some distance F between the generated instance and the corresponding input instance. F could be δBB. The final weight wxi,yi for the generated pair is then given by summing the individually computed weights of each generated data point with its respective input instance, i.e., wxi,yi = wx,xi + wy,yi.
For Iris, the perturbation neighborhood Nxy was generated for each example in the pair by sampling from a Gaussian distribution centered at that example. The statistics for the Gaussian distribution are learned from the training set. For MEPS data, perturbations for the categorical features were generated using the model discussed in the next paragraph, with a bias value of 0.1. For STS, the perturbations were generated following the LIME codebase (https://github.com/marcotcr/lime/blob/master/lime/lime_text.py) by randomly removing words from sentences. The sizes of the perturbation neighborhoods used were 100 for Iris, 200 for MEPS, and 10 for STS. The interpretable representation (x̄, ȳ) is the same as the original features for Iris; for MEPS it involves dummy coding the categorical features, and for STS, we create a vectorized binary representation indicating just the presence or absence of words in the pair of sentences considered. When computing perturbation neighborhood weights, F is the Manhattan distance for Iris and MEPS, whereas it is the cosine distance for STS data. σ² in the exponential kernel was set to 0.5625 times the number of features for all datasets, following the default setting in LIME's code.

Realistic Categorical Feature Perturbation using Conditional Probability Models: For categorical features, we develop a perturbation scheme that can generate more realistic perturbations. For each example, we estimate the conditional probability of a feature j belonging to different categories given all the other feature values. These conditional probabilities can be used to sample categories for feature j to generate perturbations. To ensure closeness to the original category, a small constant (bias) is added to the conditional probability of the original category and
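Since the description above is cut off in this extract, the sketch below fills in the remaining steps only under explicit assumptions: the per-feature conditional probability models are taken as given (e.g. simple classifiers fit on training data), one feature is resampled at a time, and the biased probabilities are renormalized before sampling. None of these details should be read as the authors' exact procedure.

```python
# Sketch of the categorical perturbation step. `cond_probs[j]` is assumed to be
# a function returning P(feature j = category | other feature values), e.g. from
# a per-feature classifier fit on training data. The renormalization after
# adding the bias, and resampling one feature at a time, are assumptions.
import numpy as np

def perturb_categorical(x, cond_probs, categories, bias=0.1, n_samples=200, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    samples = []
    for _ in range(n_samples):
        z = list(x)
        j = int(rng.integers(len(x)))                 # pick one feature to resample
        p = np.asarray(cond_probs[j](x), dtype=float)
        p[categories[j].index(x[j])] += bias          # keep the original category likely
        p /= p.sum()                                  # renormalize (assumption)
        z[j] = categories[j][rng.choice(len(p), p=p)]
        samples.append(z)
    return samples
```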