Construction and Elicitation of a Black Box Model in the Game of Bridge
Construction and Elicitation of a Black Box Model in the Game of Bridge

Véronique Ventos, Daniel Braun, Colin Deheeger, Jean Pierre Desmoulins, Jean Baptiste Fantun, Swann Legras, Alexis Rimbaud, Céline Rouveirol, Henry Soldano and Solène Thépaut

arXiv:2005.01633v2 [cs.AI] 4 Apr 2022

Daniel Braun, Colin Deheeger, Jean Pierre Desmoulins, Jean Baptiste Fantun, Swann Legras, Alexis Rimbaud, Céline Rouveirol, Henry Soldano, Solène Thépaut and Véronique Ventos: NukkAI, Paris, France
Henry Soldano and Céline Rouveirol: Université Sorbonne Paris-Nord, L.I.P.N UMR-CNRS 7030, Villetaneuse, France
Henry Soldano: Muséum National d'Histoire Naturelle, I.SY.E.B UMR-CNRS 7205, Paris, France

Abstract We address the problem of building a decision model for a specific bidding situation in the game of Bridge. We propose the following multi-step methodology: i) build a set of examples for the decision problem and use simulations to associate a decision with each example; ii) use supervised relational learning to build an accurate and interpretable model; iii) perform a joint analysis between domain experts and data scientists to improve the learning language, including the production by experts of a handmade model; iv) use insights from iii) to learn an almost self-explaining and accurate model.

1 Introduction

Our goal is to model expert decision processes in Bridge. To do so, we propose a methodology involving human experts, black box decision programs, and relational supervised machine learning systems. The aim is to obtain a global model for this decision process that is both expressive and has high predictive performance.

Following the success of supervised methods of the deep network family, and growing pressure from society demanding that automated decision processes be made more transparent, a growing number of AI researchers are (re)exploring techniques to interpret, justify, or explain "black box" classifiers (referred to as the Black Box Outcome Explanation Problem [Guidotti et al., 2019]). The task is to build, a posteriori, explicit models in symbolic languages, most often in the form of rules or decision trees, that explain the outcome of the classifier in a format intelligible to an expert. Such explicit models extracted by supervised learning can carry expert knowledge [Murdoch et al., 2019], which has intrinsic value for explainability, pedagogy, and evaluation in terms of ethics or equity (they can help to explain biases linked to the learning system or to the set of training examples). The vast majority of works focus on local interpretability of decision models: simple local explanations (linear models/rules) are built to justify individual predictions of black box classifiers (see for instance the popular LIME [Ribeiro et al., 2016] and ANCHORS [Ribeiro et al., 2018] systems). Such local explanations do not give an overview of the black box model's behaviour; moreover, they rely on a notion of neighborhood of instances to build explanations and may be highly unstable. Our approach is instead to build post-hoc global explanations by approximating the predictions of black box models with interpretable models such as decision lists and rule sets.

We present here a complete methodology for acquiring a global model of a black box classifier as a set of relational rules, both as explicit and as accurate as possible. As we consider a game, i.e. a universe with precise and known rules, we are in the favorable case where it is possible to generate, on demand, data that i) is in agreement with the problem specification, and ii) can be labelled with correct decisions through simulations. The methodology we propose consists of the following elements:

1.
Problem modelling with relational representations, data generation and labelling.
2. Initial investigation of the learning task, and learning with relational learners.
3. Interaction with domain experts, who refine a learned model to produce a simpler alternative model that is more easily understandable for domain users.
4. Subsequent investigation of the learning task, taking into account the concepts used by experts to produce their alternative model, and the proposition of a new, more accurate model.

The general idea is to maximally leverage interactions between experts and engineers, with each group building on the analysis of the other. We approach the learning task using relational supervised learning methods from Inductive Logic Programming (ILP) [Muggleton and Raedt, 1994]. The language of these methods, a restriction of first-order logic, allows learning compact rules that are understandable by experts in the domain. The logical framework allows the use of a domain vocabulary together with domain
knowledge defined in a domain theory, as illustrated in early work on the subject [Legras et al., 2018].

The outline of the article is as follows. After a brief introduction to Bridge in Section 2, we describe in Section 3 the relational formulation of the target learning problem, and the method for generating and labelling examples. We then briefly describe in Section 4 the ILP systems used, and the first set of experiments run on the target problem, along with their results (Section 4.2). In Section 5, bridge experts review a learned model's output and build a powerful alternative model of their own, an analysis of which leads to a refinement of the ILP setup and further model improvements. Future research avenues are outlined in the conclusion.

2 Problem Addressed

Bridge is played by four players in two competing partnerships: North and South against East and West. A classic deck of 52 playing cards is shuffled and then dealt evenly amongst the players (52/4 = 13 cards each). The objective of each side is to maximize a score which depends on:

• The vulnerability of each side. A non-vulnerable side loses a low score when it does not make its contract, but also earns a low score when it does make it. In contrast, a vulnerable side has higher risk and reward.
• The contract reached at the conclusion of the auction (the first phase of the game). The contract is the commitment of a side to win a minimum of l_min ∈ {7, . . . , 13} tricks in the playing phase (the second phase of the game). The contract can either be in a trump suit (♣, ♦, ♥, ♠) or in No Trumps (NT), affecting which suit (if any) gains extra privileges in the playing phase. An opponent may Double a contract, thus imposing a bigger penalty for failing to make the contract (but also a bigger reward for making it). A contract is denoted by pS (or pS X if it is Doubled), where p = l_min − 6 ∈ {1, . . . , 7} is the level of the contract, and S ∈ {♣, ♦, ♥, ♠, NT} the trump suit. The holder of the contract is called the declarer, and the partner of the declarer is called the dummy.
• The number of tricks won by the declaring side during the playing phase. A trick containing a trump card is won by the hand playing the highest trump, whereas a trick not containing a trump card is won by the hand playing the highest card of the suit led.

For more details about the game of bridge, the reader can consult [ACBL, 2019]. Two concepts are essential for the work presented here:

• Auction: This allows each player (the first being called the dealer) the opportunity to disclose coded information about their hand or game plan
to their partner¹. Each player bids in turn, clockwise, using as a language the elements Pass, Double, or a contract higher than the previous bid (where ♣ < ♦ < ♥ < ♠ < NT at each level). The last bid contract, followed by three Passes, is the one that must be played.
• The evaluation of the strength of a hand: Bridge players assign a value to the highest cards: an Ace is worth 4 HCP (High Card Points), a King 3 HCP, a Queen 2 HCP and a Jack 1 HCP. The information given by the players in the auction often relates to their number of HCP and their distribution (the number of cards in one or more suits).

2.1 Problem Statement

After receiving suggestions from bridge experts, we chose to analyse the following situation:

• West, the dealer, bids 4♠, which (roughly) means that they have a minimum of 7 spades and a maximum of 10 HCP in their hand.
• North Doubles, (roughly) meaning that they have a minimum of 13 HCP and, unless they have a very strong hand, a minimum of three cards in each of the other suits (♣, ♦ and ♥).
• East passes, which has no particular meaning.

South must then make a decision: pass and let the opponents play 4♠X, or bid, and have their side play a contract. This is a high-stakes decision that bridge experts are yet to agree on a precise formulation for. Our objective is to develop a methodology for representing this problem, and to find accurate and explainable solutions using relational learners. It should be noted that Derek Patterson was interested in solving this problem using genetic algorithms [Patterson, 2008].

In the remainder of the article, we describe the various processes used in data generation, labelling, and supervised learning, followed by a discussion of the results and the explicit models produced. These processes use relational representations of the objects and models involved, keeping bridge experts in the loop and allowing them to make adjustments where required.
In the next section we consider the relational formulation of the problem, and the data generation and labelling.

¹ The coded information given by a player is decipherable by both their partner and the opponents, so one can only deceive their opponents if they are also willing to deceive their partner. In practice, extreme deception in the auction is rare, but for both strategic and practical reasons, the information shared in the auction is usually far from complete.
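As a brief illustration of the hand-evaluation notions from Section 2, the sketch below computes HCP and suit lengths for a hand. This is not code from the paper; the (rank, suit) encoding is an assumption made for this example.

```python
# Illustrative sketch (not from the paper): computing HCP (A=4, K=3, Q=2, J=1)
# and suit lengths for a hand given as a list of (rank, suit) pairs.
HCP_SCALE = {"A": 4, "K": 3, "Q": 2, "J": 1}

def hcp(hand):
    """High Card Points of a hand given as (rank, suit) pairs."""
    return sum(HCP_SCALE.get(rank, 0) for rank, _ in hand)

def suit_lengths(hand):
    """Number of cards held in each suit (S, H, D, C)."""
    lengths = {s: 0 for s in "SHDC"}
    for _, suit in hand:
        lengths[suit] += 1
    return lengths

# South's hand from Example 1 of Section 3.2: ♠Q94 ♥K10652 ♦95 ♣K103
south = [("Q", "S"), ("9", "S"), ("4", "S"),
         ("K", "H"), ("10", "H"), ("6", "H"), ("5", "H"), ("2", "H"),
         ("9", "D"), ("5", "D"),
         ("K", "C"), ("10", "C"), ("3", "C")]
assert hcp(south) == 8  # K + K + Q = 3 + 3 + 2
assert suit_lengths(south) == {"S": 3, "H": 5, "D": 2, "C": 3}
```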
3 Automatic Data Generation and Modelling Methodology

The methodology to generate and label the data consists of the following steps:
• Problem modelling
• Automatic data generation
• Automatic data labelling
• ILP framing

These steps are the precursors to running relational rule induction (Aleph) and decision tree induction (Tilde) on the problem.

3.1 Problem Modelling

The problem modelling begins by asking experts to define, in the context described above, two rule-based models: one to characterize the hands such that West makes the 4♠ bid, and another to characterize the hands such that North makes the Double bid. These rule-based models are submitted to simulations allowing the experts to interactively validate their models. With the final specifications, we are able to generate examples for the target problem. For this section we introduce the following terms:

• nmpq exact distribution, which indicates that the hand has n cards in ♠, m cards in ♥, p cards in ♦ and q cards in ♣, where n + m + p + q = 13.
• nmpq distribution, which refers to an exact distribution sorted in decreasing order (thus ignoring the suit information). For instance, a 2533 exact distribution is associated with a 5332 distribution.
• |c| is the number of cards held in the suit c ∈ {♠, ♥, ♦, ♣}.

3.1.1 Modelling the 4♠ Bid

Experts have modeled the 4♠ bid by defining a disjunction of 17 rules that relate to the West hand:

4♠ ← ⋁_{i=1}^{17} Ri , where Ri = C0 ∧ Vi ∧ Ci   (1)

in which
• C0 is a condition common to the 17 rules and is reported in Listing ?? in the Appendix.
• Vi is one of the four possible vulnerability configurations (no side vulnerable, both sides vulnerable, exactly one of the two sides vulnerable).
• Ci is a condition specific to the rule Ri.

For instance, the conditions for rules R2 and R5 are:
• V2 = East-West not vulnerable, North-South vulnerable.
• C2 = all of:
  – |♥| = 1,
  – exactly 2 cards among the Ace, King, Queen and Jack of ♠,
  – a 7321 distribution.
• V5 = East-West not vulnerable, North-South vulnerable.
• C5 = all of:
  – |♣| ≥ 4 or |♦| ≥ 4,
  – exactly 2 cards among the Ace, King, Queen and Jack of ♠,
  – a 7mpq distribution with m ≥ 4.

To generate boards that satisfy these rules, we randomly generated complete boards (all 4 hands) and kept the boards where the West hand satisfies one of the 17 rules. The experts were able to iteratively adjust the rules as they analysed boards that either passed through the filter, or failed to pass through the filter (but perhaps should have). After the experts were happy with the samples, 8,200,000 boards were randomly generated to analyse rule adherence, and 10,105 of them contained a West hand satisfying one of the 17 rules. All rules were satisfied at least once. Of the boards where at least one rule was satisfied, R2, for example, was satisfied 16.2% of the time, and R5 was satisfied 15% of the time.

3.1.2 Double Modelling

Likewise, the bridge experts modeled the North Double by defining a disjunction of 3 rules relating to the North hand:

Double ← ⋁_{i=1}^{3} R′i , where R′i = C′0 ∧ C′i   (2)

This time, the conditions do not depend on the vulnerability. The common condition C′0 and the specific conditions C′i are as follows:
• C′0 – for all c, c1, c2 ∈ {♥, ♦, ♣}, |c| ≤ 5 and not (|c1| = 5 and |c2| = 5).
• C′1 – HCP ≥ 13 and |♠| ≤ 1.
• C′2 – HCP ≥ 16 and |♠| = 2 and |♥| ≥ 3 and |♦| ≥ 3 and |♣| ≥ 3.
• C′3 – HCP ≥ 20.

The same expert validation process was carried out as in Section 3.1.1. A generation of 70,000,000 boards resulted in 10,007 boards being satisfied by at least one 4♠ bid rule for the West hand and at least one Double rule for
the North hand. All rules relating to Double were satisfied at least once. Of the boards where at least one rule was satisfied, R′1, for example, was satisfied 69.3% of the time, and R′2 was satisfied 24.8% of the time.

3.2 Data Generation

The first step of the data generation process is to generate a number of South hands in the context described by the 4♠ and Double rules mentioned above. Note, again, that East's Pass is not governed by any rules, which is close to the real situation. We first generated 1,000 boards whose West hands satisfied at least one 4♠ bid rule and whose North hands satisfied at least one Double rule. One such board is displayed in Example 1:

Example 1 A board (North-South vulnerable / East-West not vulnerable) which contains a West hand satisfying rules R0 and R5 and a North hand satisfying rules R′0 and R′1:
• North hand: ♠6 ♥AQ93 ♦KQ1042 ♣A94
• West hand: ♠AKJ8752 ♥7 ♦J863 ♣5
• East hand: ♠103 ♥J84 ♦A7 ♣QJ8762
• South hand: ♠Q94 ♥K10652 ♦95 ♣K103

For reasons that become apparent in Section 3.3, for each of the 1,000 generated boards, we generated an additional 999 boards. We did this by fixing the South hand in each board, and randomly redistributing the cards of the other players until we found a board satisfying Equations 1 and 2. As a result of this process, 1,000 files were created, with each file containing 1,000 boards that have the same South hand, but different West, North and East hands.

Example 2 One of the 999 other boards generated:
• West hand: ♠AK108653 ♥4 ♦832 ♣84
• North hand: ♠J2 ♥AJ73 ♦AK106 ♣AJ9
• East hand: ♠7 ♥Q98 ♦QJ74 ♣Q7652
• South hand: ♠Q94 ♥K10652 ♦95 ♣K103.

Note that the South hand is identical to the one in Example 1, the West hand satisfies rules R0 and R2, and the North hand satisfies rules R′0 and R′2.

3.3 Data Labelling

The goal is to label each of the 1,000 South hands associated with the 1,000 sample files with one of the following labels:
• Pass, when the best decision for South is to pass (and therefore have West play 4♠X).
• Bid, when the best decision for South is to bid (and therefore play a contract on their side).
• ?, when the best decision cannot be determined. We exclude these examples from the ILP experiments.

These labels were assigned by computing the score at 4♠X and other possible contracts by simulation.

3.3.1 Scores Computation

During the playing phase, each player sees their own hand and that of the dummy (which is laid face up on the table). A Double Dummy Solver (DDS) is a computation (and a software) that calculates, in a deterministic way, how many tricks would be won by the declarer under the assumption that everyone can see each other's cards. Though unrealistic, it is nevertheless a good estimator of the real distribution of tricks between the two sides [Pavlicek, 2014]. For any particular board, a DDS can be run for all possible contracts, so one can deduce which contract will likely yield the highest score.

For each example file (each containing 1,000 boards with the same South hand), a DDS software [Haglund et al., 2014] was used to determine the score for the following situations:
• 4♠X played by the East/West side.
• All available contracts played by the North/South side, with the exclusion of 4NT, 5NT, 5♠, 6♠ and 7♠ (experts deemed these contracts infeasible to reach in practice).

3.3.2 Labels Allocation

The labels are assigned to each unique South hand by applying the following tests sequentially.
1. Take the DDS score of the best North/South contract on each of the 1,000 boards (the optimal contract may not always be the same), and average them. If the average score of defending 4♠X is higher than this average, assign the label Pass to the South hand.
2. Take the North/South contract with the highest mean DDS score over all of the boards (this may not be the best contract on each board, but merely the contract with the highest mean score across all boards). If the mean score of this contract is greater than the mean score at 4♠X, assign Bid to the South hand. If the mean score of the contract is at least 30 total points lower than the mean score at 4♠X, assign Pass. Otherwise, assign the label ?.

The initial dataset containing 1,000 sample files is reduced to a set S of 961 examples after elimination of the 39 deals with the ? label. The resulting dataset consists of:
• 338 Bid labels, and
• 623 Pass labels.

3.4 Relational Data Modelling

Now that we have generated and labeled our data, the next step is to create the relational representations to be used in relational learners. The learning task consists in using what is known about a given board in the context described above, and the associated South hand, to predict what the best bid is. The best bid is known for the labeled examples generated in the previous section but, of course, unknown for new boards. We first introduce the notations used in the remainder of the paper.

3.4.1 Logic Programming Notations

Examples are represented by terms and relations between terms. Terms are constants (for instance the integers between 1 and 13), atoms whose identifiers start with a lower-case character (for instance west, east, spade, heart, . . . ), or variables, denoted by an upper-case character (X, Y, Suit, . . . ). Variables may instantiate to any constant in the domain.
Relations between objects are described using predicates applied to terms. A literal is a predicate symbol applied to terms, and a fact is a ground literal, i.e. a literal without variables, whose arguments are ground terms only. For instance, decision([sq, s9, s4, hk, h10, h6, h5, h2, d9, d5, ck, c10, c3], 4, n, west, bid) is a ground literal of predicate symbol decision, with five arguments, namely a hand described as a list of cards and four constant arguments (see Section 3.4.2 for further details).
In order to introduce some flexibility in the problem description, the background knowledge (also known as the domain theory) is described as a set of definite clauses. We can think of a definite clause as a first-order logic rule, containing a head (the conclusion of the rule) and a body (the premises of the rule). For instance, the following clause:

nb(Hand, Suit, Value) ← suit(Suit), count_cards_of_suit(Hand, Suit, Value)

has the head nb(Hand, Suit, Value), where Hand, Suit and Value are variables, and two literals in the body, suit(Suit) and count_cards_of_suit(Hand, Suit, Value). The declarative interpretation of the rule is that given that the variable Suit is a valid bridge suit, and given that Value is the number of cards of the suit Suit in the hand Hand, one can derive, in accordance with the logical consequence relationship |= [Lloyd, 1987], that nb(Hand, Suit, Value) is true. This background knowledge will be used to derive additional information concerning the examples, such as hand and board properties. Let us now describe the decision predicate, the target predicate of the relational learning task.

3.4.2 Target Predicate

Given a hand, the position of the hand, the vulnerability, and the dealer's position, the goal is to predict the class label. The target predicate for the bidding phase is decision(Hand, Position, Vul, Dealer, Class), where the arguments are variables:
• Hand: the 13 cards of the player who must decide to bid or pass.
• Position: the relative position of the player to the dealer. In this task, Position is always 4, the South hand.
• Vul: the vulnerability configuration (b = both sides, o = neither side, n = North-South or e = East-West).
• Dealer: the cardinal position of the dealer among north, east, south or west. In this task, Dealer is always west.
• Class: the label to predict (bid or pass).
In this representation, a labeled example is a ground positive literal built on the decision predicate symbol. Example 1 given in Section 3.2 is therefore represented as: decision([sq, s9, s4, hk, h10, h6, h5, h2, d9, d5, ck, c10, c3], 4, n, west, bid). Such a ground literal contains all the explicit information about the example. However, in order to build an effective relational learning model of the target predicate as a set of rules, we need to introduce abstractions of this information, in the form of instances of additional predicates in the domain theory. These predicates are described below.
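To make the card-list encoding above concrete, here is a small sketch (not part of the paper's domain theory) of Python analogues of such abstraction predicates, assuming each card token is a suit letter (s, h, d, c) followed by a rank:

```python
# Illustrative Python analogues of domain-theory predicates over the card-list
# encoding used above (tokens like "sq" or "h10"); an assumption-laden sketch,
# not the paper's Prolog domain theory.
def nb(hand, suit):
    """Analogue of nb/3: number of cards of the given suit in the hand."""
    return sum(1 for card in hand if card[0] == suit)

def distribution(hand):
    """Analogue of distribution/2: ordered list of the four suit lengths."""
    return sorted(nb(hand, s) for s in "shdc")

def suit_representation(hand, suit):
    """Analogue of suit_representation/4: (honours above 10, suit length)."""
    cards = [card[1:] for card in hand if card[0] == suit]
    honors = [r for r in cards if r in ("a", "k", "q", "j")]
    return honors, len(cards)

# The hand from the ground literal above (Example 1's South hand):
south = ["sq", "s9", "s4", "hk", "h10", "h6", "h5", "h2",
         "d9", "d5", "ck", "c10", "c3"]
assert nb(south, "h") == 5                          # nb(Hand, heart, 5)
assert distribution(south) == [2, 3, 3, 5]          # distribution(Hand, [2,3,3,5])
assert suit_representation(south, "s") == (["q"], 3)
```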
3.4.3 Target Concept Predicates

In order to learn a general and explainable model for the target concept, we need to carefully define the target concept representation language by choosing the predicate symbols that can appear in the model, more specifically in the body of the definite clauses for the target concept definition. We thus define new predicates in the domain theory, both extensional (defined by a set of ground facts) and intensional (defined by a set of rules or definite clauses). The updated domain theory allows us to complete the example description, by gathering/deriving true facts related to the example. It also implicitly defines a target concept language that forms the search space for the target model, which the Inductive Logic Programming algorithms will explore using different strategies. Part of the domain theory for this problem was previously used in another relational learning task for a bridge bidding problem [Legras et al., 2018], illustrating the reusability of domain theories. The predicates in the target concept language are divided into two subsets:

• Predicates that were introduced in [Legras et al., 2018]. These include both general predicates, such as gteq(A, B) (A ≥ B) and lteq(A, B) (A ≤ B), and predicates specific to bridge, such as nb(Hand, Suit, Number), which tests that the suit Suit of a hand Hand has length Number (e.g. nb(Hand, heart, 3) means that the hand Hand has 3 cards in ♥).
• Higher-level predicates not used in [Legras et al., 2018]. Among which:
  – hcp(Hand, Number): Number is the number of High Card Points (HCP) in Hand.
  – suit_representation(Hand, Suit, Honors, Number) is an abstract representation of a suit. Honors is the list of cards strictly higher than 10 in the suit Suit of hand Hand. Number is the total number of cards in Suit.
  – distribution(Hand, [N, M, P, Q]) states that the nmpq distribution (see Section 3.1) of hand Hand is the ordered list [N, M, P, Q].

For instance, in Example 1 from Section 3.2, we may add to the representation the following literals:
• hcp([sq, s9, s4, hk, h10, h6, h5, h2, d9, d5, ck, c10, c3], 8).
• suit_representation([sq, s9, s4, hk, h10, h6, h5, h2, d9, d5, ck, c10, c3], spade, [q], 3).
• distribution([sq, s9, s4, hk, h10, h6, h5, h2, d9, d5, ck, c10, c3], [2, 3, 3, 5]).

Note again that a fact (i.e. a ground literal) can be independent of the rest of the domain theory, such as has_suit(hk, heart), which states that hk is a ♥ card, or it can be inferred from existing facts and clauses of the domain theory, such as nb([sq, s9, s4, hk, h10, h6, h5, h2, d9, d5, ck, c10, c3], heart, 5), which derives from:
• suit(heart): heart is a valid bridge suit.
• has_suit(hk, heart): hk is a heart card.
• nb(Handlist, Suit, Value) ← suit(Suit), count_cards_of_suit(Handlist, Suit, Value), where count_cards_of_suit has a recursive definition within the domain theory.

4 Learning Expert Rules

4.1 Inductive Logic Programming Systems

The ILP systems used in our experiments were Aleph and Tilde, two mature state-of-the-art ILP systems of the symbolic SRL paradigm². Recent work has shown that such ILP systems, Tilde in particular, are competitive with recent distributional SRL approaches, which first learn a knowledge graph embedding (KGE) to encode the relational datasets into score matrices that can then be fed to non-relational learners. Across a range of relational datasets, [Dumancic et al., 2019] shows that there is no clear winner on classification tasks when comparing distributional and symbolic learning methods, and that both exhibit strengths and limitations. Tilde, on the other hand, outperforms KGE-based approaches on Knowledge Base Completion tasks (namely, concerning the ability to infer the status (true/false) of missing facts in relational datasets). As our work aims to build explainable models in a deterministic context, we only consider ILP systems here.

The most prominent difference between Aleph and Tilde is their output: a set of independent relational rules for Aleph, and a relational decision tree for Tilde. One important declarative parameter of both systems – and of symbolic SRL systems in general – is the so-called language bias, which specifies which predicates defined in the background knowledge may be used, and in what form these predicates appear in the learned model (the rule set for Aleph and the decision tree for Tilde).

4.1.1 Aleph

Given a set of positive examples E+, a set of negative examples E−, each as ground literals of the target concept predicate, and a domain theory T, Aleph [Srinivasan, 1999] builds a hypothesis H as a logic program, i.e.
a set of definite clauses, such that ∀e+ ∈ E+: H ∧ T |= e+ and ∀e− ∈ E−: H ∧ T ⊭ e−. In other words, given the domain theory T, and after learning hypothesis H, it is possible from each positive example description e+ to derive that e+ is indeed a positive example, while such a derivation is impossible for any

² SRL stands for Statistical Relational Learning, a recent development of ILP [Raedt et al., 2016] that integrates learning and reasoning under uncertainty about individuals and the effects of actions.
of the negative examples e−. The domain theory T is represented by a set of facts, i.e. ground positive literals from other predicates, and clauses, possibly inferred from a subset of primary facts and a set of unground clauses (see Section 3.4). H is a set of definite clauses where the head predicate of each rule is the target predicate. An example of such a hypothesis H is given in Listing 5.

4.1.2 Tilde

Tilde³ [Blockeel and Raedt, 1998] relies on the learning from interpretations setting. Tilde learns a relational binary decision tree in which the nodes are conjunctions of literals that can share variables, with the following restriction: a variable introduced in a node cannot appear in the right branch below that node (i.e. the failure branch of the test associated with the node). Each example, described through a set of ground literals, is propagated through the current tree until it reaches a leaf. In the final tree, each leaf is associated with the decision bid/pass, or a probability distribution over these decisions. An example of a Tilde decision tree output is given in Listing 4.

4.2 Learning Setup

In our problem, the target predicate is decision, as defined and exemplified in Section 3.4.2. Regarding Aleph, in positive examples the last argument of the target predicate has the value bid, while in negative examples it has the value pass. Aleph's standard strategy induce learns the logic program as a set of clauses using a covering strategy: at each iteration it selects a seed positive example e not entailed by the current solution, finds a clause Re which covers e, and then removes from the learning set the positive examples covered by Re. The learning procedure is repeated until all positive examples are covered. The induce strategy is sensitive to the order of the learning examples.
We also used Aleph with induce_max, which is more expensive but insensitive to the order of positive examples: it constructs a best clause, in terms of coverage, for each positive example, and then selects a subset of the clauses. induce_max proved beneficial in some of our experiments.

³ Available as part of the ACE ILP system at https://dtai.cs.kuleuven.be/ACE/
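The covering loop behind induce can be sketched as follows. This is a schematic illustration, not Aleph's actual implementation: `search_clause` stands in for Aleph's bottom-clause construction and clause search, which we abstract away, and clauses are treated as opaque objects with a `covers` test.

```python
# Schematic sketch of the sequential covering strategy behind Aleph's induce
# (an illustration, not Aleph's code). We assume search_clause always returns
# a clause that covers its seed, which guarantees termination.
def induce(positives, negatives, search_clause):
    hypothesis = []
    uncovered = list(positives)
    while uncovered:
        seed = uncovered[0]                      # pick an uncovered positive seed
        clause = search_clause(seed, negatives)  # best clause covering the seed
        hypothesis.append(clause)
        # prune every positive example the new clause covers
        uncovered = [e for e in uncovered if not clause.covers(e)]
    return hypothesis

# Toy usage: examples are integers, and a "clause" covers all numbers of the
# seed's parity (a stand-in for a real relational clause).
class ToyClause:
    def __init__(self, parity):
        self.parity = parity
    def covers(self, e):
        return e % 2 == self.parity

def toy_search(seed, negatives):
    return ToyClause(seed % 2)

hypothesis = induce([1, 3, 2, 4], [], toy_search)
assert len(hypothesis) == 2                      # one clause per parity
assert all(any(c.covers(e) for c in hypothesis) for e in [1, 3, 2, 4])
```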
4.2.1 Experimental Setup

The performance of the learning system with respect to the size of the training set is averaged over 50 executions i. The sets Test_i and Train_i are built as follows:
• Test_i = 140 examples randomly selected from S using seed i.
• Each Test_i is stratified, i.e. it has the same ratio of examples labeled bid as S (35.2%).
• Train_i = S \ Test_i (821 examples).

For each execution i, we randomly generate from Train_i a sequence of subsets with increasing numbers of examples n_k = 10 + 100 × k, with k between 0 and 8. We denote each of these subsets T_{i,k}, so that |T_{i,k}| = n_k and T_{i,k} ⊂ T_{i,k+1}. Each model trained on T_{i,k} is evaluated on Test_i.

4.2.2 Evaluation Criteria

As our methodology can be interpreted as a post-hoc, model-agnostic black-box explanation framework, we evaluate it in terms of fidelity, unambiguity and interpretability [Lakkaraju et al., 2019].

• A high-fidelity explanation should faithfully mimic the behavior of the black box model. To measure fidelity we use the accuracy of the surrogate relational model with respect to the labels assigned by the black box model.
• Unambiguity evaluates whether one or multiple (potentially conflicting) explanations are provided by the relational surrogate model for a given instance. While we do not explicitly report on unambiguity in our experiments, we note that decision trees are by nature unambiguous models. Rule sets, on the other hand, may indeed display some overlap. However, as our Aleph experiments only consider one target label (see Listing 5), even if two rules apply to some instance, they conclude on the same label, and the model again displays no ambiguity. That does not mean that the notion is useless in our context, rather that we have to find more sophisticated ways to evaluate in which sense such models include some level of ambiguity.
• Interpretability quantifies how easy it is to understand and reason about the explanation.
This dimension of explanation is subjective and, as such, may be prone to expert/user bias. We therefore choose a crude but objective metric: the model complexity, i.e. the number of rules/nodes of the models. We will also see that in our experiments we obtain similar decision tree accuracies with L2, a language containing only predicates appearing in a human-made model, as with the much larger language L1. Clearly, the L2 models have better interpretability.
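The split-generation protocol of Section 4.2.1 can be sketched as follows. This is an illustration, not the authors' code: we assume the dataset is a list of (example, label) pairs, approximate the seeded stratified draw with `random.Random(i)`, and round the bid quota to the nearest integer.

```python
import random

# Sketch of the protocol of Section 4.2.1 (an illustration, not the authors'
# code): for execution i, draw a stratified 140-example test set with seed i,
# then build nested training subsets of sizes n_k = 10 + 100*k, k = 0..8.
def make_split(dataset, i, test_size=140):
    """dataset: list of (example, label) pairs with labels 'bid'/'pass'."""
    rng = random.Random(i)
    bids = [d for d in dataset if d[1] == "bid"]
    passes = [d for d in dataset if d[1] == "pass"]
    # stratify: keep the global bid ratio inside the test set
    n_bid = round(test_size * len(bids) / len(dataset))
    test = rng.sample(bids, n_bid) + rng.sample(passes, test_size - n_bid)
    train = [d for d in dataset if d not in test]
    rng.shuffle(train)
    # nested subsets T_{i,0} ⊂ T_{i,1} ⊂ ... of sizes 10, 110, ..., 810
    subsets = [train[:10 + 100 * k] for k in range(9)]
    return test, subsets

# Toy dataset mirroring the 338 bid / 623 pass class balance of S:
data = [(j, "bid") for j in range(338)] + [(1000 + j, "pass") for j in range(623)]
test, subsets = make_split(data, i=1)
assert len(test) == 140
assert len(subsets) == 9 and len(subsets[-1]) == 810
assert set(subsets[0]) <= set(subsets[1])   # subsets are nested
```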
4.3 First experiments

The first experiments involve a baseline non-relational data representation, together with the L0 language mentioned earlier as well as a smaller language L1. Table 1 and Figures 1 and 2 display the results of our experiments, as well as results obtained using a different language L2, introduced later in Section 5.2. We first learned Random Forest models, using the sklearn Python package [Pedregosa et al., 2011] with default parameters, including forests made of 100 trees. The data representation is as follows: each hand is described by 54 attributes: vulnerability, target class, highest card, second highest card, . . . , 13th highest card in each suit (ordered as ♣ < ♦ < ♥ < ♠). If a player has n cards in a given suit, the corresponding (n+1)th, . . . , 13th attributes for this suit are set to 0. The relational models are learned with Tilde and Aleph, where Aleph is trained with both the induce and induce_max settings. Figure 1-left presents the average accuracy, over all i = 1 . . . 50 executions, of the models trained on the sets T_{i,k}, therefore requiring 450 models for each curve. Figure 2 displays the corresponding model complexity, expressed as the average number of nodes for Tilde and Random Forest (footnote 4). Table 1 displays the average accuracies and complexities of each experiment for learning set sizes 310 (k = 3) and 810 (k = 8). An experiment represents 450 Tilde or Aleph executions, and the resulting CPU cost of these executions is also recorded. As the experiments were performed on similar machines with a different number of cores (either 8 or 16), we report the CPU cost as the number of cores of the machine times the total CPU time, i.e. in core-hour units (footnote 5). Aleph failed to cope with the large L0 language, and could not return a solution in reasonable time (footnote 6).
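The flat 54-attribute representation used for the Random Forest models can be sketched as follows. This is a minimal illustration: the rank encoding (2–10 at face value, J=11, Q=12, K=13, A=14) and the dictionary layout of a hand are our own assumptions, chosen only to make the padding scheme concrete.

```python
# Suits in the order used by the paper's representation.
SUITS = ["club", "diamond", "heart", "spade"]

def encode_hand(hand, vulnerability, label):
    """Flatten a hand into the 54-attribute vector: vulnerability, target
    class, then for each suit the ranks held (highest first), padded with
    0 up to 13 slots."""
    row = [vulnerability, label]
    for suit in SUITS:
        ranks = sorted(hand.get(suit, []), reverse=True)
        row.extend(ranks + [0] * (13 - len(ranks)))
    return row

# A 13-card hand: A52 of clubs, KQJ4 of diamonds, T98 of hearts, A32 of spades.
hand = {"club": [14, 5, 2], "diamond": [13, 12, 11, 4],
        "heart": [10, 9, 8], "spade": [14, 3, 2]}
row = encode_hand(hand, vulnerability=1, label=1)
```

With three clubs, the first three club slots hold the ranks and the remaining ten are set to 0, exactly as described above.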
As can be seen in Figure 1 and Table 1, the L0-Tilde models had rather low accuracy and a large number of nodes (which grew linearly with the number of training examples used). An analysis of the Tilde output trees made it apparent that Tilde was selecting many nodes with very specific conditions, in particular with the suit representation and distribution predicates. While this improved performance in training, the specificity hindered performance in testing. To counter this, we removed the suit representation and distribution predicates from L0, creating the more concise language L1. As expected, the average number of nodes in L1-Tilde dropped dramatically. L1-Tilde had the best accuracy of all models, which continued to increase between 610 and 810 examples. Its complexity was relatively stable,

4 For Random Forest, the reported number of nodes is an average over the 100 trees of a model.
5 For instance, 16 core-hours represents 1 hour of CPU time on a 16-core machine or 2 hours of CPU time on an 8-core machine.
6 Aleph was able to learn models after we removed the nb predicate from L0, albeit rather slowly (≈ 384 core-hours).
Fig. 1 Average model accuracies of L0 and L1 models (left) and L2 models (right).

Fig. 2 Average model complexities of rule-based (left) and tree-based (right) models.

Table 1 Summary of average accuracy and complexity for all models when trained on 310 and 810 examples, together with the total CPU cost of the 450 Tilde or Aleph executions, in core-hours. The reported complexity of Random Forest concerns only one tree among the 100 trees in the forest.
increasing only slightly between 510 and 810 examples, to a total of 21 tree nodes. With the smaller L1 language, L1-Aleph-induce was able to find solutions in reasonable time. Its accuracy was worse than that of L1-Tilde (85.6% compared to 89.8% when training on 810 examples). After 310 examples, the L1-Aleph-induce model size remained stable at 16 rules (its accuracy also showed stability over the same training set sizes). We also used Aleph's induce_max strategy with L1. Compared to induce, this search strategy has a higher time cost and can produce rules with large overlaps in coverage of learning examples, but it can generalize better because it saturates every example in the training set, not just a subset based on the coverage of a single seed example. This search strategy was of no benefit in this case: L1-Aleph-induce_max showed slightly worse accuracy (84.4% compared to 85.6%) and much higher complexity (47 rules compared to 15) than L1-Aleph-induce. The CPU cost of learning the Random Forest models is negligible (less than 10 seconds) compared to that of the relational models. Their accuracies lie between those of the L1-Tilde and L1-Aleph models, while the number of nodes per tree is much higher (see Figure 2-right). This indicates that there is no gain in using ensemble learning on non-relational decision trees, although the data representation is much simpler and needs only limited Bridge expertise to define. The Random Forest models are, as expected, difficult to interpret, apart from variable importance: it appears that the first three highest cards in each suit and vulnerability are the most important variables in this task.
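The Random Forest baseline and the variable-importance reading can be sketched as follows. This is a minimal sketch on synthetic stand-in data (the real examples come from the simulation pipeline); the settings shown, sklearn defaults with 100 trees, match the text, while the data itself is ours.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Toy stand-in for the flat data: 810 rows, 53 features
# (vulnerability + 52 card-slot attributes; the target class is y).
X = rng.integers(0, 15, size=(810, 53)).astype(float)
y = rng.integers(0, 2, size=810)

# Default sklearn settings, with a forest of 100 trees as in the paper.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Reported complexity: average node count over the 100 trees of the model.
avg_nodes = np.mean([est.tree_.node_count for est in forest.estimators_])

# The only interpretable output used here: ranking attributes by importance.
ranking = np.argsort(forest.feature_importances_)[::-1]
```

On the real data, inspecting the head of `ranking` is what surfaces the three highest cards per suit and vulnerability as the dominant variables.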
It would be possible to use post-hoc formal methods to explain those Random Forest models a posteriori [Izza and Marques-Silva, 2021]; however, the whole process would be expensive and, following [Rudin et al., 2021], "Explaining black boxes, rather than replacing them with interpretable models, can make the problem worse by providing misleading or false characterizations." Following these experiments, we had bridge experts do a thorough analysis of one of the successful L1-Tilde outputs. Their goal was to create a sub-model which was much easier to interpret than the large decision trees (footnote 7), and had an accuracy comparable to that of L1-Tilde. We also used the experts' insights to further revise the domain theory, in the hope of learning a simpler model with Aleph and Tilde. We describe this process in the next section.

7 The experts noted that, in general, the Tilde trees were more difficult to read than the rule-based output produced by Aleph.
5 Expert Rule Analysis and new Experiments

5.1 Expert Rule Analysis

To find an L1-Tilde output tree for the experts to analyse, we chose a model with good accuracy and low complexity (a large tree would be too hard to parse for a bridge expert). A subtree of the chosen L1-Tilde output is displayed in Listing 1 (the full tree is displayed in Listing 4 in the Appendix). Despite the low complexity of this tree (only 15 nodes), it remains somewhat difficult to interpret.

Listing 1 Part of the L1-tree learned by Tilde

    decision(-A,-B,-C,-D,-E)
    nb(A,-F,-G), gteq(G,6) ?
    +--yes: hcp(A,4) ?
    |   +--yes: [pass]
    |   +--no:  [bid]
    +--no:  hcp(A,-H), gteq(H,16) ?
        +--yes: [bid]
        +--no:  nb(A,spade,-I), gteq(I,1) ?
            +--yes: lteq(I,1) ?
            |   +--yes: hcp(A,-J), lteq(J,5) ?
            |   |   +--yes: [pass]
            |   |   +--no:  nb(A,-K,5) ?
            |   |       +--yes: hcp(A,6) ?
            |   |       |   +--yes: [pass]
            |   |       |   +--no:  [bid]
            |   |       +--no:  [pass]

Simply translating the tree to a decision list does not solve the issue of interpretability. Thus, with the help of bridge experts, we proceeded to a manual extraction of rules from the given tree, following these post-processing steps:

1. Natural language translation from relational representations.
2. Removal of nodes with weak support (e.g. a node which tests for precise values of HCP instead of a range).
3. Selection of rules concluding on a bid leaf.
4. Reformulation of intervals (e.g. combining overlapping ranges of HCP or number of cards).
5. Highlighting the specific number of cards in ♠, and specific n-m-p-q distributions.

The set of five rules resulting from this post-processing, shown in Listing 3 in the next section, is much more readable than the L1-Tilde trees. Despite its low complexity, this set of rules has an accuracy of 90.6% on the whole of S.
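Step 3 of the post-processing, keeping only the root-to-leaf paths that conclude on a bid leaf, can be sketched as follows. The tiny tree below is our own toy mirror of the top of Listing 1 (conditions kept as opaque strings); it is not the full extraction pipeline, only the path-collection step.

```python
# Each internal node: (condition, yes_subtree, no_subtree); each leaf: "bid"/"pass".
# Toy tree echoing the first two tests of the Tilde tree in Listing 1.
tree = ("nb(A,F,G), gteq(G,6)",
        ("hcp(A,4)", "pass", "bid"),
        ("hcp(A,H), gteq(H,16)", "bid", "pass"))

def bid_rules(node, path=()):
    """Collect the conjunctions of conditions along paths ending in [bid]."""
    if isinstance(node, str):                      # leaf
        return [list(path)] if node == "bid" else []
    cond, yes, no = node
    return (bid_rules(yes, path + (cond,))
            + bid_rules(no, path + ("not(" + cond + ")",)))

rules = bid_rules(tree)
# rules == [['nb(A,F,G), gteq(G,6)', 'not(hcp(A,4))'],
#           ['not(nb(A,F,G), gteq(G,6))', 'hcp(A,H), gteq(H,16)']]
```

Steps 1, 2, 4 and 5 (translation, weak-support pruning, interval merging, highlighting ♠ counts) were then performed by the experts on such path conjunctions.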
5.2 New experiments

Following this expert review, we created the new L2 language, which only uses the predicates that appear in the post-processed rules above. This meant the reintroduction of the distribution predicate, the addition of the nbs predicate (which is similar to nb but denotes only the number of ♠ in the hand), and the omission of several other predicates. We display in Table 2 in the Appendix the predicates involved in the languages L0, L1 and L2. Results are displayed together with those regarding L0 and L1 in Figure 1, Figure 2 and Table 1.

Using L2 with Aleph-induce results in a much simpler model than L1-Aleph-induce (6 rules compared to 16 when trained on 810 examples), at the cost of somewhat lower accuracy (83.5% compared to 85.6%). The complexity of L2-Tilde was also smaller than that of its counterpart L1-Tilde (12 nodes compared to 15), also at the expense of accuracy (87.3% compared to 89.8%). In both cases, the abstracted and simplified L2 language produced models which were far less complex (and thus more easily interpretable), but showed a slight degradation in accuracy when compared with L1.

When using L2, the more expensive induce_max search strategy proved useful, with L2-Aleph-induce_max showing an accuracy of 88.2%, second only to L1-Tilde (89.8%). This came at the cost of complexity, with L2-Aleph-induce_max models having an average of 20 rules compared to 6 for L2-Aleph-induce.

We selected an L2-Aleph-induce_max model, displayed in Listing 5 in the Appendix. The rules of this model are relatively easy to apprehend, and the model achieves an accuracy of 89.7% on the whole example set S. We provide in Listing 2 the experts' translation M of this model and compare it to the rewriting H, displayed in Listing 3, of the L1-Tilde tree model which led to the L2 language definition.
The translation is much simpler than the previous one and required only steps 1, 4 and 5 of the original translation process. An in-depth expert explanation of how the translated models differ is provided in the Appendix. It is noticeable that the two translated models are very similar: each rule in M matches, with only slight differences, the rule in H with the same number. The only apparent exception is the overly specific rule M2b. A closer look at rule H2 reveals that rule M2b completes rule M2 by considering the case in which the player has a 6-card suit, leading to a closer match to human rule H2.
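The expert rule set H extracted from the L1-Tilde tree (Listing 3) is simple enough to state directly as code. The sketch below is our own encoding, not the authors' implementation: a hand is a per-suit card-count dictionary, `hcp` its high card points, and the constraint that ♠ holds the marked position in each shape (e.g. the doubleton in 5-5-2♠-1) is enforced implicitly by the spade-count test.

```python
def expert_bid(lengths, hcp):
    """Return True if the expert rules H1-H5 conclude 'bid'.
    lengths: dict suit -> number of cards held; hcp: high card points."""
    spades = lengths["spade"]
    shape = sorted(lengths.values(), reverse=True)
    longest_non_spade = max(n for s, n in lengths.items() if s != "spade")
    if spades == 0:                                                              # H1
        return True
    if longest_non_spade >= 6 or hcp >= 16:                                      # H2
        return True
    if spades == 1 and shape in ([5, 4, 3, 1], [5, 5, 2, 1]) and 6 <= hcp <= 15: # H3
        return True
    if spades == 2 and shape == [5, 5, 2, 1] and 7 <= hcp <= 15:                 # H4
        return True
    if spades == 2 and shape == [5, 4, 2, 2] and 11 <= hcp <= 15:                # H5
        return True
    return False
```

Running such a function over the whole example set S is how the 90.6% accuracy of the hand-made model is measured.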
Listing 2 Translation of the L2-Aleph-induce_max model

    We bid:
    M1  - With 0♠.
    M2  - With 7+ cards in a non-spade suit OR 14+ HCP.
    M2b - With 6+ cards in a non-spade suit AND
          ((1♠ AND 5-13 HCP) OR (2♠ AND 7-13 HCP) OR (3♠ AND 10-13 HCP)).
    M3  - With 1♠ AND (5-4-3-1♠ OR 5-5-2-1♠) AND 8-13 HCP.
    M4  - With 2♠ AND 5-5-2♠-1.
    M5  - With 2♠ AND (5-4-2-2♠ OR 5-3-3-2♠) AND 13 HCP.

Listing 3 Rules extracted by the experts from the L1-Tilde tree

    We bid:
    H1 - With 0♠.
    H2 - With 6+ cards in a non-spade suit OR 16+ HCP.
    H3 - With 1♠ AND (5-4-3-1♠ OR 5-5-2-1♠) AND 6-15 HCP.
    H4 - With 2♠ AND 5-5-2♠-1 AND 7-15 HCP.
    H5 - With 2♠ AND 5-4-2-2♠ AND 11-15 HCP.

6 Conclusion

We conducted an end-to-end experiment to acquire expert rules for a bridge bidding problem. We did not propose new technical tools; our contribution lies in investigating a complete methodology to handle a learning task from beginning to end, i.e. from the specifications of the learning problem to a learned model, obtained after feedback from experts who not only criticise and evaluate the models, but also propose hand-made (and computable) alternative models. The methodology includes generating and labelling examples by using both domain experts (to drive the modelling and relational representation of the examples) and a powerful simulation-based black box for labelling. The result is a set of high-quality examples, which is a prerequisite for learning a compact and accurate relational model. The final relational model is built after an additional phase of interaction with experts, who investigate a learned model and propose an alternative model, the analysis of which leads to further refinements of the learning language. We argue that interactions between experts and engineers are made easy when the learned models are relational, allowing experts to revisit and contradict them.
Indeed, a detailed comparison between the models, as translated by the experts, shows that the whole process has essentially reached a fixed point. Clearly, learning tasks in a well-defined and simple universe with precise and deterministic rules, such as games, are good candidates for the interactive methodology we proposed. The extent to which experts and learned models
cooperating in this way can be transferred to less well-defined problems, including, for instance, noisy environments, has yet to be investigated. Note that complementary techniques can also be applied, such as active learning, where machines or humans dynamically provide new examples to the system, leading to revised models. Overall, our contribution is a step towards a complete handling of learning tasks for decision problems, giving domain experts (and non-domain experts alike) the tools to both make accurate decisions and understand and scrutinize the logic behind those decisions.

References

ACBL (2019). American contract bridge league. https://www.acbl.org/learn_page/how-to-play-bridge/.
Blockeel, H. and Raedt, L. D. (1998). Top-down induction of first-order logical decision trees. Artif. Intell., 101(1-2):285–297.
Dumancic, S., Garcia-Duran, A., and Niepert, M. (2019). A comparative study of distributional and symbolic paradigms for relational learning. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pages 6088–6094. International Joint Conferences on Artificial Intelligence Organization.
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., and Pedreschi, D. (2019). A survey of methods for explaining black box models. ACM Comput. Surv., 51(5):93:1–93:42.
Haglund, B., Hein, S., et al. (2014). DDS solver. Available at https://github.com/dds-bridge/dds.
Izza, Y. and Marques-Silva, J. (2021). On explaining random forests with SAT. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI 2021, Virtual Event / Montreal, Canada, 19-27 August 2021, pages 2584–2591.
Lakkaraju, H., Kamar, E., Caruana, R., and Leskovec, J. (2019). Faithful and customizable explanations of black box models. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, AIES 2019, Honolulu, HI, USA, January 27-28, 2019, pages 131–138. ACM.
Legras, S., Rouveirol, C., and Ventos, V. (2018). The game of bridge: a challenge for ILP. In Inductive Logic Programming, 28th International Conference, ILP 2018, Proceedings, pages 72–87. Springer.
Lloyd, J. W. (1987). Foundations of Logic Programming (2nd extended ed.). Springer-Verlag New York, Inc., New York, NY, USA.
Muggleton, S. and Raedt, L. D. (1994). Inductive logic programming: Theory and methods. J. Log. Program., 19/20:629–679.
Murdoch, W. J., Singh, C., Kumbier, K., Abbasi-Asl, R., and Yu, B. (2019). Interpretable machine learning: definitions, methods, and applications. arXiv e-prints, arXiv:1901.04592.
Patterson, D. (2008). Computer learning in Bridge. MSci Individual Project, Queen Mary College, University of London.
Pavlicek, R. (2014). Actual play vs. double-dummy. Available at http://www.rpbridge.net/8j45.htm.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
Raedt, L. D., Kersting, K., Natarajan, S., and Poole, D. (2016). Statistical Relational Artificial Intelligence: Logic, Probability, and Computation. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why Should I Trust You?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2018). Anchors: High-precision model-agnostic explanations. In Proceedings of AAAI-18, pages 1527–1535.
Rudin, C., Chen, C., Chen, Z., Huang, H., Semenova, L., and Zhong, C. (2021). Interpretable machine learning: Fundamental principles and 10 grand challenges. CoRR, abs/2103.11251.
Srinivasan, A. (1999). The Aleph manual. Available at http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/.
Appendix 1

7 Tree-based and rule-based models

Listing 4 A complete L1-tree learned by Tilde

    decision(-A,-B,-C,-D,-E)
    nb(A,-F,-G), gteq(G,6) ?
    +--yes: hcp(A,4) ?
    |   +--yes: [pass] 2.0
    |   +--no:  [bid] 57.0
    +--no:  hcp(A,-H), gteq(H,16) ?
        +--yes: [bid] 6.0
        +--no:  nb(A,spade,-I), gteq(I,1) ?
            +--yes: lteq(I,1) ?
            |   +--yes: hcp(A,-J), lteq(J,5) ?
            |   |   +--yes: [pass] 11.0
            |   |   +--no:  nb(A,-K,5) ?
            |   |       +--yes: hcp(A,6) ?
            |   |       |   +--yes: [pass] 3.0
            |   |       |   +--no:  [bid] 27.0
            |   |       +--no:  [pass] 3.0
            |   +--no:  hcp(A,15) ?
            |       +--yes: [bid] 3.0
            |       +--no:  nb(A,-L,3) ?
            |           +--yes: [pass] 146.0
            |           +--no:  hcp(A,11) ?
            |               +--yes: [bid] 3.0
            |               +--no:  hcp(A,-M), lteq(M,6) ?
            |                   +--yes: [pass] 15.0
            |                   +--no:  lteq(I,2) ?
            |                       +--yes: nb(A,-N,1) ?
            |                       |   +--yes: [bid] 3.0
            |                       |   +--no:  hcp(A,-O), gteq(O,11) ?
            |                       |       +--yes: [bid] 5.0
            |                       |       +--no:  [pass] 16.0
            |                       +--no:  [pass] 5.0
            +--no:  [bid] 5.0
Listing 5 An L2-Aleph-induce_max model

    decision(A,4,B,C,bid) :- nbs(A,0).
    decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,5), nb(A,E,F), gteq(F,6), nbs(A,1).
    decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,9), nbs(A,1).
    decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,8), nb(A,E,F), gteq(F,5), nbs(A,1).
    decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,4), distribution(A,[6,4,2,1]).
    decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,8), nb(A,E,F), gteq(F,6), nbs(A,2).
    decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,13), nb(A,E,F), gteq(F,5), nbs(A,2).
    decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,10), nb(A,E,F), gteq(F,6).
    decision(A,4,B,C,bid) :- nb(A,D,E), gteq(E,7).
    decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,14), nb(A,E,F), gteq(F,5).
    decision(A,4,B,C,bid) :- nb(A,D,E), lteq(E,1), nbs(A,2).
    decision(A,4,B,C,bid) :- hcp(A,D), nb(A,E,D), nb(A,F,G), gteq(G,6).
    decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,12), nb(A,E,F), lteq(F,1).
    decision(A,4,B,C,bid) :- hcp(A,D), lteq(D,7), gteq(D,7), nb(A,E,F), gteq(F,6).
    decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,14), nbs(A,2).

Appendix 2

8 Expert comparison of models M and H

From an expert point of view, the model H from the L1-Tilde model (see Listing 3) and the model M from the L2-Aleph-induce_max rules (see Listing 2), both obtained after a human post-processing, are very similar. They differ only on transition zones of certain characteristics, which are of low support. A detailed analysis follows.
Table 2 Predicates in the domain theories.

    Predicate                       L0   L1   L2
    suit_rep(H, Suit, Hon, Num)     x
    distribution(H, [N, M, P, Q])   x         x
    nb(H, Suit, Num)                x    x    x
    hcp(H, Num)                     x    x    x
    semibalanced(H)                 x    x
    unbalanced(H)                   x    x
    balanced(H)                     x    x
    vuln(Pos, Vul, Dlr, NS, EW)     x    x
    lteq(Num, Num)                  x    x    x
    gteq(Num, Num)                  x    x    x
    longest_suit(H, Suit)           x    x
    shortest_suit(H, Suit)          x    x
    major(Suit)                     x    x
    minor(Suit)                     x    x
    nbs(H, Num)                               x

8.1 Transition zones within HCP

The models differ on the range 14-15 HCP (M2 vs H2). Note that with very unbalanced hand distributions, some parameters have a greater influence on the decision than a deviation of 1 or 2 HCP: we can mention for example the number of HCP outside the ♠ suit, or the number of HCP in the suits of 4 cards or more. An increase in the associated values would be an indication that, for the same number of HCP, the position of honors in the player's hand is more favorable to the bid decision.

8.2 Transition zones within distribution

Rule M2b deals with hands holding exactly a 6-card suit, while those with a 7-card suit trigger the same bid decision. Within the L1 and L2 models, hands with 5-card suits also trigger similar rules. L1 rule H2 always generates a bid, while the L2 rules of model M lead to a distinction according to the number of cards in ♠ and the number of HCP: the more cards you have in ♠, the more HCP you need to make the bid decision. If we look more closely, Tilde before post-processing had an embryo of a similar rule, since it predicted a pass with a 6-card suit (or more) and exactly 4 HCP. The support being very small (2 examples), it had been ignored.
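The "weak support" argument above is easy to make concrete: the support of a rule is simply the number of examples its antecedent covers, and narrowly bounded conditions (an exact HCP value) cover far fewer examples than range conditions. The sketch below uses a toy example grid of our own; the two rules are stand-ins for an exact-HCP test versus a ranged one.

```python
def support(rule, examples):
    """Number of examples covered by a rule's antecedent (its support)."""
    return sum(1 for ex in examples if rule(ex))

# Toy grid: every combination of 0-19 HCP with a 5-, 6- or 7-card longest suit.
examples = [{"hcp": h, "longest": l} for h in range(20) for l in (5, 6, 7)]

# An exact-HCP condition (like the hcp(A,4) node) versus a ranged one.
narrow = lambda ex: ex["longest"] >= 6 and ex["hcp"] == 4
broad = lambda ex: ex["longest"] >= 6 and 5 <= ex["hcp"] <= 13

print(support(narrow, examples), support(broad, examples))
```

On this grid the exact test covers only 2 examples while the ranged one covers 18, which is the kind of gap that led the experts to discard the 4-HCP node during post-processing.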
8.3 Summary

The existence of these transition zones is beyond doubt for experts, but their precise identification is likely to advance knowledge in the field. New experiments conducted on these transition zones, with a more specific vocabulary devised with experts in the loop, could lead to a more accurate elicitation of the black box model.