Construction and Elicitation of a Black Box Model in the Game of Bridge


Véronique Ventos, Daniel Braun, Colin Deheeger, Jean Pierre Desmoulins,
Jean Baptiste Fantun, Swann Legras, Alexis Rimbaud, Céline Rouveirol,
Henry Soldano and Solène Thépaut
arXiv:2005.01633v2 [cs.AI] 4 Apr 2022

Abstract We address the problem of building a decision model for a specific
bidding situation in the game of Bridge. We propose the following multi-step
methodology: i) build a set of examples for the decision problem and use
simulations to associate a decision to each example; ii) use supervised
relational learning to build an accurate and interpretable model; iii) perform
a joint analysis between domain experts and data scientists to improve the
learning language, including the production by experts of a handmade model;
iv) use insights from iii) to learn an almost self-explaining and accurate
model.

1 Introduction

Our goal is to model expert decision processes in Bridge. To do so, we propose
a methodology involving human experts, black box decision programs, and
relational supervised machine learning systems. The aim is to obtain a global
model for this decision process that is both expressive and highly predictive.
Daniel Braun, Colin Deheeger, Jean Pierre Desmoulins, Jean Baptiste Fantun, Swann
Legras, Alexis Rimbaud, Céline Rouveirol, Henry Soldano, Solène Thépaut and
Véronique Ventos
NukkAI, Paris, France

Henry Soldano and Céline Rouveirol
Université Sorbonne Paris-Nord, L.I.P.N UMR-CNRS 7030, Villetaneuse, France

Henry Soldano
Muséum National d’Histoire Naturelle, I.SY.E.B UMR-CNRS 7205, Paris, France

   Following the success of supervised methods of the deep network family,
and growing pressure from society to make automated decision processes more
transparent, a growing number of AI researchers are (re)exploring techniques
to interpret, justify, or explain “black box” classifiers (referred to as the
Black Box Outcome Explanation Problem [Guidotti et al., 2019]). It is a
question of building, a posteriori, explicit
models in symbolic languages, most often in the form of rules or decision
trees that explain the outcome of the classifier in a format intelligible to
an expert. Such explicit models extracted by supervised learning can carry
expert knowledge [Murdoch et al., 2019], which has intrinsic value for ex-
plainability, pedagogy, and evaluation in terms of ethics or equity (they can
help to explain biases linked to the learning system or to the set of train-
ing examples). The vast majority of works focus on local interpretability of
decision models: simple local explanations (linear models/rules) are built
to justify individual predictions of black box classifiers (see for instance
the popular LIME [Ribeiro et al., 2016] and ANCHORS [Ribeiro et al., 2018]
systems). Such local explanations do not give an overview of the black box
model's behaviour; moreover, they rely on a notion of neighborhood between
instances to build explanations and may be highly unstable. Our approach is
to build post-hoc global explanations by approximating the predictions of
black box models with interpretable models such as decision lists and rule
sets.
   We present here a complete methodology for acquiring a global model
of a black box classifier as a set of relational rules, both as explicit and as
accurate as possible. As we consider a game, i.e. a universe with precise and
known rules, we are in the favorable case where it is possible to generate, on
demand, data that i) is in agreement with the problem specification, and ii)
can be labelled with correct decisions through simulations. The methodol-
ogy we propose consists of the following elements:
1. Problem modelling with relational representations, data generation and
   labelling.
2. Initial investigation of the learning task and learning with relational
   learners.
3. Interaction with domain experts who refine a learned model to produce
   a simpler alternative model, which is more easily understandable for do-
   main users.
4. Subsequent investigation of the learning task, taking into account the
   concepts used by experts to produce their alternative model, and the
   proposition of a new, more accurate model.
The general idea is to maximally leverage interactions between experts and
engineers, with each group building on the analysis of the other.
   We approach the learning task using relational supervised learning meth-
ods from Inductive Logic Programming (ILP) [Muggleton and Raedt, 1994].
The language of these methods, a restriction of first-order logic, allows
learning compact rules, understandable by experts in the domain. The logi-
cal framework allows the use of a domain vocabulary together with domain

knowledge defined in a domain theory, as illustrated in early work on the
subject [Legras et al., 2018].
   The outline of the article is as follows. After a brief introduction to bridge
in Section 2, we describe in Section 3 the relational formulation of the target
learning problem, and the method for generating and labelling examples.
We then briefly describe in Section 4 the ILP systems used, and the first
set of experiments run on the target problem, along with their results (Sec-
tion 4.2). In Section 5, bridge experts review a learned model’s output and
build a powerful alternative model of their own, an analysis of which leads
to a refinement of the ILP setup and further model improvements. Future
research avenues are outlined in the conclusion.

2 Problem Addressed

Bridge is played by four players in two competing partnerships, namely,
North and South against East and West. A classic deck of 52 playing cards
is shuffled and then dealt evenly amongst the players (52/4 = 13 cards each).
The objective of each side is to maximize a score which depends on:
• The vulnerability of each side. A non-vulnerable side loses little when it
  fails to make its contract, but also earns little when it does make it. In
  contrast, a vulnerable side faces both higher risk and higher reward.
• The contract reached at the conclusion of the auction (the first phase
  of the game). The contract is the commitment of a side to win a mini-
  mum of lmin ∈ {7, . . . 13} tricks in the playing phase (the second phase of
  the game). The contract can either be in a trump suit (♣, ♦, ♥, ♠) or No
  Trump (NT), affecting which suit (if any) gains extra privileges in the
  playing phase. An opponent may Double a contract, thus imposing a big-
  ger penalty for failing to make the contract (but also a bigger reward for
  making it). A contract is denoted by pS (or pSX if it is Doubled), where
  p = lmin − 6 ∈ {1, . . . , 7} is the level of the contract, and S ∈ {♣, ♦, ♥, ♠, NT}
  is the trump suit. The holder of the contract is called the declarer, and the
  partner of the declarer is called the dummy.
• The number of tricks won by the declaring side during the playing
  phase. A trick containing a trump card is won by the hand playing the
  highest trump, whereas a trick not containing a trump card is won by the
  hand playing the highest card of the suit led.
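The trick-winning rule above can be sketched in a few lines. This is an illustrative sketch, not code from the paper; the (suit, rank) card encoding is our own choice.

```python
# Illustrative sketch (not from the paper): the trick-winning rule above,
# with cards encoded as (suit, rank) pairs and higher rank = stronger card.
def trick_winner(cards, trump=None):
    """cards: the 4 (suit, rank) pairs in play order; returns the index
    (0 = the hand that led) of the hand winning the trick."""
    led_suit = cards[0][0]
    def power(card):
        suit, rank = card
        if trump is not None and suit == trump:
            return (2, rank)   # any trump beats any non-trump card
        if suit == led_suit:
            return (1, rank)   # else the highest card of the suit led wins
        return (0, rank)       # a discard never wins
    return max(range(4), key=lambda i: power(cards[i]))

# Hearts are trump: the 2 of hearts beats the ace of spades.
print(trick_winner([("S", 14), ("S", 10), ("H", 2), ("S", 5)], trump="H"))  # -> 2
```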

For more details about the game of bridge, the reader can consult [ACBL, 2019].
Two concepts are essential for the work presented here:
• Auction: This allows each player (the first being called the dealer) the
  opportunity to disclose coded information about their hand or game plan

  to their partner.1 Each player bids in turn, clockwise, using as a language
  the elements: Pass, Double, or a contract higher than the previous bid
  (where ♣ < ♦ < ♥ < ♠ < NT at each level). The last bid contract, followed
  by three Passes, is the one that must be played.
• The evaluation of the strength of a hand: Bridge players assign a value
  to the highest cards: an Ace is worth 4 HCP (High Card Points), a King
  3 HCP, a Queen 2 HCP and a Jack 1 HCP. Information given by the players
  in the auction often relates to their number of HCP and to their distribution
  (the number of cards held in one or more suits).
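The HCP count above is straightforward to compute. Below is a minimal sketch (our own encoding, not the authors' code), with cards written as a suit letter followed by a rank, e.g. "SA" for the ace of spades.

```python
# Minimal sketch (our own card encoding): standard High Card Point count.
HCP_VALUES = {"A": 4, "K": 3, "Q": 2, "J": 1}

def hcp(hand):
    """hand: iterable of cards such as 'SA' (ace of spades), 'C2' (2 of clubs)."""
    return sum(HCP_VALUES.get(card[1:], 0) for card in hand)

print(hcp(["SA", "HK", "HQ", "DJ", "C2"]))  # 4 + 3 + 2 + 1 + 0 -> 10
```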

2.1 Problem Statement

After receiving suggestions from bridge experts, we chose to analyse the
following situation:
• West, the dealer, bids 4♠, which (roughly) means that they have a mini-
  mum of 7 spades, and a maximum of 10 HCP in their hand.
• North Doubles, (roughly) meaning that they have a minimum of 13 HCP
  and, unless they have a very strong hand, a minimum of three cards in
  each of the other suits (♣, ♦ and ♥).
• East passes, which has no particular meaning.
South must then make a decision: pass and let the opponents play 4♠X, or
bid, and have their side play a contract. This is a high-stakes decision for
which bridge experts have yet to agree on a precise formulation. Our objective
is to develop a methodology for representing this problem, and to find accurate
and explainable solutions using relational learners. It should be noted that
Derek Patterson addressed this problem using genetic algorithms
[Patterson, 2008].
   In the remainder of the article, we describe the various processes used in
data generation, labelling, supervised learning, followed by a discussion of
the results and the explicit models produced. These processes use relational
representations of the objects and models involved, keeping bridge experts
in the loop and allowing them to make adjustments where required.
   In the next section we consider the relational formulation of the problem,
and the data generation and labelling.

1 The coded information given by a player is decipherable by both their partner and the
opponents, so one can only deceive the opponents by also deceiving one's partner. In
practice, extreme deception in the auction is rare, but for both strategic and practical
reasons, the information shared in the auction is usually far from complete.

3 Automatic Data Generation and Modelling Methodology

The methodology to generate and label the data consists of the following
steps:
•   Problem modelling
•   Automatic data generation
•   Automatic data labelling
•   ILP framing
These steps are the precursors to running relational rule induction (Aleph)
and decision tree induction (Tilde) on the problem.

3.1 Problem Modelling

The problem modelling begins by asking experts to define, in the context
described above, two rule based models: one to characterize the hands such
that West makes the 4♠ bid, and another to characterize the hands such
that North makes the Double bid. These rule based models are submitted
to simulations allowing the experts to interactively validate their models.
With the final specifications, we are able to generate examples for the target
problem. For this section we introduce the terms:
• nmpq exact distribution, which indicates that the hand has n cards in ♠, m
  cards in ♥, p cards in ♦ and q cards in ♣, where n + m + p + q = 13.
• nmpq distribution refers to an exact distribution sorted in decreasing or-
  der (thus ignoring the suit information). For instance, a 2533 exact dis-
  tribution is associated to a 5332 distribution.
• |c| is the number of cards held in the suit c ∈ {♠, ♥, ♦, ♣}.
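The two notions of distribution can be sketched as follows (a hedged illustration; the string card encoding is our own, not the paper's).

```python
from collections import Counter

# Hedged sketch (card encoding is ours): the exact distribution lists suit
# lengths in the fixed order ♠, ♥, ♦, ♣; the nmpq distribution sorts them
# in decreasing order, discarding which suit is which.
def exact_distribution(hand):
    counts = Counter(card[0] for card in hand)   # first letter = suit
    return [counts[s] for s in "SHDC"]

def distribution(hand):
    return sorted(exact_distribution(hand), reverse=True)

hand = ["S2", "S3", "H2", "H3", "H4", "H5", "H6",
        "D2", "D3", "D4", "C2", "C3", "C4"]
print(exact_distribution(hand))  # -> [2, 5, 3, 3], a 2533 exact distribution
print(distribution(hand))        # -> [5, 3, 3, 2], i.e. a 5332 distribution
```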

3.1.1 Modelling the 4♠ bid

Experts have modeled the 4♠ bid by defining a disjunction of 17 rules that
relate to the West hand:

          4♠ ← ⋁_{i=1}^{17} R_i ,  where R_i = C_0 ∧ V_i ∧ C_i          (1)

in which
• C0 is a condition common to the 17 rules and is reported in Listing ?? in
  the Appendix.
• Vi is one of the four possible vulnerability configurations (no side vulner-
  able, both sides vulnerable, or exactly one of the two sides vulnerable,
  which accounts for two configurations).

• Ci is a condition specific to the Ri rule.
For instance, the conditions for rules R2 and R5 are:
• V2 = East-West not vulnerable, North-South vulnerable.
• C2 = all of:
    – |♥| = 1,
    – exactly 2 cards among the Ace, King, Queen and Jack of ♠,
    – a 7321 distribution.
• V5 = East-West not vulnerable, North-South vulnerable.
• C5 = all of:
    – |♣| ≥ 4 or |♦| ≥ 4,
    – exactly 2 cards among the Ace, King, Queen and Jack of ♠,
    – a 7mpq distribution with m ≥ 4.
To generate boards that satisfy these rules, we randomly generated com-
plete boards (all 4 hands), and kept the boards where the West hand satisfies
one of the 17 rules. The experts were able to iteratively adjust the rules as
they analysed boards that either passed through the filter, or failed to pass
through the filter (but perhaps should have). After the experts were happy
with the samples, 8,200,000 boards were randomly generated to analyse
rule adherence, and 10,105 of them contained a West hand satisfying one
of the 17 rules. All rules were satisfied at least once. Among the boards
where at least one rule was satisfied, R2, for example, was satisfied 16.2% of
the time, and R5 15% of the time.
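The generate-and-filter loop above amounts to rejection sampling. The sketch below (our own code, not the authors') encodes only the specific condition C2; the common condition C0 and the vulnerability V2 are deliberately left out, and the string card encoding is our own.

```python
import random
from collections import Counter

RANKS = "23456789TJQKA"
DECK = [s + r for s in "SHDC" for r in RANKS]   # 52 cards, e.g. "SA", "H7"

def shape(hand):
    counts = Counter(c[0] for c in hand)
    return sorted((counts[s] for s in "SHDC"), reverse=True)

def satisfies_C2(hand):
    # Specific condition C2 only: singleton heart, exactly 2 of AKQJ in
    # spades, and a 7321 shape (C0 and the vulnerability V2 are omitted).
    counts = Counter(c[0] for c in hand)
    spade_honors = sum(1 for c in hand if c[0] == "S" and c[1] in "AKQJ")
    return counts["H"] == 1 and spade_honors == 2 and shape(hand) == [7, 3, 2, 1]

def sample_hand(rule, rng, tries=500000):
    # Rejection sampling: deal random 13-card hands, keep the first match.
    for _ in range(tries):
        hand = rng.sample(DECK, 13)
        if rule(hand):
            return hand
    return None
```

In the study itself, accepted and rejected boards were shown back to the experts, who then iteratively adjusted the rules; the filter above is only the inner loop of that process.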

3.1.2 Double Modelling

Likewise, the bridge experts also modeled the North Double by defining a
disjunction of 3 rules relating to the North hand:

          Double ← ⋁_{i=1}^{3} R′_i ,  where R′_i = C′_0 ∧ C′_i          (2)

This time, the conditions do not depend on the vulnerability. The common
condition C′0 and the specific conditions C′i are as follows:
• C′0 – for all c, c1, c2 ∈ {♥, ♦, ♣} with c1 ≠ c2: |c| ≤ 5 and not (|c1| = 5 and
  |c2| = 5).
• C′1 – HCP ≥ 13 and |♠| ≤ 1.
• C′2 – HCP ≥ 16 and |♠| = 2 and |♥| ≥ 3 and |♦| ≥ 3 and |♣| ≥ 3.
• C′3 – HCP ≥ 20.
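The Double model above is small enough to transcribe directly. The sketch below is our own illustrative code (hedged: the hcp helper and card encoding are ours, not the paper's).

```python
from collections import Counter

# Hedged sketch of the experts' Double model (C'0 plus the disjunction of
# C'1-C'3); hcp() and the card encoding are our own illustrative choices.
HCP_VALUES = {"A": 4, "K": 3, "Q": 2, "J": 1}

def hcp(hand):
    return sum(HCP_VALUES.get(c[1], 0) for c in hand)

def satisfies_double(hand):
    n = Counter(c[0] for c in hand)              # suit lengths
    side = [n["H"], n["D"], n["C"]]              # the three non-spade suits
    c0 = max(side) <= 5 and side.count(5) <= 1   # C'0: side suits <= 5, at most one of length 5
    p = hcp(hand)
    c1 = p >= 13 and n["S"] <= 1
    c2 = p >= 16 and n["S"] == 2 and all(x >= 3 for x in side)
    c3 = p >= 20
    return c0 and (c1 or c2 or c3)

# A 15-HCP hand with a singleton spade satisfies C'0 and C'1:
north = ["S6", "HA", "HQ", "H9", "H3", "DK", "DQ", "DT", "D4", "D2",
         "CA", "C9", "C4"]
print(satisfies_double(north))  # -> True
```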
The same expert validation process was carried out as in Section 3.1.1. A
generation of 70,000,000 boards resulted in 10,007 boards whose West hand
satisfied at least one 4♠ bid rule and whose North hand satisfied at least one
Double rule. All rules relating to Double were satisfied at least once. Among
the boards where at least one rule was satisfied, R′1, for example, was
satisfied 69.3% of the time, and R′2 24.8% of the time.

3.2 Data Generation

The first step of the data generation process is to generate a number of South
hands in the context described by the 4♠ and Double rules mentioned above.
Note, again, that East’s Pass is not governed by any rules, which is close to
the real situation.
   We first generated 1,000 boards whose West hands satisfied at least one
4♠ bid rule and whose North hands satisfied at least one Double rule. One
such board is displayed in Example 1:

Example 1 A board (North-South vulnerable / East-West not vulnerable) which
contains a West hand satisfying rules R0 and R5 and a North hand satisfying
rules R′0 and R′1:

                          ♠ 6
                          ♥ A Q 9 3
                          ♦ K Q 10 4 2
                          ♣ A 9 4

      ♠ A K J 8 7 5 2          N           ♠ 10 3
      ♥ 7                   W     E        ♥ J 8 4
      ♦ J 8 6 3                S           ♦ A 7
      ♣ 5                                  ♣ Q J 8 7 6 2

                          ♠ Q 9 4
                          ♥ K 10 6 5 2
                          ♦ 9 5
                          ♣ K 10 3

   For reasons that become apparent in Section 3.3, for each of the 1000
generated boards, we generated an additional 999 boards. We did this by
fixing the South hand in each board, and randomly redistributing the cards
of the other players until we found a board satisfying Equations 1 and 2. As
a result of this process, 1,000 files were created, with each file containing
1,000 boards that have the same South hand, but different West, North and
East hands.
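The redeal step can be sketched as follows. This is our own formulation, not the paper's code: west_ok and north_ok stand in for the 4♠ and Double rule checks of Section 3.1, and the card encoding is ours.

```python
import random

# Sketch of the redeal step (our own formulation): fix the South hand and
# reshuffle the remaining 39 cards until the West hand satisfies the 4♠ model
# and the North hand satisfies the Double model.
def redeal_with_fixed_south(south, deck, west_ok, north_ok, rng, max_tries=100000):
    rest = [c for c in deck if c not in south]   # the other 39 cards
    for _ in range(max_tries):
        rng.shuffle(rest)
        west, north, east = rest[:13], rest[13:26], rest[26:39]
        if west_ok(west) and north_ok(north):
            return {"S": south, "W": west, "N": north, "E": east}
    return None   # no accepted deal within the budget
```

Running this 999 times per accepted board yields the files of 1,000 boards sharing one South hand.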

Example 2 One of the 999 other boards generated:
• West hand: ♠ AK108653 ♥ 4 ♦ 832 ♣ 84
• North hand: ♠ J2 ♥ AJ73 ♦ AK106 ♣ AJ9
• East hand: ♠ 7 ♥ Q98 ♦ QJ74 ♣ Q7652
• South hand: ♠ Q94 ♥ K10652 ♦ 95 ♣ K103.
Note that the South hand is identical to the one in Example 1, the West hand
satisfies rules R0 and R2, and the North hand satisfies rules R′0 and R′2.

3.3 Data Labelling

The goal is to label each of the 1,000 South hands associated to the 1,000
sample files with one of the following labels:
• Pass when the best decision for South is to pass (and therefore have West
  play 4♠X).
• Bid when the best decision for South is to bid (and therefore play a con-
  tract on their side).
• ? when the best decision cannot be determined. We exclude these examples
  from the ILP experiments.
   These labels were assigned by computing, through simulations, the scores
at 4♠X and at the other possible contracts.

3.3.1 Scores Computation

During the playing phase, each player sees their own hand and that of
the dummy (which is laid face up on the table). A Double Dummy Solver
(DDS) is a computation (and a software tool) that calculates, in a determin-
istic way, how many tricks would be won by the declarer under the assump-
tion that everyone can see everyone else's cards. Though unrealistic, it is
nevertheless a good estimator of the real distribution of tricks between the
two sides [Pavlicek, 2014]. For any particular board, a DDS can be run for
all possible contracts, so one can deduce which contract will likely yield the
highest score.
   For each example file (each of which contains 1,000 boards with the same
South hand), a DDS software [Haglund et al., 2014] was used to determine
the score for the following situations:
• 4♠X played by the East/West side.
• All available contracts played by the North/South side, with the exclu-
  sion of 4NT, 5NT, 5♠, 6♠ and 7♠ (experts deemed these contracts un-
  reachable in practice).

3.3.2 Labels Allocation

The labels are assigned to each unique South hand by applying the following
tests sequentially.

1. Take the DDS score from the best North/South contract on each of the
   1,000 boards (the optimal contract may not always be the same), and
   average them. If the average score of defending 4♠X is higher than this
   score, assign the label of Pass to the South hand.
2. Take the North/South contract with the highest mean DDS score over all
   of the boards (this may not be the best contract on each board, but merely
   the contract with the highest mean score across all boards). If the mean
   score of this contract is greater than the mean score at 4♠X , assign Bid to
   the South hand. If the mean score of the contract is at least 30 total points
   lower than the mean score at 4♠X , assign Pass. Otherwise, assign the label
   ?.
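The two sequential tests can be sketched as follows (our own code, not the paper's; the score containers and their orientation toward South's side are our own modelling choices).

```python
# Sketch (our own code) of the label-allocation tests above. ns_scores maps
# each North/South contract name to its per-board DDS scores; dbl_scores
# holds the per-board scores of defending 4♠X, all from South's viewpoint.
def label_south_hand(ns_scores, dbl_scores, margin=30):
    mean = lambda xs: sum(xs) / len(xs)
    mean_dbl = mean(dbl_scores)
    boards = range(len(dbl_scores))
    # Test 1: average of the best N/S contract taken board by board.
    best_per_board = [max(ns_scores[c][b] for c in ns_scores) for b in boards]
    if mean_dbl > mean(best_per_board):
        return "pass"
    # Test 2: the single N/S contract with the highest mean score.
    best_mean = max(mean(scores) for scores in ns_scores.values())
    if best_mean > mean_dbl:
        return "bid"
    if best_mean <= mean_dbl - margin:
        return "pass"
    return "?"
```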
  The initial dataset containing 1,000 sample files is reduced to a set S of
961 examples after elimination of the 39 deals with the ? label. The resulting
dataset consists of:
• 338 Bid labels, and
• 623 Pass labels.

3.4 Relational data modelling

Now that we have generated and labeled our data, the next step is to create
the relational representations to be used in relational learners. The learning
task consists in using what is known about a given board in the context
described above, and the associated South hand, and predicting what the
best bid is. The best bid is known for the labeled examples generated in the
previous section, but, of course, unknown for new boards.
   We first introduce the notations used in the remainder of the paper.

3.4.1 Logic Programming Notations

Examples are represented by terms and relations between terms. Terms are
constants (for instance the integers between 1 and 13), or atoms whose iden-
tifiers start with a lower-case character (for instance west, east, spade, heart,
. . . ) or variables, denoted by an upper-case character (X, Y , Suit, . . . ). Vari-
ables may instantiate to any constant in the domain. Relations between ob-
jects are described using predicates applied to terms. A literal is a predicate
symbol applied to terms, and a fact is a ground literal, i.e. a literal with-
out variables, whose arguments are ground terms only. For instance, deci-
sion([sq, s9, s4, hk, h10, h6, h5, h2, d9, d5, ck, c10, c3], 4, n, west, bid) is a
ground literal of the predicate symbol decision, with five arguments, namely,
a hand described as a list of cards, and four constant arguments (see Section
3.4.2 for further details).

   In order to introduce some flexibility in the problem description, the
background knowledge (also known as the domain theory) is described as a set
of definite clauses. We can think of a definite clause as a first order logic rule,
containing a head (the conclusion of the rule) and a body (the premises of
the rule). For instance, the following clause:
nb(Hand, Suit, Value) ← suit(Suit), count_cards_of_suit(Hand, Suit, Value)
has the head nb(Hand, Suit, Value), where Hand, Suit and Value are vari-
ables, and two literals in the body, suit(Suit) and count_cards_of_suit(Hand,
Suit, Value). The declarative interpretation of the rule is that given that
variable Suit is a valid bridge suit, and given that Value is the number of
cards of the suit Suit in the hand Hand, one can derive, in accordance with
the logical consequence relationship |= [Lloyd, 1987], that nb(Hand, Suit,
Value) is true. This background knowledge will be used to derive additional
information concerning the examples, such as hand and board properties.
   Let us now describe the decision predicate, the target predicate of the
relational learning task.

3.4.2 Target Predicate

Given a hand, the position of the hand, vulnerability, and the dealer’s posi-
tion, the goal is to predict the class label. The target predicate for the bid-
ding phase is decision(Hand, Position, Vul, Dealer, Class) where the arguments
are variables.
• Hand: the 13 cards of the player who must decide to bid or pass.
• Position: the relative position of the player to the dealer. In this task, Po-
  sition is always 4, the South hand.
• Vul: the vulnerability configuration (b = both sides, o = neither side, n =
  North-South or e = East-West).
• Dealer: the cardinal position of the dealer among north, east, south or west.
  In this task, Dealer is always west.
• Class: label to predict (bid or pass).
In this representation, a labeled example is a ground positive literal built
on the decision predicate symbol. Example 1, given in Section 3.2, is there-
fore represented as:
decision([sq, s9, s4, hk, h10, h6, h5, h2, d9, d5, ck, c10, c3], 4, n, west, bid). Such a
ground literal contains all the explicit information about the example. How-
ever, in order to build an effective relational learning model of the target
predicate as a set of rules, we need to introduce abstractions of this informa-
tion, in the form of instances of additional predicates in the domain theory.
These predicates are described below.

3.4.3 Target concept predicates

In order to learn a general and explainable model for the target concept,
we need to carefully define the target concept representation language by
choosing predicate symbols that can appear in the model, more specifically,
in the body of definite clauses for the target concept definition. We thus
define new predicates in the domain theory, both extensional (defined by
a set of ground facts) and intensional (defined by a set of rules or definite
clauses). The updated domain theory makes it possible to complete the example
description by gathering/deriving true facts related to the example. It also
implicitly defines a target concept language that forms the search space for
the target model which the Inductive Logic Programming algorithms will
explore using different strategies. Part of the domain theory for this prob-
lem was previously used in another relational learning task for a bridge
bidding problem [Legras et al., 2018], illustrating the reusability of domain
theories. The predicates in the target concept language are divided into two
subsets:
• Predicates that were introduced in [Legras et al., 2018]. These include
  both general predicates, such as gteq(A, B) (A ≥ B) and lteq(A, B) (A ≤ B),
  and predicates specific to bridge, such as nb(Hand, Suit, Number), which
  tests that the suit Suit of a hand Hand has length Number (e.g. nb(Hand,
  heart, 3) means that the hand Hand has 3 cards in ♥).
• Higher level predicates not used in [Legras et al., 2018]. Among which:
   – hcp(Hand, Number) gives the number of High Card Points (HCP) in Hand.
   – suit_representation(Hand, Suit, Honors, Number) is an abstract repre-
     sentation of a suit: Honors is the list of cards strictly higher than 10 in
     the suit Suit of hand Hand, and Number is the total number of cards
     in the suit Suit.
   – distribution(Hand, [N, M, P, Q]) states that the nmpq distribution (see
     Section 3.1) of hand Hand is the ordered list [N, M, P, Q].
   For instance, in Example 1 from Section 3.2, we may add to the represen-
tation the following literals:
• hcp([sq, s9, s4, hk, h10, h6, h5, h2, d9, d5, ck, c10, c3], 8).
• suit_representation([sq, s9, s4, hk, h10, h6, h5, h2, d9, d5, ck, c10, c3], spade, [q], 3).
• distribution([sq, s9, s4, hk, h10, h6, h5, h2, d9, d5, ck, c10, c3], [2, 3, 3, 5]).
   Note again that a fact (i.e. ground literal) can be independent of the rest
of the domain theory, such as has_suit(hk, heart), which states that hk is a ♥
card, or it can be inferred from existing facts and clauses of the domain the-
ory, such as nb([sq, s9, s4, hk, h10, h6, h5, h2, d9, d5, ck, c10, c3], heart, 5), which
derives from:
• suit(heart): heart is a valid bridge suit.
• has_suit(hk, heart): hk is a heart card.

• nb(Handlist, Suit, Value) ← suit(Suit), count_cards_of_suit(Handlist, Suit,
  Value), where count_cards_of_suit has a recursive definition within the
  domain theory.

4 Learning Expert Rules

4.1 Inductive Logic Programming Systems

The ILP systems used in our experiments were Aleph and Tilde, two ma-
ture, state-of-the-art ILP systems of the symbolic SRL paradigm2. Recent
work showed that such ILP systems, Tilde in particular, are competitive
with recent distributional SRL approaches, which first learn a knowledge
graph embedding to encode the relational datasets into score matrices that
can then be fed to non-relational learners. On a range of relational datasets,
[Dumancic et al., 2019] shows that there is no clear winner on classification
tasks when comparing distributional and symbolic learning methods, and
that both exhibit strengths and limitations. Tilde, on the other hand, out-
performs KGE-based approaches on Knowledge Base Completion tasks
(namely, concerning the ability to infer the status (true/false) of missing
facts in relational datasets).
   As our work aims to build explainable models in a deterministic context
we only consider here ILP systems. The most prominent difference between
Aleph and Tilde is their output: a set of independent relational rules for
Aleph and a relational decision tree for Tilde. One important declarative
parameter of both systems – and in symbolic SRL systems in general – is the
so-called language bias, which specifies which predicates defined in the back-
ground knowledge may be used and in what form these predicates may appear
in the learned model (the rule set for Aleph and the decision tree for Tilde).

4.1.1 Aleph

Given a set of positive examples E+, a set of negative examples E−, each
given as ground literals of the target concept predicate, and a domain theory
T, Aleph [Srinivasan, 1999] builds a hypothesis H as a logic program, i.e. a set
of definite clauses, such that ∀e+ ∈ E+ : H ∧ T |= e+ and ∀e− ∈ E− : H ∧ T ⊭ e−.
In other words, given the domain theory T , and after learning hypothesis
H, it is possible from each positive example description e+ to derive that e+
is indeed a positive example, while such a derivation is impossible for any

2 SRL stands for Statistical Relational Learning, a recent development of ILP
[Raedt et al., 2016] that integrates learning and reasoning under uncertainty about in-
dividuals and the effects of actions.

of the negative examples e− . The domain theory T is represented by a set of
facts, i.e. ground positive literals from other predicates and clauses, possibly
inferred from a subset of primary facts and a set of unground clauses (see
Section 3.4). H is a set of definite clauses where the head predicate of each
rule is the target predicate. An example of such a hypothesis H is given in
Listing 5.

4.1.2 Tilde

Tilde3 [Blockeel and Raedt, 1998] relies on the learning from interpretations
setting. Tilde learns a relational binary decision tree in which the nodes are
conjunctions of literals that can share variables, with the following restric-
tion: a variable introduced in a node cannot appear in the right branch be-
low that node (i.e. the failure branch of the test associated with the node). In
the decision tree, each example, described through a set of ground literals, is
propagated through the current tree until it reaches a leaf. In the final tree,
each leaf is associated to the decision bid/pass, or a probability distribution
on these decisions. An example of a Tilde decision tree output is given in
Listing 4.

4.2 Learning setup

In our problem, the target predicate is decision, as defined in Section 3.4.2
and exemplified in Section 3.4.2. Regarding Aleph, in positive examples the
last argument of the target predicate has value bid, while in negative examples
it has value pass. The Aleph standard strategy induce learns the logic
program as a set of clauses using a greedy covering strategy: at each iteration
it selects a seed positive example e not entailed by the current solution, finds a
clause Re which covers e, and then removes from the learning set the positive
examples covered by Re . The procedure is repeated until all pos-
itive examples are covered. The induce strategy is sensitive to the order of
the learning examples. We also used Aleph with induce max, which is more
expensive but insensitive to the order of positive examples: it constructs a
best clause, in terms of coverage, for each positive example, then selects a
clause subset. induce max has proved beneficial in some of our experiments.

3   Available as part of the ACE Ilp system at https://dtai.cs.kuleuven.be/ACE/

4.2.1 Experimental setup

The performance of the learning system with respect to the size of the train-
ing set is averaged over 50 executions, indexed by i. The sets Testi and Traini
are built as follows:
• Testi = 140 examples randomly selected from S using seed i.
• Each Testi is stratified, i.e. it has the same ratio of examples labeled bid
  as S (35.2%).
• Traini = S \ Testi (820 examples).
   For each execution i, we randomly generate from Traini a sequence of nested
subsets of increasing sizes nk = 10 + 100 × k, with k between 0 and 8. We
denote each of these subsets Ti,k , so that |Ti,k | = nk and Ti,k ⊂ Ti,k+1 . Each
model trained on Ti,k is evaluated on T esti .
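The split-generation protocol above can be sketched as a helper (an assumption of this sketch, not the authors' code): for a given seed, draw a stratified 140-example test set from S, then build nested training subsets of sizes nk = 10 + 100 × k from the rest.

```python
import random

# Sketch of the evaluation protocol: stratified test split plus nested
# training subsets T_{i,k} with |T_{i,k}| = 10 + 100*k.

def make_split(S, labels, seed, test_size=140, ks=range(9)):
    rng = random.Random(seed)
    bids = [j for j, y in enumerate(labels) if y == "bid"]
    passes = [j for j, y in enumerate(labels) if y == "pass"]
    n_bid = round(test_size * len(bids) / len(S))   # preserve class ratio
    test = set(rng.sample(bids, n_bid) + rng.sample(passes, test_size - n_bid))
    train = [j for j in range(len(S)) if j not in test]
    rng.shuffle(train)
    # Prefixes of one shuffled list guarantee T_{i,k} ⊂ T_{i,k+1}.
    subsets = [train[:10 + 100 * k] for k in ks]
    return sorted(test), subsets
```

On a 960-example set with 35.2% bid labels, this yields one 140-example test set and nine nested training subsets of sizes 10, 110, ..., 810, disjoint from the test set.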

4.2.2 Evaluation criteria

As our methodology can be interpreted as a post-hoc, model-agnostic black-box
explanation framework, we evaluate it in terms of fidelity, unambiguity
and interpretability [Lakkaraju et al., 2019].
• A high fidelity explanation should faithfully mimic the behavior of the
  black box model. To measure fidelity we use the accuracy of the surrogate
  relational model with respect to the labels assigned by the black box
  model.
• Unambiguity evaluates whether one or multiple (potentially conflicting)
  explanations are provided by the relational surrogate model for a given
  instance. While we do not explicitly report on unambiguity in our experi-
  ments, we note that decision trees are by nature unambiguous models.
  Rule sets, on the other hand, may indeed display some overlap. However,
  as our Aleph experiments only consider one target label (see Listing 5),
  even if two rules apply to some instance, they conclude on the same
  label and the model again displays no ambiguity. That does not mean
  that the notion is useless in our context, rather that we have to find more
  sophisticated ways to evaluate in which sense such models include some
  level of ambiguity.
• Interpretability quantifies how easy it is to understand and reason about
  the explanation. This dimension of explanation is subjective and, as such,
  may be prone to expert/user bias. We therefore choose a crude but
  objective metric: the model complexity, i.e. the number of rules/nodes of
  the model. We will also see that in our experiments we obtain similar
  decision tree accuracies with L2 , a language containing only predicates
  appearing in a human-made model, as with the much larger language L1 .
  Clearly, the L2 models have better interpretability.
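The two quantitative criteria above reduce to very simple computations; the following is a minimal sketch (helper names are ours) of fidelity as agreement with the black-box labels and complexity as rule count:

```python
# Fidelity: fraction of instances where the surrogate model agrees with
# the black box's labels. Complexity: number of rules (or tree nodes).

def fidelity(surrogate_preds, black_box_labels):
    agree = sum(a == b for a, b in zip(surrogate_preds, black_box_labels))
    return agree / len(black_box_labels)

def complexity(rules):
    return len(rules)
```

For example, a surrogate agreeing with the black box on 3 of 4 test instances has fidelity 0.75.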

4.3 First experiments

The first experiments involve a baseline non-relational data representation,
together with the L0 language mentioned earlier as well as a smaller lan-
guage L1 . Table 1 and Figures 1 and 2 display the results of our experiments,
as well as results obtained using a different language L2 , introduced
in Section 5.2.
   We first learned Random Forest models, using the scikit-learn Python
package [Pedregosa et al., 2011] with default parameters, including
forests made of 100 trees. The data representation is as follows: each hand
is described by 54 attributes: vulnerability, target class, and the highest card,
second highest card, . . . , 13th highest card in each suit (suits ordered as
♣ < ♦ < ♥ < ♠). If a player has n cards in a given suit, the corresponding
(n + 1)th, . . . , 13th attributes for this suit are set to 0.
   The relational models are learned with Tilde and Aleph, the latter
trained with both the induce and induce max settings. Figure 1-left presents
the average accuracy, over all i = 1 . . . 50 executions, of the models trained
on the sets Ti,k , which therefore requires building 450 models for each curve.
Figure 2 displays the corresponding model complexity, expressed as the average
number of nodes for Tilde and Random Forest4 . Table 1 displays the aver-
age accuracies and complexities of each experiment for learning set sizes
310 (k = 3) and 810 (k = 8). An experiment represents 450 Tilde or Aleph exe-
cutions, and the resulting CPU cost of these executions is also recorded. As
the experiments were performed on similar machines with different num-
bers of cores (either 8 or 16), we report the CPU cost as the number of cores
of the machine times the total CPU time, i.e. in core-hour units5 .
   Aleph failed to cope with the large L0 language, and couldn’t return a
solution in reasonable time6 . As can be seen in Figure 1 and Table 1, the L0 -
Tilde models had rather low accuracy and a large number of nodes (which
grew linearly with the number of training examples used).
   After an analysis of the Tilde output trees, it became apparent that Tilde
was selecting many nodes with very specific conditions, in particular with
the suit representation and distribution predicates. While this improved per-
formance in training, the specificity hindered performance in testing.
To counter this, we removed the suit representation and distribution
predicates from L0 , creating the more concise language L1 .
   As expected, the average number of nodes in L1 -Tilde reduced dramati-
cally. L1 -Tilde had the best accuracy of all models, which continued to in-
crease between 610 and 810 examples. Its complexity was relatively stable,
4 For Random Forest, the reported number of nodes is an average over the 100 trees of a
model.
5 For instance, 16 core-hours represents 1 hour of CPU time on a 16-core machine or 2
hours of CPU time on an 8-core machine.
6 Aleph was able to learn models after we removed the nb predicate from L0 , albeit rather
slowly (≈ 384 core-hours).

Fig. 1 Average model accuracies of L0 , L1 models (left) and L2 models (right).

Fig. 2 Average model complexities of rule-based (left) and tree-based (right) models.

Table 1 Summary of average accuracy and complexity for all models when trained on
310 and 810 examples, together with total CPU cost of the 450 Tilde or Aleph executions
in CPUtime×hours. Reported complexity of Random Forest only concerns one tree among
the 100 trees in the forest.

increasing only slightly between 510 and 810 examples, to a total of 21 tree
nodes. With the smaller L1 language, L1 -Aleph-induce was able to find solu-
tions in reasonable time. Its accuracy was worse than that of L1 -Tilde (85.6%
compared to 89.8% when training on 810 examples). After 310 examples,
the L1 -Aleph-induce model size remained stable at 16 rules (its accuracy
also showed stability over the same training set sizes).
   We also used Aleph’s induce max strategy with L1 . Compared to induce,
this search strategy has a higher time cost and can produce rules with large
overlaps in coverage of learning examples, but can generalize better due to
the fact that it saturates every example in the training set, not just a subset
based on the coverage of a single seed example. This search strategy was of
no benefit in this case, with L1 -Aleph-induce max showing slightly worse
accuracy (84.4% compared to 85.6%) and a much higher complexity (47
rules compared to 15) when compared with L1 -Aleph-induce.
   The CPU cost of learning the Random Forest models is negligible (less
than 10 seconds) compared to that of the relational models. Their accuracies
lie between those of L1 -Tilde and of the L1 -Aleph models, while the num-
ber of nodes per tree is much higher (see Figure 2-right). This indicates
that there is no gain in using ensemble learning on non-relational deci-
sion trees, although the data representation is much simpler and needs
only limited Bridge expertise to define. The Random Forest models are,
as expected, difficult to interpret, apart from variable importance: it ap-
pears that the three highest cards in each suit and vulnerability are
the most important variables in this task. It would be possible to use post-
hoc formal methods to explain those Random Forest models a posteriori
[Izza and Marques-Silva, 2021]; however, the whole process would be ex-
pensive and, following [Rudin et al., 2021], “explaining black boxes, rather
than replacing them with interpretable models, can make the problem worse by
providing misleading or false characterizations”.
   Following these experiments, we had bridge experts perform a thorough
analysis of one of the successful L1 -Tilde outputs. Their goal was to create a
sub-model much easier to interpret than the large decision trees7 ,
with an accuracy comparable to that of L1 -Tilde. We also used the
experts’ insights to further revise the domain theory, in the hope of learning
a simpler model with Aleph and Tilde. We describe this process in the next
section.

7 The experts noted that, in general, the Tilde trees were more difficult to read than the
rule-based output produced by Aleph.

5 Expert Rule Analysis and new Experiments

5.1 Expert Rule Analysis

To find an L1 -Tilde output tree for the experts to analyse, we chose a model
with good accuracy and low complexity (a large tree would be too hard for a
bridge expert to parse). A subtree of the chosen L1 -Tilde output is dis-
played in Listing 1 (the full tree is displayed in Listing 4 in the Appendix).
Despite the low complexity of this tree (only 15 nodes), it remains somewhat
difficult to interpret.
Listing 1 Part of the L1-tree learned by Tilde
decision(-A,-B,-C,-D,-E)
nb(A,-F,-G), gteq(G,6) ?
+--yes: hcp(A,4) ?
|   +--yes: [pass]
|   +--no:  [bid]
+--no:  hcp(A,-H), gteq(H,16) ?
    +--yes: [bid]
    +--no:  nb(A,spade,-I), gteq(I,1) ?
        +--yes: lteq(I,1) ?
        |   +--yes: hcp(A,-J), lteq(J,5) ?
        |   |   +--yes: [pass]
        |   |   +--no:  nb(A,-K,5) ?
        |   |       +--yes: hcp(A,6) ?
        |   |       |   +--yes: [pass]
        |   |       |   +--no:  [bid]
        |   |       +--no:  [pass]

   Simply translating the tree into a decision list does not solve the issue of
interpretability. Thus, with the help of bridge experts, we performed a
manual extraction of rules from the given tree, following these post-
processing steps:
1. Natural language translation from relational representations.
2. Removal of nodes with weak support (e.g. a node which tests for precise
   values of HCP instead of a range).
3. Selection of rules concluding on a bid leaf.
4. Reformulation of intervals (e.g. combining overlapping ranges of HCP or
   number of cards).
5. Highlighting the specific number of cards in ♠, and specific n-m-p-q distri-
   butions (hand shapes).
   The set of five rules resulting from this post-processing, shown in Listing 3
in the next section, is much more readable than the L1 -Tilde trees. Despite its
low complexity, this set of rules has an accuracy of 90.6% on the whole of S.

5.2 New experiments

Following this expert review, we created the new L2 language, which only
uses the predicates that appear in the post-processed rules above. This
meant reintroducing the distribution predicate, adding the nbs predicate
(similar to nb but denoting only the number of ♠ in the hand), and
omitting several other predicates. Table 2 in the Appendix displays the
predicates involved in the languages L0 , L1 and L2 . Results are displayed
together with those for L0 and L1 in Figure 1, Figure 2 and Table 1.
   Using L2 with Aleph-induce results in a much simpler model than L1 -
Aleph-induce (6 rules compared to 16 when trained on 810 examples), at
the cost of somewhat lower accuracy (83.5% compared to 85.6%). The com-
plexity of L2 -Tilde was also smaller than its counterpart L1 -Tilde (12 nodes
compared to 15), also at the expense of accuracy (87.3% compared to 89.8%).
In both cases, the abstracted and simplified L2 language produced models
which were far less complex (and thus more easily interpretable), but had a
slight degradation in accuracy, when compared with L1 .
   When using L2 , the more expensive induce max search strategy proved
useful, with L2 -Aleph-induce max showing an accuracy of 88.2%, second
only to L1 -Tilde (89.8%). This came at the cost of complexity, with L2 -Aleph-
induce max models having an average of 20 rules compared to 6 for L2 -
Aleph-induce.
   We selected an L2 -Aleph-induce max model, displayed in Listing 5 in the Ap-
pendix. The rules of this model are relatively easy to apprehend, and the
model achieves an accuracy of 89.7% on the whole example set S. We pro-
vide in Listing 2 the experts’ translation M of this model and compare it to
the rewriting H, displayed in Listing 3, of the L1 -Tilde tree model which led
to the L2 language definition. This translation is much simpler than the previous
one and required only steps 1, 4 and 5 of the original translation process.
An in-depth expert explanation of how the translated models differ is pro-
vided in the Appendix. It is noticeable that the two translated models are
very similar: each rule in M matches, with only slight differences, the rule
in H with the same number. The only apparent exception is the overly specific
rule M2b. A closer look at rule H2 reveals that rule M2b completes rule M2
by considering the case in which the player has a 6-card suit, leading to a closer
match with human rule H2.

Listing 2 Translation of the L2 -Aleph-induce max model
We bid:
M1  - With 0♠.
M2  - With 7+ cards in a non-spade suit OR 14+ HCP.
M2b - With 6+ cards in a non-spade suit AND
        ( 1♠ AND 5-13 HCP)  OR
        ( 2♠ AND 7-13 HCP)  OR
        ( 3♠ AND 10-13 HCP)
M3  - With 1♠ AND (5-4-3-1♠ OR 5-5-2-1♠) AND 8-13 HCP.
M4  - With 2♠ AND 5-5-2♠-1.
M5  - With 2♠ AND (5-4-2-2♠ OR 5-3-3-2♠) AND 13 HCP.

Listing 3 Rules extracted by the experts from the L1-Tilde tree
We bid:
H1 - With 0♠.
H2 - With 6+ cards in a non-spade suit OR 16+ HCP.
H3 - With 1♠ AND (5-4-3-1♠ OR 5-5-2-1♠) AND 6-15 HCP.
H4 - With 2♠ AND 5-5-2♠-1 AND 7-15 HCP.
H5 - With 2♠ AND 5-4-2-2♠ AND 11-15 HCP.
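The five expert rules of Listing 3 are simple enough to be written directly as an executable classifier. The hand representation below (a dict of suit lengths plus an HCP count) is an assumption made for this sketch, not the paper's encoding.

```python
# The expert rules H1-H5 of Listing 3 as an executable classifier.
# lengths: dict suit -> number of cards; hcp: high card points.

def expert_decision(lengths, hcp):
    s = lengths["spade"]
    longest_other = max(n for suit, n in lengths.items() if suit != "spade")
    dist = sorted(lengths.values(), reverse=True)  # hand shape, e.g. [5,4,3,1]
    if s == 0:                                                        # H1
        return "bid"
    if longest_other >= 6 or hcp >= 16:                               # H2
        return "bid"
    if s == 1 and dist in ([5, 4, 3, 1], [5, 5, 2, 1]) and 6 <= hcp <= 15:  # H3
        return "bid"
    if s == 2 and dist == [5, 5, 2, 1] and 7 <= hcp <= 15:            # H4
        return "bid"
    if s == 2 and dist == [5, 4, 2, 2] and 11 <= hcp <= 15:           # H5
        return "bid"
    return "pass"
```

For example, a hand with a spade void bids under H1 regardless of strength, while a 4-3-3-3 hand with 3♠ and 8 HCP matches no rule and passes.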

6 Conclusion

We conducted an end-to-end experiment to acquire expert rules for a bridge
bidding problem. We did not propose new technical tools; our contribution
lies in investigating a complete methodology to handle a learning task from
beginning to end, i.e. from the specification of the learning problem to a
learned model, obtained after feedback from experts, who not only criticise and
evaluate the models but also propose handmade (and computable) alter-
native models. The methodology includes generating and labelling exam-
ples by using both domain experts (to drive the modelling and relational
representation of the examples) and a powerful simulation-based black box
for labelling. The result is a set of high quality examples, which is a prere-
quisite for learning a compact and accurate relational model. The final rela-
tional model is built after an additional phase of interaction with experts,
who investigate a learned model and propose an alternative model, the anal-
ysis of which leads to further refinements of the learning language. We ar-
gue that interactions between experts and engineers are made easier when
the learned models are relational, allowing experts to revisit and contradict
them. Indeed, a detailed comparison between the models, as translated by
experts, shows that the whole process has essentially reached a fixed point.
Clearly, learning tasks in a well defined and simple universe with precise and
deterministic rules, such as games, are good candidates for the interactive
methodology we proposed. The extent to which this cooperation between
experts and learned models can be transferred to less well defined problems,
including, for instance, noisy environments, has yet to be investigated. Note
that complementary techniques can also be applied, such as active learning,
where machines or humans dynamically provide new examples to the sys-
tem, leading to revised models. Overall, our contribution is a step towards
a complete handling of learning tasks for decision problems, giving domain
experts (and non-domain experts alike) the tools to both make accurate de-
cisions and to understand and scrutinize the logic behind those decisions.

References

ACBL, 2019. ACBL (2019). American contract bridge league. https://www.acbl.org/
  learn_page/how-to-play-bridge/.
Blockeel and Raedt, 1998. Blockeel, H. and Raedt, L. D. (1998). Top-down induction of
  first-order logical decision trees. Artif. Intell., 101(1-2):285–297.
Dumancic et al., 2019. Dumancic, S., Garcia-Duran, A., and Niepert, M. (2019). A com-
  parative study of distributional and symbolic paradigms for relational learning. In
  Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence,
  IJCAI-19, pages 6088–6094. International Joint Conferences on Artificial Intelligence
  Organization.
Guidotti et al., 2019. Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., and
  Pedreschi, D. (2019). A survey of methods for explaining black box models. ACM
  Comput. Surv., 51(5):93:1–93:42.
Haglund et al., 2014. Haglund, B., Hein, S., et al. (2014). Dds solver. Available at https:
  //github.com/dds-bridge/dds.
Izza and Marques-Silva, 2021. Izza, Y. and Marques-Silva, J. (2021). On explaining ran-
  dom forests with SAT. In Proceedings of the Thirtieth International Joint Conference on
  Artificial Intelligence, IJCAI 2021, Virtual Event / Montreal, Canada, 19-27 August 2021,
  pages 2584–2591.
Lakkaraju et al., 2019. Lakkaraju, H., Kamar, E., Caruana, R., and Leskovec, J. (2019).
  Faithful and customizable explanations of black box models. In Proceedings of the 2019
  AAAI/ACM Conference on AI, Ethics, and Society, AIES 2019, Honolulu, HI, USA, January
  27-28, 2019, pages 131–138. ACM.
Legras et al., 2018. Legras, S., Rouveirol, C., and Ventos, V. (2018). The game of bridge:
  a challenge for ilp. In Inductive Logic Programming 28th International Conference, ILP
  2018 Proceedings, pages 72–87. Springer.
Lloyd, 1987. Lloyd, J. W. (1987). Foundations of logic programming; (2nd extended ed.).
  Springer-Verlag New York, Inc., New York, NY, USA.
Muggleton and Raedt, 1994. Muggleton, S. and Raedt, L. D. (1994). Inductive logic pro-
  gramming: Theory and methods. J. Log. Program., 19/20:629–679.
Murdoch et al., 2019. Murdoch, W. J., Singh, C., Kumbier, K., Abbasi-Asl, R., and Yu, B.
  (2019). Interpretable machine learning: definitions, methods, and applications. arXiv
  e-prints, page arXiv:1901.04592.
Patterson, 2008. Patterson, D. (2008). Computer learning in Bridge. Msci Individual
  Project, Queen Mary College, University of London.
Pavlicek, 2014. Pavlicek, R. (2014). Actual play vs. double-dummy. Available at http:
  //www.rpbridge.net/8j45.htm.
Pedregosa et al., 2011. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B.,
  Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos,
  A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn:
  Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Raedt et al., 2016. Raedt, L. D., Kersting, K., Natarajan, S., and Poole, D. (2016). Sta-
  tistical Relational Artificial Intelligence: Logic, Probability, and Computation. Synthesis
  Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publish-
  ers.
Ribeiro et al., 2016. Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). ”Why Should I
  Trust You?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD
  International Conference on Knowledge Discovery and Data Mining, pages 1135–1144.
Ribeiro et al., 2018. Ribeiro, M. T., Singh, S., and Guestrin, C. (2018). Anchors: High-
  precision model-agnostic explanations. In Proceedings of AAAI-18, pages 1527–1535.
Rudin et al., 2021. Rudin, C., Chen, C., Chen, Z., Huang, H., Semenova, L., and Zhong,
  C. (2021). Interpretable machine learning: Fundamental principles and 10 grand chal-
  lenges. CoRR, abs/2103.11251.
Srinivasan, 1999. Srinivasan, A. (1999). The Aleph manual. Available at http://www.
  comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/.

Appendix 1

7 Tree-based and rule-based models

Listing 4 A complete L1-tree learned by Tilde
decision(-A,-B,-C,-D,-E)
nb(A,-F,-G), gteq(G,6) ?
+--yes: hcp(A,4) ?
|   +--yes: [pass] 2.0
|   +--no:  [bid] 57.0
+--no:  hcp(A,-H), gteq(H,16) ?
    +--yes: [bid] 6.0
    +--no:  nb(A,spade,-I), gteq(I,1) ?
        +--yes: lteq(I,1) ?
        |   +--yes: hcp(A,-J), lteq(J,5) ?
        |   |   +--yes: [pass] 11.0
        |   |   +--no:  nb(A,-K,5) ?
        |   |       +--yes: hcp(A,6) ?
        |   |       |   +--yes: [pass] 3.0
        |   |       |   +--no:  [bid] 27.0
        |   |       +--no:  [pass] 3.0
        |   +--no:  hcp(A,15) ?
        |       +--yes: [bid] 3.0
        |       +--no:  nb(A,-L,3) ?
        |           +--yes: [pass] 146.0
        |           +--no:  hcp(A,11) ?
        |               +--yes: [bid] 3.0
        |               +--no:  hcp(A,-M), lteq(M,6) ?
        |                   +--yes: [pass] 15.0
        |                   +--no:  lteq(I,2) ?
        |                       +--yes: nb(A,-N,1) ?
        |                       |   +--yes: [bid] 3.0
        |                       |   +--no:  hcp(A,-O), gteq(O,11) ?
        |                       |       +--yes: [bid] 5.0
        |                       |       +--no:  [pass] 16.0
        |                       +--no:  [pass] 5.0
        +--no:  [bid] 5.0

Listing 5 An L2 -Aleph-induce max model
decision(A,4,B,C,bid) :- nbs(A,0).
decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,5), nb(A,E,F), gteq(F,6), nbs(A,1).
decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,9), nbs(A,1).
decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,8), nb(A,E,F), gteq(F,5), nbs(A,1).
decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,4), distribution(A,[6,4,2,1]).
decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,8), nb(A,E,F), gteq(F,6), nbs(A,2).
decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,13), nb(A,E,F), gteq(F,5), nbs(A,2).
decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,10), nb(A,E,F), gteq(F,6).
decision(A,4,B,C,bid) :- nb(A,D,E), gteq(E,7).
decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,14), nb(A,E,F), gteq(F,5).
decision(A,4,B,C,bid) :- nb(A,D,E), lteq(E,1), nbs(A,2).
decision(A,4,B,C,bid) :- hcp(A,D), nb(A,E,D), nb(A,F,G), gteq(G,6).
decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,12), nb(A,E,F), lteq(F,1).
decision(A,4,B,C,bid) :- hcp(A,D), lteq(D,7), gteq(D,7), nb(A,E,F), gteq(F,6).
decision(A,4,B,C,bid) :- hcp(A,D), gteq(D,14), nbs(A,2).

Appendix 2

8 Expert comparison of models M and H

From an expert point of view, the model H from the L1 -Tilde model (see
Listing 3) and the model M from the L2 -Aleph-induce max rules (see List-
ing 2), both obtained after human post-processing, are very similar. They
differ only on transition zones of certain characteristics, which have low
support. A detailed analysis follows.

 L0                               L1                               L2
 suit_rep(H, Suit, Hon, Num)
 distribution(H, [N, M, P, Q])                                     distribution(H, [N, M, P, Q])
 nb(H, Suit, Num)                 nb(H, Suit, Num)                 nb(H, Suit, Num)
 hcp(H, Num)                      hcp(H, Num)                      hcp(H, Num)
 semibalanced(H)                  semibalanced(H)
 unbalanced(H)                    unbalanced(H)
 balanced(H)                      balanced(H)
 vuln(Pos, Vul, Dlr, NS, EW)      vuln(Pos, Vul, Dlr, NS, EW)
 lteq(Num, Num)                   lteq(Num, Num)                   lteq(Num, Num)
 gteq(Num, Num)                   gteq(Num, Num)                   gteq(Num, Num)
 longest_suit(H, Suit)            longest_suit(H, Suit)
 shortest_suit(H, Suit)           shortest_suit(H, Suit)
 major(Suit)                      major(Suit)
 minor(Suit)                      minor(Suit)
                                                                   nbs(H, Num)
Table 2 Predicates in the domain theories.

8.1 Transition zones within HCP

The models differ on the range 14-15 HCP (M2 vs H2). Note that with very
unbalanced hand distributions, some parameters have a greater influence
on the decision than a deviation of 1 or 2 HCP: for example, the number of
HCP outside the ♠ suit, or the number of HCP in the suits of 4 or more cards.
An increase in the associated values would be an indication that, for the same
number of HCP, the position of honors in the player's hand is more favorable
to the decision bid.

8.2 Transition zones within distribution

Rule M2b deals with hands holding exactly a 6-card suit, while those
with a 7-card suit generate the same decision bid. Within the L1 and L2
models, hands with 5-card suits also trigger similar rules. The L1 rule H2 always
generates a bid, while the L2 rules of model M lead to a distinction according
to the number of cards in ♠ and the number of HCP: the more cards you
hold in ♠, the more HCP you need to make the bid decision. Looking more
closely, Tilde before post-processing had an embryo of a similar rule, since it
predicted a bid with a 6-card suit (or more) and exactly 4 HCP. Since its
support was very small (2 examples), it had been ignored.

8.3 Summary

The existence of these transition zones is beyond doubt for experts, but their
very identification is likely to advance knowledge in the field. New experi-
ments conducted on these transition zones, with a more specific vocabulary
devised with experts in the loop, could lead to a more accurate elicitation of
the black box model.