Case-Based Strategies in Computer Poker

Jonathan Rubin and Ian Watson
Department of Computer Science, University of Auckland Game AI Group
E-mail: jrubin01@gmail.com,
E-mail: ian@cs.auckland.ac.nz

AI Communications 25 (2012) 19–48
DOI 10.3233/AIC-2012-0513
ISSN 0921-7126, IOS Press. All rights reserved

The state-of-the-art within Artificial Intelligence has directly benefited from research conducted within the computer poker domain. One such success has been the advancement of bottom up equilibrium finding algorithms via computational game theory. On the other hand, alternative top down approaches, which attempt to generalise decisions observed within a collection of data, have not received as much attention. In this work we employ a top down approach in order to construct case-based strategies within three computer poker domains. Our analysis begins within the simplest variation of Texas Hold’em poker, i.e. two-player, limit Hold’em. We trace the evolution of our case-based architecture and evaluate the effect that modifications have on strategy performance. The end result of our experimentation is a coherent framework for producing strong case-based strategies based on the observation and generalisation of expert decisions. The lessons learned within this domain offer valuable insights, which we use to apply the framework to the more complicated domains of two-player, no-limit Hold’em and multi-player, limit Hold’em. For each domain we present results obtained from the Annual Computer Poker Competition, where the best poker agents in the world are challenged against each other. We also present results against human opposition.

Keywords: Imperfect Information Games, Game AI, Case-Based Reasoning

1. Introduction

The state-of-the-art within Artificial Intelligence (AI) research has directly benefited from research conducted within the computer poker domain. Perhaps its most notable achievement has been the advancement of equilibrium finding algorithms via computational game theory. State-of-the-art equilibrium finding algorithms are now able to solve mathematical models that were once prohibitively large. Furthermore, empirical results tend to support the intuition that solving larger models results in better quality strategies¹. However, equilibrium finding algorithms are only one of many approaches available within the computer poker test-bed. Researchers have also explored alternative approaches to handle challenges in computer poker that equilibrium finding algorithms cannot suitably address, such as dynamic adaptation to changing game conditions. These approaches include imperfect information game tree search [8] and, more recently, Monte-Carlo tree search [36].

¹ See [38] for a discussion of why this is not always the case.

The algorithms mentioned above take a bottom up approach to constructing sophisticated strategies within the computer poker domain. While the details of each algorithm differ, they roughly achieve their goal by enumerating (or sampling) a state space together with its pay-off values in order to identify a distribution over actions that achieves the greatest expected value. An alternative top down procedure attempts to construct sophisticated strategies by generalising decisions observed within a collection of data. This lazier top down approach brings its own set of problems in the domain of computer poker. In particular, any top down approach is a slave to its data, so quality data is a necessity. While massive amounts of data from online poker sites are available [25], the quality of the decisions contained within this data is usually questionable. The imperfect information world of the poker domain can often mean that valuable information may be missing from this data. Moreover, the stochastic nature of the poker domain ensures that it is not enough to simply rely on outcome information in order to determine decision quality.

Despite the problems described above, top down approaches within the computer poker domain have still managed to produce strong strategies [4,28]. In fact, empirical evidence from international computer poker competitions [1] suggests that, in a few cases, top down approaches have managed to out-perform their bottom up counterparts. In this work we describe one such top down approach that we have used to construct sophisticated strategies within the computer poker domain. Our case-based approach can be used to produce strategies for a range of sub-domains within the computer poker environment, including both limit and no-limit betting structures as well as two-player and multi-player matches. The case-based strategies produced by our approach have achieved 1st place finishes for our agent (Sartre) at the Annual Computer Poker Competition (ACPC) [1]. The ACPC is the premier computer poker event and the agents submitted typically represent the current state-of-the-art in computer poker research.

We have applied and evaluated case-based strategies within the game of Texas Hold’em. Texas Hold’em is currently the most popular poker variation. To achieve strong performance, players must be able to successfully deal with imperfect information, i.e. they cannot see their opponents’ hidden cards. Also, chance events occur in the domain via the random distribution of playing cards. Texas Hold’em can be played as a two-person game or a multi-player game. There are multiple variations on the type of betting structures used, which can dramatically alter the dynamics of the game and hence the strategies that must be employed for successful play. For instance, a limit game restricts the size of the bets allowed to predefined values. On the other hand, a no-limit game imposes no such restriction.

In this work we present case-based strategies in three poker domains. Our analysis begins within the simplest variation of Texas Hold’em, i.e. two-player, limit Hold’em. Here we trace the evolution of our case-based architecture and evaluate the effect that modifications have on strategy performance. The end result of our experimentation in the two-player, limit Hold’em domain is a coherent framework for producing strong case-based strategies, based on the observation and generalisation of expert decisions. The lessons learned within this domain offer valuable insights, which we use to apply the framework to the more complicated domains of two-player, no-limit Hold’em and multi-player, limit Hold’em. We describe the difficulties that these more complicated domains impose and how our framework deals with these issues. For each of the three poker sub-domains mentioned above we produce strategies that have been extensively evaluated. In particular, we present results from the Annual Computer Poker Competitions for the years 2009–2011 and illustrate the performance trajectory of our case-based strategies against the best available opposition.

The remainder of this document proceeds as follows. Section 2 describes the rules of Texas Hold’em poker, highlighting the differences between the variations available. Section 3 provides the necessary background and details some related work. Section 4 further recaps the benefits of the poker domain as a test-bed for artificial intelligence research and provides the motivation for the use of case-based strategies as opposed to alternative algorithms. Section 5 details the initial evolution of our case-based architecture for computer poker in the two-player, limit Hold’em domain. Experimental results are presented and discussed. Sections 6 and 7 extrapolate the resulting framework to the more complicated domains of two-player, no-limit Hold’em and multi-player, limit Hold’em. Once again, results are presented and discussed for each separate domain. Finally, Section 8 concludes the document.

2. Texas Hold’em

Here we briefly describe the game of Texas Hold’em, highlighting some of the common terms which are used throughout this work. For more detailed information on Texas Hold’em consult [33], or for further information on poker in general see [32].

Texas Hold’em can be played either as a two-player game or a multi-player game. When a game consists of only two players it is often referred to as a heads up match. Game play consists of four stages: preflop, flop, turn and river. During each stage a round of betting occurs. The first round of play is the preflop, where all players at the table are dealt two hole cards, which only they can see. Before any betting takes place, two forced bets are contributed to the pot, i.e. the small blind and the big blind. The big blind is typically double the small blind. In a heads up match, the dealer acts first preflop. In a multi-player match the player to the left of the big blind acts first preflop. In both heads up and multi-player matches, the dealer is the last to act on the post-flop betting rounds (i.e. the flop, turn and river). The legal betting actions are fold, check/call or bet/raise. These possible betting actions are common to all variations of poker and are described in more detail below:

Fold: When a player contributes no further chips to the pot and abandons their hand and any right to contest the chips that have been added to the pot.
Check/Call: When a player commits the minimum amount of chips possible in order to stay in the hand and continue to contest the pot. A check requires a commitment of zero further chips, whereas a call requires an amount greater than zero.
Bet/Raise: When a player commits more than the minimum amount of chips necessary to stay in the hand. When the player could have checked, but decides to invest further chips in the pot, this is known as a bet. When the player could have called a bet, but decides to invest further chips in the pot, this is known as a raise.

In a limit game all bets are in increments of a certain amount. In a no-limit game a player may bet any amount up to the total value of chips that they possess. For example, assuming a player begins a match with 1000 chips, after paying a forced small blind of one chip they then have the option to either fold, call one more chip or raise by contributing anywhere between 3 and 999 extra chips². In a standard game of heads-up, no-limit poker, both players’ chip stacks would fluctuate between hands, e.g. a win from a previous hand would ensure that one player had a larger chip stack to play with on the next hand. In order to reduce the variance that this structure imposes, a variation known as Doyle’s Game is played, where the starting stacks of both players are reset to a specified amount at the beginning of every hand.

² The minimum raise would involve paying 1 more chip to match the big blind and then committing at least another 2 chips as the minimum legal raise.

Once the round of betting is complete, as long as at least two players still remain in the hand, play continues on to the next stage. Each post-flop stage involves the drawing of community cards from the shuffled deck of cards as follows: flop, 3 community cards; turn, 1 community card; river, 1 community card. All players combine their hole cards with the public community cards to form their best five card poker hand. A showdown occurs after the river, where the remaining players reveal their hole cards and the player with the best hand wins all the chips in the pot. If two or more players hold hands of equal value, the pot is split between them.

3. Background

3.1. Strategy Types

As mentioned in the introduction, many AI researchers working in the computer poker domain have focused their efforts on creating strong strategies via bottom up, equilibrium finding algorithms. When equilibrium finding algorithms are applied to the computer poker domain, they produce ε-Nash equilibria. ε-Nash equilibria are robust, static strategies that limit their exploitability (ε) against worst-case opponents. A pair of strategies is said to be an ε-Nash equilibrium if neither strategy can gain more than ε by deviating. In this context, a strategy refers to a probability distribution over available actions at every decision point. Two state-of-the-art equilibrium finding algorithms are Counterfactual Regret Minimisation (CFRM) [39,18] and the Excessive Gap Technique (EGT) [13]. CFRM is an iterative, regret minimising algorithm that was developed by the University of Alberta Computer Poker Research Group (CPRG)³. The EGT algorithm, developed by Andrew Gilpin and Thomas Sandholm from Carnegie Mellon University, is an adapted version of Nesterov’s excessive gap technique [21], which has been specialised for two-player, zero-sum, imperfect information games.

³ http://poker.cs.ualberta.ca/

The ε-Nash equilibrium strategies produced via CFRM and EGT are solid, unwavering strategies that do not adapt given further observations made by challenging particular opponents. An alternative strategy type attempts to exploit perceived weaknesses in an opponent’s strategy by dynamically adapting its own strategy given further observations. This type of strategy is known as an exploitive (or maximal) strategy. Exploitive strategies typically select their actions based on information they have observed about their opponent. Therefore, constructing an exploitive strategy typically involves the added difficulty of generating accurate opponent models.

3.2. Strategy Evaluation and the Annual Computer Poker Competition

Both ε-Nash equilibrium based strategies and exploitive strategies have received attention in the computer poker literature [14,15,7,8,17]. Overall, a larger focus has been applied to equilibrium finding approaches. This is especially true regarding agents entered into the Annual Computer Poker Competition. Since 2006, the ACPC has been held every year at conferences such as AAAI and IJCAI. The agents submitted to the competition typically represent the strongest computer poker agents in the world for that particular year. Since 2009, the ACPC has evaluated agents in the following variations of Texas Hold’em:

1. Two-player, Limit Hold’em.
2. Two-player, No-Limit Hold’em.
3. Three-player, Limit Hold’em.

In this work, we restrict our attention to these three sub-domains. Agents are evaluated by playing many hands against each other in a round-robin tournament structure. The ACPC employs two winner determination procedures:

1. Total Bankroll. As its name implies, the total bankroll winner determination simply records the overall profit or loss of each agent and uses this to rank competitors. In this division, agents that are able to achieve larger bankrolls are ranked higher than those with lower profits. This winner determination procedure does not take into account how an agent achieves its overall profit or loss; for instance, it is possible that the winning agent could win a large amount against one competitor, but lose to all other competitors.
2. Bankroll Instant Run-Off. On the other hand, the instant run-off division uses a recursive winner determination algorithm that repeatedly removes the agents that performed the worst against a current pool of players. This way, agents that achieve large profits by exploiting weak opponents are not favoured, as they are in the total bankroll division.

As poker is a stochastic game that consists of chance events, the variance can often be large, especially between agents that are close in strength. This requires many hands to be played in order to arrive at statistically significant conclusions. Due to the large variance involved, the ACPC employs a duplicate match structure, whereby all players end up playing the same set of hands. For example, in a two-player match a set of N hands is played. This is then followed by dealing the same set of N hands a second time, but having both players switch seats so that they receive the cards their opponent received previously. As both players are exposed to the same set of hands, this reduces the amount of variance involved in the game by ensuring one player does not receive a larger proportion of higher quality hands than the other. A two-player match involves two seat enumerations, whereas a three-player duplicate match involves six seat enumerations to ensure each player is exposed to the same scenario as their opponents. For three players (ABC) the following seat enumerations need to take place:

  ABC   ACB
  CAB   CBA
  BCA   BAC

4. Research Motivation

This work describes the use of case-based strategies in games. Our approach makes use of the Case-based Reasoning (CBR) methodology [26,19]. The CBR methodology encodes problems, and their solutions, as cases. CBR attempts to solve new problems or scenarios by locating similar past problems and re-using or adapting their solutions for the current situation. Case-based strategies are top down strategies, in that they are constructed by processing and analysing a set of training data. Common game scenarios, together with their playing decisions, are captured as a collection of cases, referred to as the case-base. Each case attempts to capture important game state information that is likely to have an impact on the final playing decision. The training data can be either real-world data, e.g. from online poker casinos, or artificially generated data, for instance from hand history logs generated by the ACPC. Case-based strategies attempt to generalise the game playing decisions recorded within the data via the use of similarity metrics that determine whether two game playing scenarios are sufficiently similar to each other, such that their decisions can be re-used.

Case-based strategies can be created by training on data generated from a range of expert players or by isolating the decisions of a single expert player. Where a case-based strategy is produced by training on and generalising the decisions of a single expert player, we refer to the agent produced as an expert imitator. In this way, case-based strategies can be produced that attempt to imitate different styles of play simply by training on separate datasets generated by observing the decisions of expert players, each with their own style. The lazy learning [2] of case-based reasoning is particularly suited to expert imitation, where observations of expert play can be recorded and stored for use at decision time.

Case-based approaches have been applied and evaluated in a variety of gaming environments. CHEBR [24] was a case-based checkers player that acquired experience by simply playing games of checkers in real-time. In the RoboCup soccer domain, [11] used case-based reasoning to construct a team of agents that observe and imitate the behaviour of other agents. Case-based planning [16] has been investigated and evaluated in the domain of real-time strategy games [3,22,23,34]. The Case-based tactician (CaT) described in [3] selects tactics based on a state lattice and the outcome of performing the chosen tactic. The CaT system was shown to successfully learn over time. The Darmok architecture described by [22,23] pieces together fragments of plans in order to produce an overall playing strategy. The performance of the strategies produced by the Darmok architecture was improved by first classifying the situation it found itself in and having this affect plan retrieval [20]. Combining CBR with other AI approaches has also produced successful results. In [31] transfer learning was investigated in a real-time strategy game environment by merging CBR with reinforcement learning. Also, [6] combined CBR with reinforcement learning to produce an agent that could respond rapidly to changes in conditions of a domination game.

The stochastic, imperfect information world of Texas Hold’em poker is used as a test-bed to evaluate and analyse our case-based strategies. Texas Hold’em offers a rich environment that allows the opportunity to apply an abundance of strategies, ranging from basic concepts to sophisticated strategies and counter-strategies. Moreover, the rules of Texas Hold’em poker are incredibly simple. Contrast this with CBR related research into complex environments such as real-time strategy games [3,20,22,23], which offer similar issues to deal with (uncertainty, chance and deception) but do not encapsulate them within a simple set of rules, boundaries and performance metrics. Successes and failures achieved by applying case-based strategies to the game of poker may provide valuable insights for CBR researchers using complex strategy games as their domain, where immediate success is harder to evaluate. Furthermore, it is hoped that results may also generalise to domains outside the range of games altogether, i.e. to complex real world domains where hidden information, chance and deception are commonplace.

One of the major benefits of using case-based strategies within the domain of computer poker is the simplicity of the approach. Top down case-based strategies do not require the construction of the massive, complex mathematical models that some other approaches rely on [13,30,27]. Instead, an autonomous agent can be created simply via the observation of expert play and the encoding of observed actions into cases. Below we outline some further reasons why case-based strategies are suited to the domain of computer poker and hence worthy of investigation. The reasons listed are loosely based on Sycara’s [35] identification of characteristics of a domain where case-based reasoning is most applicable (these were later adjusted by [37]).

1. A case is easily defined in the domain.
   A case is easily identified as a previous scenario an (expert) player has encountered in the past and the action (solution) associated with that scenario, such as whether to fold, call or raise. Each case can also record a final outcome from the hand, i.e. how many chips a player won or lost.
2. Expert human poker players compare current problems to past cases.
   It makes sense that poker experts make their decisions based on experience. An expert poker player will normally have played many games and encountered many different scenarios; they can then draw on this experience to determine what action to take for a current problem.
3. Cases are available as training data.
   While many cases are available to train a case-based strategy, the quality of their solutions can vary considerably. The context of the past problem needs to be taken into account and applied to similar contexts in the future. As the system gathers more experience it can also record its own cases, together with their observed outcomes.
4. Case comparisons can be done effectively.
   Cases are compared by determining the similarity of their local features. There are many features that can be chosen to represent a case. Many of the salient features in the poker domain (e.g. hand strength) are easily comparable via standard metrics. Other features, such as betting history, require more involved similarity metrics, but are still directly comparable.
5. Solutions can be generalised.
   For case-based strategies to be successful, the re-use or adaptation of similar cases’ solutions should produce a solution that is (reasonably) similar to the actual, known solution (if one exists) of the target case in question. This underpins one of CBR’s main assumptions: that similar cases have similar solutions. We present empirical evidence that suggests the above assumption is reasonable in the computer poker domain.

5. Two-Player, Limit Texas Hold’em

We begin with the application of case-based strategies within the domain of two-player, limit Texas Hold’em. Two-player, limit Hold’em offers a beneficial starting point for the experimentation and evaluation of case-based strategies within computer poker. Play is limited to two players and a restricted betting structure is imposed, whereby all bets and raises are limited to pre-specified amounts. These restrictions limit the size of the state space, compared to Hold’em variations that allow no-limit betting and multiple opponents. However, while the size of the domain is reduced compared to more complex poker domains, the two-player limit Hold’em domain is still very large. The game tree consists of approximately 10^18 game states and, given the standards of current hardware, it is intractable to derive a true Nash equilibrium for the game. In fact, even storing such a strategy is impractical on today’s hardware [18]. For these reasons alternative approaches, such as case-based strategies, can prove useful given their ability to generalise.

Over the years we have conducted an extensive amount of experimentation on the use of case-based strategies, using two-player, limit Hold’em as our test-bed. In particular, we have investigated and measured the effect that changes have on areas such as feature and solution representation, similarity metrics, system training and the use of different decision making policies. Modifications have ranged from the very minor, e.g. training on different sets of data, to the more dramatic, e.g. the development of custom betting sequence similarity metrics. For each modification and addition to the architecture we have extensively evaluated the strategies produced via self-play experiments, as well as by challenging a range of third-party, artificial agents and human opposition. Due to space limitations we restrict our attention to the changes that had the greatest effect on the system architecture and its performance. We have named our system Sartre (Similarity Assessment Reasoning for Texas hold’em via Recall of Experience) and we trace the evolution of its architecture below.

5.1. Overview

In order to generalise betting decisions from a set of (artificial or real-world) training data, it is first necessary to construct and store a collection of cases. A case’s feature and solution representation must be decided upon, such as the identification of salient attribute-value pairs that describe the environment at the time a case was recorded. Each case should attempt to capture important information about the current environment that is likely to have an impact on the final solution. After a collection of cases has been established, decisions can be made by searching the case-base and locating similar scenarios for which solutions have been recorded in the past. This requires the use of local similarity metrics for each feature.

Given a target case, t, that describes the immediate game environment, a source case, s ∈ S, where S is the entire collection of previously recorded cases, and a set of features, F, global similarity is computed by summing each feature’s local similarity contribution, sim_f, and dividing by the total number of features:

  G(t, s) = \frac{1}{|F|} \sum_{f \in F} \mathrm{sim}_f(t_f, s_f)    (1)

Fig. 1. Overview of the architecture used to produce case-based strategies. The numbers identify the six key areas within the architecture where the effects of maintenance have been evaluated.

Fig. 1 provides a pictorial representation of the architecture we have used to produce case-based strategies. The six areas labelled in Fig. 1 are the key areas within the architecture where maintenance has had the most impact and led to positive effects on system performance. They are:

1. Feature Representation
2. Similarity Metrics
3. Solution Representation
4. Case Retrieval
5. Solution Re-Use Policies, and
6. System Training

5.2. Architecture Evolution

Here we describe some of the changes that have taken place within the six key areas of our case-based architecture identified above. Where possible, we provide a comparative evaluation for the maintenance performed, in order to measure the impact that changes had on the performance of the case-based strategies produced.

5.2.1. Feature Representation

The first area of the system architecture that we discuss is the feature representation used within a case (see Fig. 1, Point 1). We highlight results that have influenced changes to the representation over time. In order to construct a case-based strategy, a case representation is required that establishes the type of information that will be recorded for each game scenario. Our case-based strategies use a simple attribute-value representation to describe a set of case features. Table 1 lists the features used within our case representation. A separate representation is used for preflop and postflop cases, given the differences between these two stages of the game. The features listed in Table 1 were chosen by the authors as they concisely capture all the necessary public game information, as well as the player’s personal, hidden information.

Table 1
Preflop and postflop case feature representation.

      Preflop              Postflop
  1.  Hole Cards           Hand Strength
  2.  Betting Sequence     Betting Sequence
  3.                       Board Texture

Each feature is explained in more detail below:

Preflop
1. Hole Cards: the personal hidden cards of the player, represented by 1 out of 169 equivalence classes.
2. Betting Sequence: a sequence of characters that represent the betting actions witnessed until the current decision point, where actions can be selected from the set A_limit = {f, c, r}.

Postflop
1. Hand Strength: a description of the player’s hand strength given a combination of their personal cards and the public community cards.
2. Betting Sequence: identical to the preflop sequence, however with the addition of round delimiters to distinguish betting from previous rounds, A_limit ∪ {−}.
8 Jonathan Rubin and Ian Watson / Case-Based Strategies in Computer Poker

3. Board Texture: a description of the public community cards that are revealed during the postflop rounds.

While the case features themselves have remained relatively unchanged throughout the architecture's evolution, the actual values that each feature records have been experimented with to determine the effect on final performance. For example, we have compared and evaluated the use of different metrics for the hand strength feature from Table 1. Fig. 2 depicts the result of a comparison between three hand strength feature values. In this experiment, the feature values for betting sequence and board texture were held constant, while the hand strength value was varied. The values used to represent hand strength were as follows:

CATEGORIES: Uses expert-defined categories to classify hand strength. Hands are assigned to categories by mapping a player's personal cards and the available board cards into one of a number of predefined categories. Each category represents the type of hand the player currently has, together with information about the drawing potential of the hand, i.e. whether the hand has the ability to improve with future community cards. In total 284 categories were defined⁴.
E[HS]: Expected hand strength is a one-dimensional, numerical metric. The E[HS] metric computes the probability of winning at showdown against a random hand. This is given by enumerating all possible combinations of community cards and determining the proportion of the time the player's hand wins against the set of all possible opponent holdings. Given the large variety of values that can be produced by the E[HS] metric, bucketing takes place where similar values are mapped into a discrete set of buckets that contain hands of similar strength. Here we use a total of 20 buckets for each postflop round.
E[HS²]: The final metric evaluated squares the hand strength before taking the expectation, i.e. it is the expected value of the squared hand strength over all possible future community cards. Johanson [18] points out that E[HS²] typically gives better results than E[HS], as it assigns higher values to hands with greater potential. Typically in poker, hands with similar strength values, but differences in potential, are required to be played in strategically different ways [33]. Once again bucketing is used, where the derived E[HS²] values are mapped into 1 of 20 unique buckets for each postflop round.

⁴ A listing of all 284 categories can be found at the following website: http://www.cs.auckland.ac.nz/research/gameai/sartreinfo.html

The resulting case-based strategies were evaluated by challenging the computerised opponent Fell Omen 2 [10]. Fell Omen 2 is a solid two-player limit Hold'em agent that plays an ε-Nash equilibrium type strategy. Fell Omen 2 was made publicly available by its creator Ian Fellows and has become widely used as an agent for strategy evaluation [12]. The results depicted in Fig. 2 are measured in small bets per hand (sb/h), i.e. the total number of small bets won or lost divided by the total number of hands played. Each data point records the outcome of three matches in which 3000 duplicate hands were played. The 95% confidence intervals for each data point are also shown. Results were recorded for various levels of case-base usage to get an idea of how well the system is able to generalise decisions. The results in Fig. 2 show that, when using a full case-base, the use of E[HS²] for the hand strength feature produces the strongest strategies, followed by the use of CATEGORIES and finally E[HS]. The poor performance of E[HS] is likely due to the fact that this metric does not fully capture the importance of a hand's future potential. When only a portion of the case-base is used, it becomes more important for the system to be able to recognise similar attribute values in order to make appropriate decisions. Both E[HS] and E[HS²] are able to generalise well. However, the results show that decision generalisation begins to break down when using CATEGORIES. This has to do with the similarity metrics used. In particular, the CATEGORIES strategy in Fig. 2 is actually a baseline strategy that used overly simplified similarity metrics for each of its feature values. Next we discuss the area of similarity assessment within the system architecture, which is intimately tied to the particular values chosen within the feature representation.

5.2.2. Similarity Assessment
For each feature that is used to represent a case, a corresponding local similarity metric, sim_f(f1, f2), is required that determines how similar two feature values, f1 and f2, are to each other.
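As a minimal sketch of why the choice of local similarity metric matters, the Python below retrieves from a toy case-base keyed only on a hand strength bucket. Under an all-or-nothing metric a query with no identical bucket retrieves nothing, whereas a distance-based metric still finds a nearest neighbour. The toy cases, the function names, the 20-bucket default and the decay constant k are hypothetical illustrations, not taken from the system described here.

```python
def sim_all_or_nothing(f1: int, f2: int) -> float:
    """All-or-nothing: 1 for identical values, 0 for any difference."""
    return 1.0 if f1 == f2 else 0.0


def sim_linear(f1: int, f2: int, num_buckets: int = 20, k: float = 2.0) -> float:
    """Distance-based: similarity falls linearly with the bucket gap."""
    return max(1.0 - k * abs(f1 - f2) / num_buckets, 0.0)


def retrieve(query: int, case_base: dict, sim):
    """Return (bucket, action) of the most similar case, or None.

    Cases scoring 0 are treated as non-matching, so under all-or-nothing
    similarity only a case with an identical bucket can be retrieved.
    """
    scored = [(sim(query, bucket), bucket, action)
              for bucket, action in case_base.items()]
    matches = [entry for entry in scored if entry[0] > 0.0]
    if not matches:
        return None
    _, bucket, action = max(matches)
    return bucket, action


# Hypothetical toy case-base: hand strength bucket -> recorded action.
cases = {3: "f", 11: "c", 18: "r"}

print(retrieve(12, cases, sim_all_or_nothing))  # None
print(retrieve(12, cases, sim_linear))          # (11, 'c')
```

A `None` result is the point at which a system relying on all-or-nothing similarity has to fall back on some default action.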


Fig. 2. The performance of three separate case-based strategies produced by altering the value used to represent hand strength.
Results are measured in sb/h and were obtained by challenging Fell Omen 2.
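The three hand strength values compared in Fig. 2 differ in how much of a hand's future potential they capture. The short Python sketch below contrasts E[HS] with E[HS²] and applies uniform bucketing. It assumes the hand strength of the player's hand has already been computed for every possible completion of the community cards (a real implementation would enumerate boards and opponent holdings with a hand evaluator, omitted here); the function names and toy values are hypothetical.

```python
def expected_hs(rollouts):
    """E[HS]: mean hand strength over all possible future boards."""
    return sum(rollouts) / len(rollouts)


def expected_hs2(rollouts):
    """E[HS^2]: mean of the squared hand strength over the same boards."""
    return sum(hs * hs for hs in rollouts) / len(rollouts)


def uniform_bucket(value, num_buckets=20):
    """Map a value in [0, 1] to one of num_buckets equal-width buckets (1-based)."""
    return min(int(value * num_buckets), num_buckets - 1) + 1


# Two hypothetical hands with the same E[HS] of 0.5: a steady medium-strength
# hand, and a draw that either wins or loses outright depending on the board.
made_hand = [0.5, 0.5, 0.5, 0.5]
drawing_hand = [1.0, 0.0, 1.0, 0.0]

for name, rollouts in [("made", made_hand), ("draw", drawing_hand)]:
    ehs, ehs2 = expected_hs(rollouts), expected_hs2(rollouts)
    print(name, ehs, ehs2, uniform_bucket(ehs), uniform_bucket(ehs2))
# made 0.5 0.25 11 6
# draw 0.5 0.5 11 11
```

Both hands land in the same E[HS] bucket, while E[HS²] separates them and rewards the drawing hand's potential, which matches the intuition given above for preferring E[HS²].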

The use of different representations for the hand strength feature in Fig. 2 also requires the use of separate similarity metrics. The CATEGORIES strategy in Fig. 2 employs a trivial all-or-nothing similarity metric for each of its features. If two values of a feature are identical, a similarity score of 1 is assigned; if they differ at all, a similarity score of 0 is assigned. This was done to get an initial idea of how the system performed using the most basic of similarity retrieval measures. The performance of this baseline system could then be used to determine how improvements to local similarity metrics affected overall performance.
The degradation of performance observed in Fig. 2 for the CATEGORIES strategy (as the proportion of case-base usage decreases) is due to the use of all-or-nothing similarity assessment. The overly simplified all-or-nothing similarity metric meant that the system's attempts to retrieve similar cases could often fail, leaving the system without a solution for the current game state. When this occurred, a default-policy was relied upon to provide the system with an action. The default-policy used by the system was an always-call policy, whereby the system would first attempt to check if possible, otherwise it would call an opponent's bet. This default-policy was selected by the authors as it was believed to be preferable to other trivial default policies, such as always-fold, which would always result in a loss for the system.
The other two strategies in Fig. 2 (E[HS] and E[HS²]) do not use trivial all-or-nothing similarity. Instead, their hand strength features use a similarity metric based on Euclidean distance. Both the E[HS] and E[HS²] strategies also employ informed similarity metrics for their betting sequence and board texture features. Recall that the betting sequence feature is represented as a sequence of characters that lists the playing decisions that have been witnessed so far for the current hand. This requires the use of a non-trivial metric to determine similarity between two non-identical sequences. Here we developed a custom similarity metric that involves the identification of stepped levels of similarity, based on the number of bets/raises made by each player. The exact details of this metric are presented in Section 5.3.2. Finally, for completeness, we determine similarity between different board texture classes via the use of hand-picked similarity values.
Fig. 2 shows that, compared to the CATEGORIES strategy, the E[HS] and E[HS²] strategies


do a much better job of decision generalisation as the usable portion of the case-base is reduced. The eventual strategies produced do not suffer the dramatic performance degradation that occurs with the use of all-or-nothing similarity.

5.2.3. Solution Representation
As well as recording feature values, each case also needs to specify a solution. The most obvious solution representation is a single betting action, a ∈ A_limit. As well as a betting action, the solution can also record the actual outcome, i.e. the numerical result, o ∈

Table 2
Total cases stored for each playing round using a single-value solution representation compared to vector-valued solutions

Round      Total Cases - Single    Total Cases - Vector
Preflop           201,335                    857
Flop              300,577                  6,743
Turn              281,529                 35,464
River             216,597                 52,088
Total           1,000,038                 95,152

to decrease the number of cases required to be


tion from a single action solution representation to a vector valued solution representation (as described in Section 5.2.3). Initially, a variable value of k was allowed, whereby the total number of similar cases retrieved varied with each search of the case-base. Recall that a case representation that encodes solutions as single actions results in a redundant case-base that contains multiple cases with the exact same feature values. The solutions of those cases may or may not advocate different playing decisions. Given this representation, a final probability vector had to be created on-the-fly at runtime by retrieving all identical cases and merging their solutions. Hence, the number of retrieved cases, k, could vary between 0 and N. When k > 0, the normalised entries of the probability vector were used to make a final playing decision. However, if k = 0, the always-call default-policy was used.
Once the solution representation was updated to record action vectors (instead of single decisions), a variable k value was no longer required. Instead, the algorithm was updated to simply always retrieve the nearest neighbour, i.e. k = 1. Given further improvements to the similarity metrics used, the use of a default-policy was no longer required, as it was no longer possible to encounter scenarios where no similar cases could be retrieved. Instead, the most similar neighbour was always returned, no matter what the similarity value. This has resulted in a much more robust system that is actually capable of generalising decisions recorded in the training data, as opposed to the initial prototype system, which offered no ability for graceful degradation given dissimilar case retrieval.

5.2.5. Solution Re-use Policies
The fifth area of the architecture that we discuss (Fig. 1, Point 5) concerns the choice of a final playing decision via the use of separate policies, given a retrieved case and its solution. Consider the probabilistic action vector, A = (a_1, a_2, ..., a_n), and a corresponding outcome vector, O = (o_1, o_2, ..., o_n). There are various ways to use the information contained in the vectors to make a final playing decision. We have identified and empirically evaluated several different policies for re-using decision information, which we label solution re-use policies. Below we outline three solution re-use policies, which have been used for making final decisions by our case-based strategies.

1. Probabilistic: The first solution re-use policy simply selects a betting action probabilistically, given the proportions specified within the action vector, P(a_i) = a_i, for i = 1, ..., n. Betting decisions that have greater proportions within the vector will be made more often than those with lower proportions. In a game-theoretic sense, this policy corresponds to a mixed strategy.
2. Max-frequency: Given an action vector A = (a_1, a_2, ..., a_n), the max-frequency solution re-use policy selects the action that corresponds to argmax_i a_i, i.e. it selects the action that was made most often and ignores all other actions. In a game-theoretic sense, this policy corresponds to a pure strategy.
3. Best-outcome: Instead of using the values contained within the action vector, the best-outcome solution re-use policy selects an action given the values contained within the outcome vector, O = (o_1, o_2, ..., o_n). The final playing decision is given by the action, a_i, that corresponds to argmax_i o_i, i.e. the action that corresponds to the maximum entry in the outcome vector.

Given the three solution re-use policies described above, it is desirable to know which policies produce the strongest strategies. Table 3 presents the results of self-play experiments where the three solution re-use policies were challenged against each other. A round robin tournament structure was used, where each policy challenged every other policy. The figures presented are from the row player's perspective and are in small bets per hand. Each match consists of 3 separate duplicate matches of 3000 hands; because every deal in a duplicate match is played twice with the seats reversed, in total 18,000 hands of poker were played between each pair of competitors. All results are statistically significant with a 95% level of confidence.
Table 3 shows that the max-frequency policy outperforms its probabilistic and best-outcome counterparts. Of the three, best-outcome fares the worst, losing all of its matches. The results indicate that simply re-using the most commonly made decision results in better performance than mixing from a probability vector, and that choosing the decision that resulted in the best outcome was the worst solution re-use policy. Moreover, these results are representative of further experiments involving other third-party computerised agents.
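The merging of redundant single-action cases into action and outcome vectors (the change behind the case counts in Table 2), followed by the three solution re-use policies, can be sketched in Python as below. This is an illustrative sketch rather than the system's code. Assumptions: actions are ordered (f, c, r) as in A_limit; each outcome entry is the mean numerical result recorded for that action, with -inf marking an action never observed; the function names and sample cases are hypothetical. All three policies share one signature so they can be swapped freely.

```python
import math
import random
from collections import defaultdict

ACTIONS = ("f", "c", "r")  # fold, check/call, bet/raise (A_limit)


def merge_cases(single_cases):
    """Merge single-action cases that share identical feature values.

    single_cases: iterable of (features, action, outcome) tuples.
    Returns {features: (action_vector, outcome_vector)}: normalised action
    frequencies and the mean outcome per action (-inf if never observed).
    """
    counts = defaultdict(lambda: [0] * len(ACTIONS))
    totals = defaultdict(lambda: [0.0] * len(ACTIONS))
    for features, action, outcome in single_cases:
        i = ACTIONS.index(action)
        counts[features][i] += 1
        totals[features][i] += outcome
    merged = {}
    for features, c in counts.items():
        n = sum(c)
        action_vec = tuple(ci / n for ci in c)
        outcome_vec = tuple(totals[features][i] / ci if ci else -math.inf
                            for i, ci in enumerate(c))
        merged[features] = (action_vec, outcome_vec)
    return merged


def probabilistic(action_vec, outcome_vec, rng=random):
    """Mixed strategy: choose action i with probability a_i."""
    return rng.choices(ACTIONS, weights=action_vec, k=1)[0]


def max_frequency(action_vec, outcome_vec, rng=None):
    """Pure strategy: the most frequently recorded action, argmax_i a_i."""
    return ACTIONS[max(range(len(ACTIONS)), key=lambda i: action_vec[i])]


def best_outcome(action_vec, outcome_vec, rng=None):
    """The action whose recorded mean outcome is highest, argmax_i o_i."""
    return ACTIONS[max(range(len(ACTIONS)), key=lambda i: outcome_vec[i])]


# Hypothetical hand-history cases that share one feature tuple.
cases = [
    (("rc-c", 34, "B"), "c", 4.0),
    (("rc-c", 34, "B"), "c", 4.6),
    (("rc-c", 34, "B"), "r", 15.6),
]
a, o = merge_cases(cases)[("rc-c", 34, "B")]
print(max_frequency(a, o))                     # c
print(best_outcome(a, o))                      # r
print(probabilistic(a, o, random.Random(0)))   # c or r, never f
```

Here max-frequency and best-outcome disagree on the same retrieved solution, which is exactly the situation the comparison in Table 3 is designed to arbitrate.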


Table 3
Results of experiments between solution re-use policies. The values shown are in sb/h with 95% confidence intervals.

                 Max-frequency     Probabilistic     Best-outcome      Average
Max-frequency                      0.011 ± 0.005     0.076 ± 0.008     0.044 ± 0.006
Probabilistic    −0.011 ± 0.005                      0.036 ± 0.009     0.012 ± 0.004
Best-outcome     −0.076 ± 0.008    −0.036 ± 0.009                      −0.056 ± 0.005
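Table 3, like Fig. 2, reports win rates in small bets per hand together with 95% confidence intervals. A minimal Python sketch of that arithmetic follows. The interval uses a normal approximation over per-hand results, which is an assumption, since the text does not state how its intervals were computed. The per-hand results are made-up illustrative numbers, and the function names are hypothetical.

```python
import math
import statistics


def small_bets_per_hand(results):
    """sb/h: total small bets won or lost divided by the number of hands."""
    return sum(results) / len(results)


def ci95_half_width(results):
    """Half-width of a 95% interval under a normal approximation (assumed)."""
    return 1.96 * statistics.stdev(results) / math.sqrt(len(results))


# Hypothetical per-hand results, in small bets, from the row player's view.
results = [2.0, -1.0, 0.5, -0.5, 3.0, -2.0, 1.0, 0.0]
rate = small_bets_per_hand(results)
print(f"{rate:.3f} ± {ci95_half_width(results):.3f} sb/h")
```

With only a handful of hands the interval is far wider than the win rate itself; the half-width shrinks with the square root of the number of hands, which is one reason evaluation matches run to thousands of hands.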

The poor performance of best-outcome is likely due, at least in part, to the fact that good outcomes do not necessarily represent good betting decisions and vice versa. The reason for the success of the max-frequency policy is less obvious. In our opinion, this has to do with the type of opponent being challenged, i.e. agents that play a static, non-exploitive strategy, such as those listed in Table 3, as well as strategies that attempt to approximate a Nash equilibrium. As an equilibrium-based strategy does not attempt to exploit any bias in its opponent's strategy, it will only gain when the opponent ends up making a mistake by selecting an inappropriate action. The action that was made most often is unlikely to be an inappropriate action; therefore sticking to this decision avoids any exploration errors made by choosing other (possibly inappropriate) actions. Moreover, biasing playing decisions towards this action is likely to go unpunished when challenging a non-exploitive agent. On the other hand, against an exploitive opponent the bias imposed by choosing only one action is likely to be detrimental to performance in the long run, and it would therefore become more important to mix up decisions.

5.2.6. System Training
How the system is trained is the final key area of the architecture that we discuss, in regard to system maintenance. One of the major benefits of producing case-based strategies via expert imitation is that different types of strategies can be produced by simply modifying the data that is used to train the system. Decisions that were made by an expert player can be extracted from hand history logs and used to train a case-based strategy. Experts can be either human or other artificial agents.
In order to train a case-based strategy, perfect information is required, i.e. the data needs to record the hidden card information of the expert player. Typically, data collected from online poker sites only contains this information when the original expert played a hand that resulted in a showdown. For hands that were folded before a showdown, this information is lost. It is difficult to train a strategy on data where this information is missing. More importantly, any attempt to train a system on only the data where showdowns occurred would result in biased actions, as the decision to fold would never be encountered.
It is for these reasons that our case-based strategies have been trained on data made publicly available from the Annual Computer Poker Competition [1]. This data records hand history logs for all matches played between computerised agents at a particular year's competition. The data contains perfect information for every hand played and therefore can easily be used to train an imitation-based system. Furthermore, the computerised agents that participate at the ACPC each year are expected to improve in playing strength over the years, and hence re-training the system on updated data should have a follow-on effect on performance for any imitation strategies produced from the data. Our case-based strategies have typically selected subsets of data to train on, based on the decisions made by the agents that have performed the best in either of the two winner determination methods used by the ACPC.
There are both advantages and disadvantages to producing strategies that rely on generalising decisions from training data. While this provides a convenient mechanism for easily upgrading a system's play, there is an inherent reliance on the quality of the underlying data in order to produce reasonable strategies. Furthermore, it is reasonable to assume that strategies produced in this way are typically only expected to do as well as the original expert(s) they are trained on.

5.3. A Framework for Producing Case-Based Strategies in Two-Player, Limit Texas Hold'em
For the six key areas of our architecture (described above), maintenance was guided via comparative evaluation and overall impact on performance. The outcome of this intensive, systematic maintenance is the establishment of a final framework for producing case-based strategies in the domain of two-player, limit Hold'em.
Here we present the details of the final framework we have established for producing case-based strategies. The following sections illustrate the details of our framework by specifying the following sufficient components:

1. A representation for encoding cases and game state information
2. The corresponding similarity metrics required for decision generalisation.

Table 4
A case is made up of three attribute-value pairs, which describe the current state of the game. A solution consists of an action triple and an outcome triple, which record the action frequencies and the average numerical value of applying each action (-∞ refers to an unknown outcome).

Attribute              Type      Example
1. Hand Strength       Integer   1 – 50
2. Betting Sequence    String    rc-c, crrc-crrc-cc-, r, ...
3. Board Texture       Class     No-Salient, Flush-Possible, Straight-Possible, Flush-Highly-Possible, ...
Action                 Triple    (0.0, 0.5, 0.5), (1.0, 0.0, 0.0), ...
Outcome                Triple    (-∞, 4.3, 15.6), (-2.0, -∞, -∞), ...

5.3.1. Case Representation
Table 4 depicts the final post-flop case representation used to capture game state information. A single case is represented by a collection of attribute-value pairs. Separate case-bases are constructed for the separate rounds of play by processing a collection of hand histories and recording values for each of the three attributes listed in Table 4. The attributes have been selected by the authors as they capture all the necessary information required to make a betting decision. Each of the post-flop attribute-value pairs is now described in more detail:

1. Hand Strength: The quality of a player's hand is represented in our framework by calculating the E[HS²] of the player's cards and then mapping these values into 1 out of 50 evenly divided buckets, i.e. uniform bucketing.
2. Betting Sequence: The betting sequence is represented as a string. It records all observed actions that have taken place in the current round, as well as previous rounds. Characters in the string are selected from the set of allowable actions, A_limit = {f, c, r}, and rounds are delimited by a hyphen.
3. Board Texture: The board texture refers to important information available, given the combination of the publicly available community cards. In total, nine board texture categories were selected by the authors. These categories are displayed in Table 5 and are believed to represent salient information that any human player would notice. Specifically, the categories focus on whether it is possible that an opponent has made a flush (five cards of the same suit) or a straight (five cards of sequential rank), or a combination of both. The categories are broken up into possible and highly-possible distinctions. A category labelled possible refers to the situation where the opponent requires two of their personal cards in order to make their flush or straight. On the other hand, a highly-possible category only requires the opponent to use one of their personal cards to make their hand, making it more likely they have a straight or flush.

5.3.2. Similarity Metrics
Each feature requires a corresponding local similarity metric in order to generalise decisions contained in a set of data. Here we present the metrics specified by our framework.

1. Hand Strength: Equation 2 specifies the metric used to determine similarity between two hand strength buckets (f1, f2).


    \mathrm{sim}(f_1, f_2) = \max\left\{ 1 - k \cdot \frac{|f_1 - f_2|}{T},\ 0 \right\}    (2)

   Here, T refers to the total number of buckets that have been defined, where f1, f2 ∈ [1, T], and k is a scalar parameter used to adjust the rate at which similarity should decrease.
2. Betting Sequence: To determine similarity between two betting sequences we developed a custom similarity metric that involves the identification of stepped levels of similarity, based on the number of bets/raises made by each player. The first level of similarity (level0) refers to the situation when one betting sequence exactly matches that of another. If the sequences do not exactly match, the next level of similarity (level1) is evaluated. If two distinct betting sequences exactly match for the active betting round, and for all previous betting rounds the total number of bets/raises made by each player are equal, then level1 similarity is satisfied and a value of 0.9 is assigned. Consider the following example where the active betting round is the turn and the two betting sequences are:
    1. crrc-crrrrc-cr
    2. rrc-rrrrc-cr
   Here, level0 is clearly not satisfied, as the sequences do not match exactly. However, for the active betting round (cr) the sequences do match. Furthermore, during the preflop (1. crrc and 2. rrc) both players made 1 raise each, albeit in a different order. During the flop (1. crrrrc and 2. rrrrc) both players now make 2 raises each (4 in total). Given that the number of bets/raises in the previous rounds (preflop and flop) match, these two betting sequences would be assigned a similarity value of 0.9.
   If level1 similarity was not satisfied, the next level (level2) would be evaluated. Level2 similarity is less strict than level1 similarity, as the previous betting rounds are no longer differentiated. Consider the river betting sequences:
    1. rrc-cc-cc-rrr
    2. cc-rc-crc-rrr
   Once again the sequences for the active round (rrr) match exactly. This time, the number of bets/raises in the preflop round is not equal (the same applies for the flop and the turn). Therefore, level1 similarity is not satisfied. However, the numbers of raises encountered for all the previous betting rounds combined (1. rrc-cc-cc and 2. cc-rc-crc) are the same for each player, namely 1 raise by each player. Hence, level2 similarity is satisfied and a similarity value of 0.8 would be assigned. Finally, if level0, level1 and level2 are not satisfied, level3 is reached, where a similarity value of 0 is assigned.
3. Board Texture: To determine similarity between board texture categories a similarity matrix was derived. Matrix rows and columns in Fig. 3 represent the different categories defined in Table 5. Diagonal entries refer to two sets of community cards that map to the same category, in which case similarity is always 1. Non-diagonal entries refer to similarity values between two dissimilar categories. These values were hand-picked by the authors. The matrix given in Fig. 3 is symmetric.

         A     B     C     D     E     F     G     H     I
   A     1     0     0     0     0     0     0     0     0
   B     0     1     0.8   0.7   0     0     0     0     0
   C     0     0.8   1     0.7   0     0     0     0     0
   D     0     0.7   0.7   1     0     0     0     0     0
   E     0     0     0     0     1     0.8   0.7   0     0.6
   F     0     0     0     0     0.8   1     0.7   0     0.5
   G     0     0     0     0     0.7   0.7   1     0.8   0.8
   H     0     0     0     0     0     0     0.8   1     0.8
   I     0     0     0     0     0.6   0.5   0.8   0.8   1
Fig. 3. Board texture similarity matrix.

Table 5
Board Texture Key
A    No salient
B    Flush possible
C    Straight possible
D    Flush possible, straight possible
E    Straight highly possible
F    Flush possible, straight highly possible
G    Flush highly possible
H    Flush highly possible, straight possible
I    Flush highly possible, straight highly possible
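Putting the final case representation of Table 4 together with the three local similarity metrics of Section 5.3.2 gives the following Python sketch of nearest-neighbour retrieval with k = 1. The hand strength metric follows Equation 2 with T = 50, and the board texture scores are copied from Fig. 3. Several details are assumptions not fixed by the text: level0 (an exact match) is scored 1.0; bets/raises are attributed to players by assuming the button acts first preflop and the big blind acts first on later rounds; k = 2 in Equation 2 is an arbitrary choice; and the local similarities are combined by an unweighted mean. The class and function names are hypothetical.

```python
from dataclasses import dataclass

T = 50    # number of uniform E[HS^2] buckets in the final framework
K = 2.0   # hypothetical value of the decay parameter k in Equation 2

# Fig. 3 board texture similarity matrix; rows/columns follow Table 5 (A-I).
LABELS = "ABCDEFGHI"
BOARD_SIM = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0.8, 0.7, 0, 0, 0, 0, 0],
    [0, 0.8, 1, 0.7, 0, 0, 0, 0, 0],
    [0, 0.7, 0.7, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0.8, 0.7, 0, 0.6],
    [0, 0, 0, 0, 0.8, 1, 0.7, 0, 0.5],
    [0, 0, 0, 0, 0.7, 0.7, 1, 0.8, 0.8],
    [0, 0, 0, 0, 0, 0, 0.8, 1, 0.8],
    [0, 0, 0, 0, 0.6, 0.5, 0.8, 0.8, 1],
]


@dataclass(frozen=True)
class Case:
    """Post-flop case following Table 4 (field names are hypothetical)."""
    hand_strength: int       # E[HS^2] bucket in [1, T]
    betting_sequence: str    # e.g. "crrc-crrc-cc-"
    board_texture: str       # category label from Table 5
    actions: tuple = ()      # (f, c, r) action frequencies
    outcomes: tuple = ()     # mean outcome per action


def sim_hand_strength(f1: int, f2: int) -> float:
    """Equation 2: linear decay with bucket distance, floored at 0."""
    return max(1.0 - K * abs(f1 - f2) / T, 0.0)


def sim_board_texture(b1: str, b2: str) -> float:
    """Look up the hand-picked similarity from Fig. 3."""
    return BOARD_SIM[LABELS.index(b1)][LABELS.index(b2)]


def raise_counts(round_actions: str, round_index: int) -> tuple:
    """Count bets/raises as (button, big blind) within one round.

    Assumes heads-up acting order: the button acts first preflop
    (round_index 0) and the big blind acts first on later rounds.
    """
    counts = [0, 0]
    first = 0 if round_index == 0 else 1
    for i, action in enumerate(round_actions):
        if action == "r":
            counts[(first + i) % 2] += 1
    return tuple(counts)


def sim_betting_sequence(s1: str, s2: str) -> float:
    """Stepped similarity levels level0-level3 from Section 5.3.2."""
    if s1 == s2:
        return 1.0                      # level0 (score assumed to be 1)
    r1, r2 = s1.split("-"), s2.split("-")
    if len(r1) != len(r2) or r1[-1] != r2[-1]:
        return 0.0                      # active rounds differ: level3
    prev1 = [raise_counts(r, i) for i, r in enumerate(r1[:-1])]
    prev2 = [raise_counts(r, i) for i, r in enumerate(r2[:-1])]
    if prev1 == prev2:
        return 0.9                      # level1: equal counts in every round
    totals1 = tuple(sum(c) for c in zip(*prev1))
    totals2 = tuple(sum(c) for c in zip(*prev2))
    if totals1 == totals2:
        return 0.8                      # level2: equal combined counts
    return 0.0                          # level3


def similarity(query: Case, case: Case) -> float:
    """Hypothetical global similarity: unweighted mean of local scores."""
    return (sim_hand_strength(query.hand_strength, case.hand_strength)
            + sim_betting_sequence(query.betting_sequence, case.betting_sequence)
            + sim_board_texture(query.board_texture, case.board_texture)) / 3


def retrieve_nearest(query: Case, case_base: list) -> Case:
    """k = 1: always return the single most similar case, however low."""
    return max(case_base, key=lambda case: similarity(query, case))


print(sim_betting_sequence("crrc-crrrrc-cr", "rrc-rrrrc-cr"))  # 0.9 (level1)
print(sim_betting_sequence("rrc-cc-cc-rrr", "cc-rc-crc-rrr"))  # 0.8 (level2)

case_base = [
    Case(30, "rc-c", "A", (0.0, 0.5, 0.5), (float("-inf"), 4.3, 15.6)),
    Case(45, "rc-cr", "G", (1.0, 0.0, 0.0), (-2.0, float("-inf"), float("-inf"))),
]
print(retrieve_nearest(Case(32, "rc-c", "B"), case_base).actions)  # (0.0, 0.5, 0.5)
```

Because `retrieve_nearest` always returns some case, no default-policy is needed, which mirrors the move to k = 1 retrieval described in Section 5.2.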


5.4. Experimental Results
We now present a series of experimental results collected in the domain of two-player, limit Texas Hold'em. The results presented are obtained from annual computer poker competitions and data collected by challenging human opposition. For each evaluated case-based strategy, we provide an architecture snapshot that captures the relevant details of the parameters used for each of the six key architecture areas that were previously discussed.

5.4.1. 2009 IJCAI Computer Poker Competition
We begin with the results of the 2009 ACPC, held at the International Joint Conference on Artificial Intelligence. Here, we submitted our case-based agent, Sartre, for the first time, to challenge other computerised agents submitted from all over the world. The following architecture snapshot depicts the details of the submitted agent:

1. Feature Representation
   (a) Hand Strength – categories
   (b) Betting Sequence – string
   (c) Board Texture – categories
2. Similarity Assessment – all-or-nothing
3. Solution Representation – single
4. Case Retrieval – variable k
5. Re-Use Policy – max-frequency
6. System Training – Hyperborean-08

The architecture snapshot above represents a baseline strategy where maintenance had yet to be performed. Each of the entries listed above corresponds to one of the six key architecture areas introduced in Section 5.2. Notice that trivial all-or-nothing similarity was employed along with a single action solution representation, which resulted in a redundant case-base. The value for system training refers to the original expert whose decisions were used to train the system.
The final results are displayed in Table 6. The competition consisted of two winner determination methods: bankroll instant run-off and total bankroll. Each agent played between 75 and 120 duplicate matches against every other agent in order to obtain the average values displayed. Each match consisted of 3000 duplicate hands. The values presented are the number of small bets per hand won or lost. Our case-based agent, Sartre, achieved a 7th place finish in the instant run-off division and a 6th place finish in the total bankroll division.

5.4.2. 2010 AAAI Computer Poker Competition
Following the maintenance experiments presented in Section 5.2, an updated case-based strategy was submitted to the 2010 ACPC, held at the Twenty-Fourth AAAI Conference on Artificial Intelligence. Our entry, once again named Sartre, used the following architecture snapshot:

1. Feature Representation
   (a) Hand Strength – 50 buckets E[HS²]
   (b) Betting Sequence – string
   (c) Board Texture – categories
2. Similarity Assessment
   (a) Hand Strength – Euclidean
   (b) Betting Sequence – custom
   (c) Board Texture – matrix
3. Solution Representation – vector
4. Case Retrieval – k = 1
5. Re-Use Policy – probabilistic
6. System Training – MANZANA

Here a vector-valued solution representation was used together with improved similarity assessment. Given the updated solution representation, a single nearest neighbour, k = 1, was retrieved via the k-NN algorithm. A probabilistic solution re-use policy was employed and the system was trained on the decisions of the winner of the 2009 total bankroll division. The final results are presented in Table 7. Once again two winner determination divisions are presented, and the values are depicted in small bets per hand with 95% confidence intervals. Given the improvements, Sartre was able to achieve a 6th place finish in the run-off division and a 3rd place finish in the total bankroll division.

5.4.3. 2011 AAAI Computer Poker Competition
The 2011 ACPC was held at the Twenty-Fifth AAAI Conference on Artificial Intelligence. Our entry to the competition is represented by the following architecture snapshot:

1. Feature Representation
   (a) Hand Strength – 50 buckets E[HS²]
   (b) Betting Sequence – string
   (c) Board Texture – categories
2. Similarity Assessment
   (a) Hand Strength – Euclidean
   (b) Betting Sequence – custom
   (c) Board Texture – matrix
