An approach of solving early phase multiplayer No-Limit Hold'em poker using empirical Bayesian statistics and vector spaces
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
An approach of solving early phase multiplayer No-Limit Hold’em poker using empirical Bayesian statistics and vector spaces A thesis presented by Damiaan Reijnaers (10804137) for the degree of Bachelor of Science in Artificial Intelligence Credits: 18 EC University of Amsterdam Faculty of Science Science Park 904 1098 XH Amsterdam Supervisor dr. M.A.F. Lewis Institute for Logic, Language and Computation Faculty of Science University of Amsterdam Science Park 107 1098 XG Amsterdam March 20th, 2020 1
Abstract This thesis aims to propose an efficient methodology for the purpose of representation, and decision-making based on comparison, of early-game states in No-Limit Texas Hold’em poker. The paper presents a predictive model in the form of an augmented decision tree based on the distance between geometrically represented game situations in an euclidean vector space, wherein opponent models contribute as situational characteristics. This research identifies a shortcoming in existing work on opponent modelling, and solves it by introducing a mixed technique based on Bayesian statistics and beta-binomial regression. An implementation is suggested and tested. By using the proposed method, it can be concluded that an artificial agent is able to distinguish between different game situations and to make (a variety of) decisions based on situational factors. 2
Contents Abstract 2 1 Introduction 5 2 Putting the idea into perspective 6 3 Method 8 3.1 Proposed methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.1.1 General definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.1.2 Opponent modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.1.3 Situational space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.1.4 Decision trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.2 Suggested implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.2.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.2.2 Population . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.2.3 Decisions based on situations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 3.3 Experiments and results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.3.1 Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.3.2 Estimated hole card ranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.3.3 Predictions for actions and raise sizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 4 Conclusion and discussion 28 Bibliography 31 Appendix A - Relevant rules of Texas Hold’em Poker 33 Appendix B - Glossary of used poker terms 35 Appendix C - Questionnaire concerning quality of experiment results 37 3
List of Figures 1 Dependency graph for variables in a poker game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2 Visualisation of VPIP- and PFR/VPIP-values for players in dataset . . . . . . . . . . . . . . . . . . . 11 3 Clusters of player characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 4 Relation between number of observations and player metrics . . . . . . . . . . . . . . . . . . . . . . 14 5 Beta distributions resulting from beta-binomial regression . . . . . . . . . . . . . . . . . . . . . . . 15 6 Generated situational space for a raise from first position . . . . . . . . . . . . . . . . . . . . . . . . 16 7 Relation between folds and raising size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 8 Sketch of situation with raise, call and fold to agent on BU . . . . . . . . . . . . . . . . . . . . . . . 20 9 Sketch of situation with raise from first position, folds to agent on SB . . . . . . . . . . . . . . . . . . 27 10 Part of decision tree after agent re-raises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 List of Tables 1 Estimations for player metrics using various methods . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2 Analysis of parameter d in equation 14 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3 Observations of situations in used dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 4 Estimated weights for raise from first position . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 5 Estimated hole card range for a player who raised from first position. . . . . . . . . . . . . . . . . . . 27 List of Algorithms 1 Estimating a raising size for the agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 4
1 Introduction The game’s complexity does not necessarily implicate that a computer can never be profitable playing games of The rise of Deep Blue, solving chess in 1996; and the poker. Since there is a clear distinction between profitable recent victory of AlphaGo, ‘solving’ the game of Go and non-profitable players, it can be assumed that at least in 2017—the human player against whom AlphaGo a part of the game is not based on chance. And, however competed actually won one out of three rounds, which small this part in the distribution between skill and chance can be called a victory for AlphaGo, but can merely be might be, it is big enough to turn hundreds of online play- called a definitive ‘solution’—has made developments ers into winners of tens of thousands of US dollars3. Two towards solving Texas Hold’em poker more relevant. studies involving groups of instructed and non-instructed Occasionally, programs such as PokerSnowie gain a poker players reinforced the statement of a greater role considerable amount of attention by making progress of skill than chance (DeDonno and Detterman, 2008). towards creating an ‘unbeatable poker algorithm1.’ Moreover, although broadly sceptical on the matter, However, poker is a totally different game, which, unlike Meyer et al. (2013) as well states that “experts seem to Chess and Go, involves a great deal of chance. It is a game be better able to minimize losses when confronted with with imperfect and unreliable information, involving disadvantageous conditions.” Nevertheless, it is worth opponent modelling, risk management and deception – noticing that besides concluding “that the outcomes of this illustrates the relevance of the game to significant poker games are predominantly determined by chance,” areas in AI research (Billings et al., 1998). Meyer et al. nuances this conclusion by stating that their conclusion applies “at least to short game sequences,” Despite the relatively narrow scope of this thesis, thus leaving long game sequences open for discussion. multiple different disciplines of AI are involved in DeDonno and Detterman filled up this gap by showing this paper. Moreover, the methods presented in this that skill is the determining factor in long-term outcome. paper are relevant for broader use than just poker. The This is the same advice professional poker coaches teach to-be-introduced formulas for beta-binomial regression their students: “it is about focusing on small advantages in section 3.1.2 are applicable for any problem in which to win in the long run4 , 5.” the estimation of long-term probabilities plays a role – an example may be the ‘batting average’ of baseball or In contrast to the work discussed above, this thesis cricket players. This is the number of a player’s ‘hits’ will not yield an answer to the debate whether poker is divided by their ‘at bats’2 (Robinson, 2017). Vector a game of chance or skill, but will instead focus on how spaces (further explained in section 3.1.3) can be used as a computer could flawlessly take over certain aspects an approach for a wide variety of problems containing a of the game, using the theory of Bayesian statistics and large or an infinite number of (game) states that can be vector spaces. The aim of this thesis is to answer the represented by numbers, such as drug repositioning and following question: how can an agent efficiently make use of stock markets (Manchanda and Anand 2017; Bai et al. the theory of Bayesian statistics and vector spaces to represent 2018, p. 217). For this reason, concepts in this paper are and compare game situations in order to make decisions in represented both symbolically, using first-order logic, and the early stages of a multi-player No-Limit Texas Hold’em mathematically, in an attempt to make the concepts more game? In order to adequately come up with an answer to accessible for other purposes than poker. this question, this research question will be subdivided into two subquestions: How can opponents be modelled by considering a population of players? and How can these opponent models be used to make decisions based on publicly available information? 1PokerSnowie, Challenge PokerSnowie, https://www.pokersno wie.com/blog/taxonomy/term/13, Accessed on February 27th, 2020 2Major League Baseball, What is a Batting Average (AVG)?, http:// 3HighstakesDB, Biggest Poker Winners - Top Money Winners in On- m.mlb.com/glossary/standard-stats/batting-average, Ac- line Poker, https://www.highstakesdb.com/poker-players.asp cessed on February 27th, 2020 x?sortby=winners, accessed on February 25, 2020 5
The approach presented in this paper is based on a 2 Putting the idea into perspective couple of personal observations. At first, if for a willing and competent human, taking part in a relatively small The significance of poker as a testbed for Artificial sample of poker hands (a couple of hundred thousand or Intelligence has led to extensive research in the field, million) suffice to turn the person into a winning player, which yielded a wide variety of attempted approaches a computer can certainly do it with the same number of towards ‘solving the game.’ These include reinforcement hands and probably less. Secondly, when playing against learning (Dahl, 2001) and neural networks (Davidson an unknown opponent, one uses the knowledge gained 1999; Billings et al. 2002, p. 226-227). The approach previously by playing against other opponents, which which is presented in this thesis will solely try to indicates the use of a ‘population’ – a statement supported maximize estimated expected value for a decision tree, by Chen and Ankenman (2006, p. 38, 67). using statistics directly derived from past observations (further explained in section 3.1.4) – this highly benefits It should be noted that it is assumed that the reader the explainability of choices the agent makes. is aware of the rules of Texas Hold’em, which is necessary to understand the essence of the method presented in According to popular opinion within the commu- this paper. A brief introduction to the rules of the game nity of poker players; within their community, poker relevant for this thesis are outlined in Appendix A. players are divided into two groups: mathematical players Throughout the document, conventional poker terms will (players who base their decisions on calculations) and be used. A list of these terms and their explanations can intuitive players (players who base their decisions on a be found in the glossary in Appendix B. This document certain ‘gut feeling’). I challenge the distinction between will often use terms such as ’the agent’ or ’the program,’ these two ‘types’ of players and treat them as equivalent referring to a hypothetical program implementing the – mathematical players actively make use of existing strategy described in this thesis. theorems and formulas, while intuitive players apply these same methods subconsciously. Because of the uncertain and complex nature of the game, one has to infer playing characteristics of opponents solely by observing an opponent play. The sample sizes of these observations are often small or non-existent, which makes a Bayesian approach to poker a straightforward choice compared to a frequentist interpretation6 of the game. This reasoning is followed by Chen and Ankenman (2006, p. 38). A known approach to the problem of predicting op- ponents’ hole cards is to use Bayes’ theorem in a like manner (Chen and Ankenman 2006, p. 60-64; Van der Kleij 2010, p. 39; Korb et al. 1999). A corresponding 4My Poker Coaching, How to become a professional poker player, https://www.mypokercoaching.com/become-professio nal-poker-player/, accessed on February 4, 2020 5Best Poker Coaching, Create Good Habits with a Winning Pregame Routine, https://www.bestpokercoaching.com/create-good-h abits-winning-pregame-routine/, accessed on February 4, 2020 6In probability theory, a distinction is often made between frequentist and Bayesian approaches; the former defining probabilities as represent- ing the occurrence of events in long run frequencies, the latter basing a probability on indications of the plausibility of an event. “A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.” - Karl Pearson 6
approach is taken for the prediction of opponents’ actions In addition, when combining multiple variables into a (Ponsen et al. 2008; Southey et al. 2012; Korb et al. Bayesian model, which is generally required when mod- 1999). This thesis introduces an empirical Bayesian elling a multi-player No-Limit Hold’em game, another approach towards the modelling of opponents. The problem arises: the huge complexity of the game. A large proposition made in this paper is uncommon—and proba- number of possible game states causes data sets to often bly unprecedented—in the field of research on poker. The be too sparse for reliably representing the needed prior , method allows for an accurate estimation of statistics on beliefs9 29. The choice of vector spaces also formulates a opponents (e.g. a player’s degree of aggression) based on solution to this problem, as any previously observed poker the player population. It demonstrates a bias in existing game state can be considered in a weighted comparison work, such as in the mentioned Ponsen et al. (2008), with a newly encountered game state in which the same and solves it using beta-binomial regression which is action sequence occurred. This model will still behave explained in section 3.1.2. Although Bayesian statistical similarly as when a complete Bayesian model would methods will be extensively used in the most fundamental have been considered, albeit still achieving to avoid the aspects of the opponent models, this paper will differ from explained complications related to continuous bet and the earlier mentioned work by using vector spaces for the stack sizes, and being less prone to small samples. This representation of different game states and the prediction will be further discussed at the end of section 3.1.3. of an opponent’s actions, hole cards and raise sizes. This choice is inspired by the idea of conceptual spaces. Essential for calculating the expected value for branches of the to-be-constructed decision trees, as explained Conceptual spaces allow for the representation of in section 3.1.4, is estimating the relative strength or concepts in a geometric space, in which the dimensions equity of a starting hand. After discounting the two represent characteristics (or features) of the concept. A hole cards initially dealt to the agent, (50 5 ) = 2, 118, 760 similarity function is used to measure the similarity of different board combinations exist. The strength of one concepts within the space (Gärdenfors, 2004). An imple- single starting hand versus another hand can be precisely mentation based on conceptual spaces is pre-computable calculated by concerning all possible run-outs of the and handles bet sizing and stack sizing in a continuous board. The results of these calculations are readily way—No-Limit Hold’em is often deemed ‘too complex’ available10. In more thorough problems, such as when for AIs to solve because of the abundance of possible game determining the strength of a starting hand versus a range , states due to the allowance of bet sizes without limit7 8 of hands, a hand’s equity can be approximated with (Johanson, 2013)—whereas different implementations Monte Carlo simulations11 and are observed to be fastly opt for pre-specified static bet sizes through abstractions converging12 (Metropolis and Ulam, 1949, p. 335-341). (Moravčík et al. 2017, p. 3; Brown and Sandholm 2017; Many different implementations of hand evaluators are Brown et al. 2018), including already cited work (Van der available on software development platforms13, many of Kleij, 2010, p. 44). The implementation proposed in this which are based on mechanisms invented for efficient thesis does not pre-specify any kind of bet sizing. computing14. Large lookup tables for pre-calculated evaluations, such as for the Two Plus Two evaluator (containing 32,487,834 entries) are freely available15. 7Pokernews, Artificial Intelligence and Holdem, Part 3: No-Limit 9More formally called the prior probability of an event occurring. Holdem, The Next Frontier, https://www.pokernews.com/strate It expresses one’s ‘belief’ of a random event happening when applying gy/artificial-intelligence-hold-em-3-23218.htm, Accessed Bayesian statistical inference. on November 15, 2019 10PokerStove, preflop-matchups.txt.gz, http://web.archive.org/ 8Cardschat, Poker Bots Arent Powerful Enough to Solve No Limit Hol- web/20110612052656/http://www.pokerstove.com/analysis/ dem (Yet), https://www.cardschat.com/news/poker-bots-are preflop-matchups.txt.gz, accessed on February 28th, 2020 nt-powerful-enough-solve-no-limit-holdem-yet-52909, Ac- 11Monte Carlo simulations are a class of random sampling algorithms cessed on November 15, 2019 to approximate probabilities by considering subsets of a set of events. 7
Considering that a hand’s perceived equity depends on 3 Method various factors, such as ‘post-flop playability,’ other ap- proaches have been studied as well. Dalpasso and Lancia This section is composed of three subsections. The first (2015) observe the concept of equity as a combination of subsection is concerned with the proposed methodology features of a (hand on the) flop, such as ‘the flop contains and it effectively answers the main research question one overcard.’ Although based on the limit variant of the stated in the introduction: how can an agent efficiently game, groupings of starting hands have been proposed by, make use of the theory of Bayesian statistics and vector spaces among others, Johanson et al. (2013) and Sklansky and to represent and compare game situations in order to make Malmuth (1999, p. 14-15). The hand groupings stemming decisions in the early stages of a multi-player No-Limit Texas from the latter mentioned work will be used in section Hold’em game? In section 3.2, an implementation of the 3.1.3 to ‘learn’ weights for the formerly introduced vector proposed methodology is suggested. Section 3.3 proceeds spaces. As the scope of this thesis pertains only to the ‘pre- to present the findings of this implementation and serves flop’ phase of the game, a precise interpretation of equity to strengthen the usefulness of further investigating the, is prefered over an interpretation depending on ‘post-flop in section 3.1, proposed methods. In appendix C, as an play.’ In this thesis, Monte Carlo simulations will be used additional feature, a questionnaire and its results regarding to estimate the equity of a hand versus (possibly multiple) the quality of the suggested implementation is attached. opponents’ ranges of hands. In this thesis, a database of 6,552,060 hands and 63,657 different players is consistently used. Each hand is played by six players. The choice of this particular dataset is motivated in section 3.2.1. Only the first stage of the game is considered: the pre-flop betting round. This is the first betting round of the game when no ‘community cards’ have been revealed. For convenience purposes this part of the game will be referred to by using the following terminology: poker game, game, poker hand or hand. All games are assumed to take place in a ‘rake-free environment:’ no commission is withheld, unlike usually the case in poker games. Although briefly explained in appendix B, it is necessary to clarify what is meant by a ‘hole card range.’ At the start of every hand, player are dealt two hidden cards, which are used to form combinations with five community cards visible to all players. These cards are a player’s ‘hole cards.’ A ‘hole card range’ is a group of cards which a player is assumed to hold in a certain situation – instead of predicting an opponent’s exact hole cards, a weighted spectrum of possible holdings is proposed. 13GitHub, XPokerEval - A collection of poker hand evaluation source code compiled by James Devlin., https://github.com/tangentfo rks/XPokerEval, accessed on February 28th, 2020 14Suffecool, Cactus Kev’s Poker Hand Evaluator, http://suffe.co ol/poker/evaluator.html, accessed on February 28th, 2020 12Miscellaneous Remarks, Ideas, Trials, Poker 4: Monte Carlo Anal- 15GitHub, HandRanks.dat, https://github.com/christophsc ysis, http://oscar6echo.blogspot.com/2012/09/poker-4-mon hmalhofer/poker/blob/master/XPokerEval/XPokerEval.TwoP te-carlo-analysis.html, accessed on February 28th, 2020 lusTwo/HandRanks.dat, accessed on February 28th, 2020 8
ω ∈Ω Ω Ωβ = {ω ∣ β ∈ Bω } big blinds in order to preserve the ability to shift between, and take into account different stakes. [sβ ∣ β = m ∧ om ∈ O] • An ordered action sequence Aω of I actions ⟨α1 , . . . , αI ⟩. During a game, this action sequence can ζ γ sβ {Υ(oβ ) ∣ oβ ∈ Bω ∧ oβ ∈ O} be expanded by an action [αi ∈ ϕ (ωh )] performed by a player [pn = {{γ , ζ , η }, β } ∧ pn ∈ Bω ]. ′ B – A poker situation or situation is reffered to by a game’s action sequence. If referring to a η ∈ ϕ (ω ) αi+1 A O τ similar situation, an identical action sequence is assumed, regardless of the size of x when ∀pn ∈ Ψ(ωh ) αi = raise x. Figure 1: Dependency graph for variables in a poker game – x denotes the value of the total bet including ω . The dashed line illustrates the dynamics of players the amount before the bet’s increment. In this influencing each other. Dotted lines point to logical ex- paper, raise sizes are also denoted as a percent- pressions for relations. age for the total bet, with respect to the pot. For example: if the pot contains $0.15 and a player raises to $0.25, this raise is denoted as a 3.1 Proposed methodology 166.67%-raise. This subsection has been further subdivided into four And in general defined are: parts. The first part introduces a formal explanation of the • A function ϕ (ωh ) returning a set of currently legal game and deals with general definitions which are used actions a ⊂ {fold, call, check, raise x} where x is a dis- in the three subsequent subsections to answer the research crete amount between νω and γβ . question. Section 3.1.2 addresses the first subquestion: How can opponents be modelled by considering a population • A function ψ (ωh ) returning a one-element set (or the of players? While the two remaining sections focus on the empty set) containing the player pn = {{γ , ζ , η }, β } second subqestion: How can these opponent models be used who is to generate the next action αi+1 . The ordered to make decisions based on publicly available information?. sequence of all acting players is denoted by Ψ(ωh ) and we denote the i-th acting player pi by Ψi . 3.1.1 General definitions • A set Bω = {β1 . . . βN } ⊆ O of N players and a ′ nested set Bω = {p1 , . . . , pn } = {{γ , ζ , η }} × B where A set of definitions and variables are introduced below. γβ denotes the player’s current stack size divided The dependence of these variables is shown as a graphical by νω ; ζβ denotes the relative position on the ta- model in figure 1. 1 2 ble; and ηβ , ηβ ∈ {2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A}× A poker game will be denoted by ωh and is defined 1 2 {♠, ♡, ♣, ♢}, ηβ ≠ ηβ denotes the player’s hole as a set consisting of the following elements: cards. • An integer νω denoting the value of the big blind. – Since players can get raised, Ψ(ωh ) might con- tain the same β -element multiple times. – Players on a later position at the table have ac- – When referring to a player’s hole cards, a two- cess to more information (the actions of earlier letter notation (such as AK and T9 for variants positioned opponents) which influences action of A♣ K♠ and 10♣ 9♣ ) is often used, which may in- taken on such position (seat). clude an additional o or s (see appendix B under – The player’s stack is divided by the number of ‘suited’). A ‘T’ refers to a ten ( 10♣ , 10♠ , 10♡ or 10♢ ). 9
• A set Ω = {ω1 . . . ωH } of H previously com- 3.1.2 Opponent modelling pleted games. Each completed poker game ωh = One of the unknown variables defined in figure 1 is the {Aω , Bω , νω } is inferred by parsing hand histories. strategy sβ or playing style of an opponent. In accordance with section 3.1.1, it is assumed that a player’s strategy • A variable τ representing the factor of time. depends on the previous games in which the player participated. This relation is illustrated in figure 1. – A population changes its playing style over time The agent will emulate and enhance this behaviour by since opponents are influenced by each other additionally taking into account games in which the agent and by the increasing availability of outlines on did not participate. , strategy and the mathematics of the game16 17. A form of representing poker situations is needed • A set O of all possible opponents, which is referred in order to compare similar game situations. In the next to as the population. Every possible opponent is section, a multi-dimensional space will be introduced in considered as having strategy Υ(om ) = sm . Further, which vectors live whose components consist of values ′ defined is a set O = {o1 . . . oM ∣ om ∈ Bωh ∧ ωh ∈ Ω} describing the characteristics of these situations. The of all previously encountered opponents. described opponent models of each player participating in the hand serve as some of the components of these vectors. Since only the pre-flop part of the game is considered, Equation 1 and Equation 2 are introduced for describing sβ . These two metrics will be used solely to characterize each player and to identify similar players, by comparing these metrics in a geometric space. count({ω ∣ ω ∈ Ω ∧ β ∈ Bω ∧ ∃αi (αi ∈ Aω ∧ αi ∈ {call, raise x} ∧ β ∈ Ψi (ω ))}) VPIP(β ) = ⋅ 100 (1) count({ω ∣ ω ∈ Ω ∧ β ∈ Bω }) count({ω ∣ ω ∈ Ω ∧ β ∈ Bω ∧ ∃αi (αi ∈ Aω ∧ αi = raise x ∧ β ∈ Ψi (ω ))}) PFR(β ) = ⋅ 100 (2) count({ω ∣ ω ∈ Ω ∧ β ∈ Bω }) 16Reddit, How has NLHE strategy changed over the last 5 - 17TwoPlusTwo, Will Poker games continue to get tougher?, 10 years?, https://www.reddit.com/r/poker/comments/2ojlig https://forumserver.twoplustwo.com/32/beginners-quest /how_has_nlhe_strategy_changed_over_the_last_5_10/, Ac- ions/will-poker-games-continue-get-tougher-1435373/, cessed on November 20th, 2019 Accessed on November 20th, 2019 10
Equation 1 formalizes the concept of a VPIP-value – the percentage of hands a player voluntarily puts money into a pot when presented with the opportunity of doing so. 80 This value is an indicator of the tightness or looseness of a player, i.e. how many hands a player likes to 60 PFR/VPIP play. Equation 2 is similar to the preceding equation, except that in this equation only raises count towards the percentage. The resulting value is the player’s pre-flop 40 raise- or PFR-value and indicates the aggressiveness or passiveness of a player pre-flop. 20 In agreement with commonly held beliefs in the 0 poker community, different groups of poker players exist, 20 40 60 80 distinguished by their playing characteristics. The plots in VPIP figure 2 hint the existence of at least three of the ‘classical’ (a) Sample of 3,821 players (≥ 1000 hands) groups of players which are believed to significantly populate the overall player population: tight-aggressive, 100 tight-passive and loose-passive players. For a minimum of 1,000 hand observations per player, the Mean Shift procedure (Fukunaga and Hostetler, 1975) seems to 80 affirm this intuition as shown in figure 3a on page 13. PFR/VPIP Thorndike’s famous Elbow method (Thorndike, 1953) as 60 well hints the existence of three clusters when a clustering range of 1 to 10 is specified. This is visualized together 40 with respective k-means clustering (with k = 3) in figure 3b and 3c (Lloyd, 1982). These findings confirm the 20 usefulness of characterizing players by the two introduced metrics, as they convey sufficient information to classify 0 a playing style. 20 40 60 80 100 VPIP As oftentimes there are too little observations (or (b) Sample of 20,989 players (≥ 100 hands) no observations at all) on opponents to be encountered, a frequentist approach to calculating an individual opponent’s VPIP and PFR values is often inaccurate. Figure 2: Visualisation of VPIP- and PFR/VPIP-values Nevertheless, many profitable online poker players use for players in dataset software based on frequentist analyses to keep track of , , these statistics18 19 20. Although some players opt to 18BeatingBetting, How Valuable Are Poker HUDs?, https: //www.beatingbetting.co.uk/poker/best-poker-hud/#How_ Valuable_Are_Poker_HUDs, accessed February 29th, 2020 19CardsChat, How necessary is a HUD?, https://www.cardscha t.com/f61/how-necessary-a-hud-234552/, accessed on February 29th, 2020 20TwoPlusTwo, What % of players use a HUD?, https: //forumserver.twoplustwo.com/32/beginners-questions /what-players-use-hud-962957/, accessed on February 29th, 2020 11
, tweak their software21 22; the only known software in of probabilities of a probability (in this case a players’ the field to have directly implemented an alternative VPIP or PFR/VPIP value) as shown in equation 3 to 6 acknowledges the shortcomings of a frequentist approach, (Raïffa and Schlaifer 1961; Diaconis and Ylvisaker 1979, but does not implement a full solution such as described p. 274). Here, s denotes the number of successes (e.g. below23. To solve the problem of inaccurate statistics, a player voluntarily putting money into a pot), while players using this kind of software generally wait until f = n − s denotes the number of failures where n is the , , their statistics on players start to converge24 25 26, missing total number of observations on a player. As P(θ ) is valuable information in the meantime. beta distributed, and P(X∣θ ) is binomially distributed, P(θ ∣X) is also beta distributed. By integrating the prod- Instead, in this paper these metrics are proposed to uct of their probability mass and density functions, the be estimated using the population as prior knowledge. By probability density function for another beta distribution, replacing the introduced PFR-metric with a derived met- with parameters α + s and β + f , is derived. This is the ric abbreviated as PFR/VPIP and defined as VPFR PIP ⋅ 100, player-specific distribution based on observations on the both metrics can be interpreted as probabilities27. Since analysed player. Note that α , β and σ in this section refer the population can be modelled as a distribution of these to parameters of distributions rather than the definitions player characteristics, the population essentially becomes in section 3.1.1. a distribution of probabilities. As the beta distribution28 is a continuous probability distribution defined on population [0, 1], this distribution is a probability distribution of player specific Ì ÏÍ Î probabilities. This makes it perfectly suitable as the Ì ÏÍ Î P(X∣θ ) ⋅ P(θ ) P(X∣θ ) ⋅ P(θ ) Bayesian prior29. Since the player characteristics are P(θ ∣X) = = P(X) ∫ P(X∣θ ) ⋅ P(θ )d θ probabilities themselves, the Bayesian likelihood is Bernoulli distributed30. The theory of conjugate priors (3) can be used to construct a player’s own Beta distribution α −1 β −1 s f θ (1−θ ) (ns)θ (1 − θ ) ⋅ B(α ,β ) 21TwoPlusTwo, Help - Using PT4 to analyse population tendencies, = (4) θ α −1 (1−θ )β −1 https://forumserver.twoplustwo.com/185/heads-up-sng-s ∫ ((ns)θ s (1 − θ ) f ⋅ B(α ,β ) ) d θ pin-gos/help-using-pt4-analyse-population-tendencie s-1208561/, accessed on March 1st, 2020 22TwoPlusTwo, , https://forumserver.twoplustwo.com/15/ s+α −1 f +β −1 poker-theory/wilson-score-interval-1023431/, accessed on θ (1−θ ) March 1st, 2020 (ns) B(α ,β ) 23Poker Copilot, User Guide – Statistics or probabili- = (5) (ns)θ s+α −1 (1−θ ) f +β −1 ties?, https://pokercopilot.com/userguide/6/en/topic/sta ∫( B(α ,β ) ) dθ tistics-or-probabilities, accessed on February 29th, 2020 24Smart Poker Study, HUD Reliability: Number of Hands and Sam- ple Sizes, https://www.smartpokerstudy.com/hud-reliability s+α −1 f +β −1 -number-of-hands-and-sample-sizes-226, accessed on March θ (1 − θ ) 1st, 2020 = = Beta(α + s, β + f ) 25PokerStars School, HUD Stats Youre Doing it Wrong!, B(s + α , f + β ) https://www.pokerstarsschool.com/strategies/hud-sta (6) ts-doing-it-wrong/668/, accessed on March 1st, 2020 26TwoPlusTwo, How Do I Use HUD Stats With Small Samples?, https://forumserver.twoplustwo.com/32/beginners-quest 29Bayes’ theorem defines a probability by taking prior knowledge ions/how-do-i-use-hud-stats-small-samples-1551067/, into account. A probability of an event A given evidence B is given by: P(B∣A)P(A) accessed on March 1st, 2020 P(A ∣ B) = P(B) where P(B ∣ A) conveys the likelihood of evidence 27If we wouldn’t have divided the value for a player’s PFR by their B given that A indeed occurred, and P(A) conveys the prior probability VPIP-value, then their PFR-value would always been capped at their of A occurring at all. VPIP-value. 30The probability density function of a Bernoulli distribution is 28The probability density function of a Beta distribution is f (x; α , β ) = k f (k; p) = p (1 − p) 1−k where k ∈ {0, 1} and 0 ≤ p ≤ 1. Here, p is α −1 β −1 1 B(α ,β ) x (1 − x) , where 0 ≤ x ≤ 1 and α , β > 0. the parameter of the Bernoulli distribution. 12
A subsequent problem arises: the majority of players in a population are often players on which there are very few observations. The dataset used in this paper consists 80 TAg TAg for 67,02% of players with less than 100 observations. Attempting to improve the level of certainty of a dataset 60 PFR/VPIP by leaving out all players with a number of observations below a certain threshold, results in a bias. As pointed out 40 in section 2 of this thesis, some well-received papers on L/TPs this subject still incorrectly implement this process. The L/TPs 20 reason for the biased Bayesian prior is straightforward: LPs LPs professional players can be assumed to play differently 0 than ‘recreational players,’ but professional players also 20 40 60 80 play more often and thus have a higher chance of ending up VPIP (more frequently) in the dataset of observations on players. (a) Mean Shift clustering (3 clusters detected) By letting n denote the total number of played hands for a player in the dataset, I proceed to verify the statement k = 3, score = 439298.53 1750000 above by calculating the mean VPIP and PFR/VPIP for all 1500000 players for every possible n. So, the means for n = 55 are 1250000 the averaged metrics of every player who participated in a 1000000 total of 55 hands. To indicate convergence of a metric, an uncertainty value based on the Wilson Score Interval31 750000 is introduced and shown in equation 7 (Wilson, 1927). 500000 This uncertainty value is plotted against a logarithmic 250000 scale of n in figure 4 on page 14. By performing both 1 2 3 4 5 6 7 8 9 weighted and unweighted linear regression, the figure k illustrates a correlation between the VPIP and PFR/VPIP (b) Distortion score elbow for K-means clustering metrics and the number of observations on players. These graphs comply with a generally known idea among the poker player community that recreational players (or fish in poker jargon) often play more hands and show less 80 , TAg TAg aggression32 33. 60 PFR/VPIP T/LAg T/LAg √ z2 2 p̂⋅(1− p̂)+ 4⋅n 40 p̂ + z 2⋅n −z⋅ n Uncertainty = 2 ⋅ ( p̂ − 2 ) (7) 1 + zn 20 T/LPs T/LPs 31The Wilson Score Interval is a method for showing binomial pro- 0 portion confidence intervals. 20 40 60 80 32BlackRain79, What Does VPIP Mean? An Extremely Sim- VPIP ple Explanation, https://www.blackrain79.com/2016/07/what -is-vpip.html, accessed on December 7th, 2019 (c) K-means clustering of Figure 2a (k = 3) 33PokerVIP, Tips for Identifying the Recreational Players, https://www.pokervip.com/strategy-articles/texas-h old-em-no-limit-beginner/tips-for-identifying-the-rec Figure 3: Clusters of player characteristics reational-players, accessed on December 7th, 2019 13
In equation 3 to 6, a new beta distribution is introduced of which the parameters depend not only on observations on the population as a whole, but also on player-specific 1.0 Uncertainty = 2( p̂-Wilson(z = 0.95, n, p̂ = 0.257)) observations. As information on specific players gradu- Mean VPIP for players with n hands Trend (unweighted linear regressor) ally accumulates, information on the population becomes 0.8 Trend (weighted by number of players) VPIP as probability less relevant for the estimation of player-specific statistics. The more we observe a player’s play, the more we look 0.6 at that player’s playing history instead of to that of the population. The less information we have on a player, 0.4 the more we use the population to fill the gap. If we totally lack information on an opponent—if we have never observed that particular opponent—the population is the 0.2 only data we look at. In order to deal with the problem related to uncertainty, as explained above and illustrated 0.0 in figure 4, we let the original parameters (α , β ) depend 0 2 4 6 8 on ln(n). This way, a different pair of (α , β ) can be ln(n) for 0 < n ≤ 5000 (3,821 players) obtained for every possible n. Now this pair of parameters (a) Average VPIP values among players seem to decrease as cer- (α , β ) (which now only depends on the number of hands tainty increases. Value p̂ = 0.257 is chosen as it is the weighted played) can be updated in accordance with equation 6 as average VPIP value over the whole sample. player-specific observations come through. In equation 8, µ is defined as the inverse logit Uncertainty = 2 ( p̂-Wilson(z = 0.95, n, p̂ = 0.720)) function (or expit function)34 of a linear function of ln(n). Mean PFR/VPIP for players with n hands 0.8 The parameters α and β can now be defined with regard Trend (unweighted linear regressor) PFR/VPIP as probability Trend (weighted by number of players) to a mean and dispersion value (Pananos, 2020). By defining σ as the dispersion value, α and β are written 0.6 in terms of µ and σ in equation 9. By considering the probability density function of the beta distribution; a, 0.4 b and σ can be estimated using maximum likelihood estimation35 (Robinson, 2017, p. 56-64). The effect is illustrated in figure 5 on page 15, which plots the beta 0.2 distributions for four subsequent exponents of n. As expected, the distribution shifts towards a lower value for 0.0 a player’s VPIP as the number of played hands increases. 0 2 4 6 8 Finally, the expected value for a metric estimated using ln(n) for 0 < n ≤ 5000 (3,821 players) this method is given in equation 10. (b) Average PFR/VPIP values among players seem to increase as certainty increases. This is partly due to decreased VPIP values 1 as shown in subfigure a. Value p̂ = 0.720 is chosen as it is the µ= (8) weighted average PFR/VPIP value over the whole sample. 1 + e−(a+b⋅ln(n)) ⇒ α = µ ⋅ σ , β = (1 − µ ) ⋅ σ (9) α E[Beta(α , β )] = (10) Figure 4: Relation between number of observations and α +β player metrics 34The logit function maps probabilities (thus, values ranging from 0 to 1) to a value ranging to infinity. The inverse function, which we use in this thesis, does the opposite: it maps any value to a probability. 14
3.0 The idea of a ‘situational space’ is introduced, in 0 10 hands which a player’s expected value for the VPIP and 1 2.5 10 hands PFR/VPIP metrics are merely used as ‘input values’ 2 10 hands for constructing this geometric space. For every poker 3 10 hands 2.0 situation, a multi-dimensional space is generated in which vectors live whose components represent ‘characteristics’ p(V PIP∣α , β ) 1.5 of the situation. Each vector, directing to a point in this space, refers to an identical historical situation (as 1.0 explained in section 3.1.1, a poker situation is described to be identical if the same action sequence occurred) drawn from the dataset. Every player taking part in the 0.5 situation (in other words: players who did not fold at the point of evaluation) produces a part of the attributes 0.0 0.0 0.2 0.4 0.6 0.8 1.0 of the situation, captured by the components of the V PIP vectors living in the corresponding situational space. The following values are described for each active player: Figure 5: Non-player-specific Beta distributions resulting from performing beta-binomial linear regression on the 1. The player’s expected VPIP-value, estimated by the variable of the number of hands played by a player. method presented in section 3.1.2 and equation 10; 2. the player’s expected PFR/VPIP-value, estimated by the method presented in section 3.1.2 and equation 3.1.3 Situational space 10; The, in the previous section, introduced VPIP and 3. the player’s stack size at the beginning of the hand; PFR/VPIP values can be directly used to set up a strategy 4. whether the player was all-in (either 0 or 1); against opponents by assuming a player’s hole card range 5. the size of the raise made by the player (if applicable) to consist of the top portion of hands sampled from as a percentage with respect to the pot size; the distribution of all starting hands, where the player’s 6. the size of a second raise (if applicable) as a percent- metrics determines the size of this portion36. As it is age with respect to the pot size. not impossible for a player to hold a hand outside of this implied range (for example, players can have a liking for Each vector has an additional corresponding ‘label.’ This a certain type of hand), I opt to diverge from this idea. label captures the hole cards of the analysed player (if More importantly, players can play different cards in known); the action of the analysed player; and the size different ways – a player could, for example, decide on a of the raise made by the analysed player (if applicable). more conservative play while holding a weaker hand (e.g. An example could be the situation in which the player η = 9♢ 8♢ ), and bet a lower amount compared to when the on the first position raised. If focussing on the player same player would have been dealt a stronger hand (e.g. directly due to act after the raising player, the constructed 25 ♠ ♣ ). This kind of behaviour among opponents can be K K situational space would span R since six players profitably exploited by taking a more flexible approach, participate of which one specifies a raise size (in other such as the method I propose in this thesis. words, a 25-dimensional space would be generated). The similarity of these situations can be measured by 35In maximum likelihood estimation (or MLE) the most likely param- concerning the distance between the points defined by eters for a probability distribution are estimated by either differentiating these vectors. Building on the previous example, if facing or using iterative methods of trying different parameters. 36The Pokerbank, How to use VPIP in poker, https://www.thep a raise from the first position and analysing the player in okerbank.com/articles/software/vpip/, accessed on March 8th, the second position, all identical historical situations in 2020 the dataset are used to generate vectors with labels that 15
convey information about the successive action taken by If a player is due to act after the agent, the labels can the player in the second position. If we already know that be used to predict a future action by the opponent. the player in the second position took a certain action (e.g. Elaborating on the previous example, if the agent would called the bet raised by the player in the first position), we be seated in the first position and would have yet to decide can consider all vectors with a label corresponding to that on which action to take, these situational spaces can be taken action (in this example we would only draw vectors constructed as if the agent would have raised, in order to reflecting historical situations in which the player in the determine the (1) likelihood of the player in the second second position called after the player in the first position position calling the raise, and (2) the estimated hole card raised) and suggest a hole card range for the currently range with which this player would call. As will be encountered opponent. Since the vectors geometrically explained in section 3.1.4, in order to make a decision encompass the characteristics of historical situations with based on predictions for every participating opponent’s the same action sequence, the closer two points are in actions, raise sizes and hole card ranges, the agent will space, the more similar the situations they represent. generate situational spaces for all actions for every likely Weight is given to the actually held hole cards with action sequence. respect to the distance between the point representing the historical situation in which these hole cards are known to Considering not all of a situation’s characteristics be held, and the point representing the current situation in are of equal importance, the weights for each individual which the opponent’s hole cards are still unknown. This vector component (corresponding to a dimension in the example is illustrated in figure 6 below. space) should be estimated. For example, the player facing a raise from the first position is probably more interested in the VPIP-value of the raiser, than in the VPIP-value of the person to act next. These weights cannot be raise size (%/pot) ⎡ ˆ ⎤ obtained by solving linear systems of equations as we ⎢ ⎢.85 ⎥ ⎥ ⎢ ⎢ ⎥ ⎥} part of vector in R25 lack knowledge of any distance (or similarity) between ⎢ ⎥ ⎢.61 ⎥ ⎢ ⎢ ⎥ two given situations. In order to solve this problem, two ⎣.46⎥ we ⎦ ‘substitutes’ for distance values are introduced. These igh ted serve as ‘outcome values,’ with which the similarity ⎡ ⎢ ˆ ⎤ .82 ⎥ A K Mi ⎡ ˆ ⎤ ⎢ ⎢ ⎥ ⎥ ♠ ♡ nk ⎢ ⎢.18 ⎥ ⎥ (or distance) between two situations can be roughly ⎢ ⎢.31 ⎥ ⎥ ow ski ⎢ ⎢ ⎥ ⎥ ⎢ ⎢ ⎥ ⎥ dis ⎢ ⎢.74 ⎥ ⎥ approximated: ⎣.31⎥ ⎢ ⎢ ⎢ ⎥ ⎦ ⎣.09⎥ tan ce ⎦ • Historical situations where the hole cards of the 8 7 ♡ ♣ player for which the situational space is analysed are known are grouped together, according to Sklansky’s VPIPraiser ⎡ ˆ ⎤ K♣ 6♢ PFR/VPIPraiser hand groups. These hand groups, which have been ⎢ ⎢.48 ⎥ ⎥ ⎢ ⎢.50⎥⎥ shortly mentioned in section 1, embody nine ranges ⎢ ⎢ ⎥ ⎥ ⎢ ⎢ ⎥ ⎣ ⎥ .03 ⎦ of hands in decreasing order with respect to hand strength. These groups are used when estimating Figure 6: Situational space for a faced raise from first weights for the purpose of determining hole cards position. The hole card range for the faced raiser is ap- and raise sizes. These groups are divided as follows: proximated by using the weighted Minkowski distance for (Sklansky and Malmuth, 1999, p. 14-15) st nd th – Group 1: AA, AKs, KK, QQ, JJ comparing with past events. The 1 , 2 and 5 compo- nents of the normalized vectors are shown. For illustrative – Group 2: AKo, AQs, AJs, KQs, TT purposes, only 4 manually selected vectors out of 536,951 – Group 3: AQo, ATs, KJs, QJs, JTs, 99 – Group 4: AJo, KQo, KTs, QTs, J9s, T9s, 98s, 88 (of which 79,578 with known hole cards) are shown. The – Group 5: A9s-A2s, KJo, QJo, JTo, Q9s, T8s, 97s, coloured vector represents the ‘live situation’ for which 87s, 77, 76s, 66 hole cards should be predicted. – Group 6: ATo, KTo, QTo, J8s, 86s, 75s, 65s, 55, 54s 16
– Group 7: K9s-K2s, J9o, T9o, 98o, 64s, 53s, 44, 43s, 33, 22 – Group 8: A9o, K9o, Q9o, J8o, J7s, T8o, 96s, 87o, mean within group mean of means within groups 85s, 76o, 74s, 65o, 54o, 42s, 32s Ì ÏÍ Ì Î ÏÍ Î k » » 1 j k j »»2 wi = ⋅ ∑ (»»»» ( ⋅ ∑ c ji ) − ( ⋅ ∑ ( ⋅ ∑ c ji )) »» ) – Group 9: All other hands 1 k 1 1 k »» k k=1 » » j j=1 k k=1 j j=1 »» • Categories grouped by the actual action taken are (11) used when determining weights for the purpose of wi − min w ⇒ wi = i (12) predicting a future opponent’s action: (1) check/fold, ∑i=1 wi − min w (2) call, (3) raise x. The similarity of two vectors (representing situations) us- ing these weights is then calculated by dividing by the weighted Minkowski distance38, as shown in equation 13. Each group now contains vectors representing historical The sum of all the similarities for a specific element (e.g. situations (note that a vector can now occur in both one of A K (hand); call (action); or 133.33 (raise size)) is divided ♡ ♡ the groups categorized by the Sklansky hand groups listed by the sum of all similarities for elements of the same type, above, and one of the groups divided by the action taken). to obtain the weight of that element being relevant to the Now, we generate a matrix of which every row represents analysed opponent (e.g. holding A K ; calling; or when ♡ ♡ a vector in the group. This matrix is then ‘divided’ by (de- raising, raising with 133.33). fined as taking the element-wise division of every value for every row in the matrix, with every value of a denom- inator vector) a vector holding the maximum encountered 1 values for the considered component. In other words: S(a, b) = i (13) 1 + ∑i=1 ∣ wi ⋅ (ai − bi ) ∣ the vector components are normalized by dividing their individual values by the outcome of a max()-function ex- ecuted column-wise37. If for a component the maximum encountered value equals 0, the maximum value is set to The proposed method in this paper is a combination of 1 to avoid division by zero. For every group, the means two main ideas: empirical Bayesian statistics and vec- are taken for each component over the normalized values. tor spaces. As explained in section 2, this approach is The variance of these mean values corresponding to the chosen to diminish the negative effects of having too lit- same component among groups is then taken as the weight tle data available. This effect is further reduced by not for that component in that situation. This idea rests on the only concerning situations at the same point of evalua- the intuition that the more a value varies among different tion. In other words: based on the example used on the groups, the more that value helps to determine to which previous page, if we are constructing the vector space for group it belongs. This is summarized in equation 11 and the raiser, we would also include situations in which the 12. In the last step, all weights are normalized so that player in the second position folded or re-raised, instead ∑i=1 wi = 1 where i in c ji corresponds to the number of of only considering situations in which that player called i k dimensions in the situational space, k to the number of the faced raise. If a player participated in many hands, groups and j to the number of vectors within group k. the player will often have contributed to many different points in a vector space generated for an often-occuring situation. But as players are characterised by their VPIP and PFR/VPIP-values, which serve as components for the 37NumPy v.1.17 manual, numpy.ndarray.max, Return the maximum 38The Minkowski distance is a generalization of the Eucledian distance along a given axis.https://docs.scipy.org/doc/numpy/referen (a straight line between two points) and ‘taxicab distance’ (or ‘Manhat- ce/generated/numpy.ndarray.max.html, accessed on March 9th, ten distance’ – a stepwise distance calculated by summing the absolute 2020 difference of discrete coordinates) used in normalized vector spaces. 17
vectors used to compare situations, points reflecting situ- the first position, players who did not fold before that ations in which the same player played on the same posi- raise are due to act more than once. This means that tion will automatically be positioned close to each other the implemented algorithm should be recursive and as their VPIP and PFR/VPIP-values are identical. This is causes the previously listed type of action sequence compatible with ideas such as that a player’s range often to be nestedly included within this type as well. widens when being positioned later39, and that holdings When presented with a situation in which the agent has to are usually stronger when a player raises from an early po- make a decision, a tree of all possible action sequences sition compared to when that player would be seated in a and corresponding vector spaces will be generated. As later position40. The more observations obtained on a spe- the raise-option could theoretically cause the tree to be cific player, the more significant the historical actions of infinite, the number of allowed raises is ideally limited. that particular player, and the heavier these are weighted Having observed too few instances of a situation is an in predicting future actions for that player. This can be insurmountable obstacle which results in inadequate seen as a geometric adaptation of prior information in a estimations, although to a lesser degree than when Bayesian model. Furthermore, this method allows for pat- exact situations (including raise sizes and such) would terns in an opponent’s playing style to be recognized and have been compared with each other. This problem is exploited. Also, since all players are taken into account in inevitable and can be overcome by leaving these situations the situational vectors, this model supports the idea that out of the equation. As this only applies to situations opponents adapt to each other at the table, instead of only with a low number of observations, these observations the agent adapting to the opponents. Another advantage of are inherently statistically improbable. Table 3 on page this technique is the possibility of caching the matrices— 23 shows the degree of relevance of this phenomenon to together with their corresponding weights—in which the the used dataset in this thesis. As cutting out situations vectors are contained, in order to speed up future com- could cause probabilities of actions occuring to not add putation. In addition, the values of the matrix represent up to one, the probabilities should be normalized after a explainable values which make it easier to understand why few more steps are introduced. the algorithm would decide on a particular move. An example is considered wherein the agent is 3.1.4 Decision trees seated in the fourth position, ‘BU’ or the ‘button,’ and In order to explain the final concept which connects all A = ⟨raise .20, fold, call⟩. If the maximum amount of previously mentioned techniques, a distinction is made raises is set to 1, all possible action sequences are between two types of player action sequences: generated as shown in the list below. Bold text denotes the agent’s action – this style is used throughout the paper. • Action that happened before the agent is due to act. For the opponents involved in this type of action only 1. A = ⟨raise .20, fold, call, call, call, call⟩ the hole card ranges will be estimated using the vector 2. A = ⟨raise .20, fold, call, call, call, fold⟩ spaces discussed in section 3.1.3. 3. A = ⟨raise .20, fold, call, call, fold, call⟩ • Action that happens after the agent is due to act. For 4. A = ⟨raise .20, fold, call, call, fold, fold⟩ these opponents the action and the hole card range 5. A = ⟨raise .20, fold, call, fold, call, call⟩ corresponding to that action is predicted using vector 6. A = ⟨raise .20, fold, call, fold, call, fold⟩ spaces. If the action is a raise, the raising size is also 7. A = ⟨raise .20, fold, call, fold, fold, call⟩ predicted. If a raise is made by a player not seated in 8. A = ⟨raise .20, fold, call, fold, fold, fold⟩ 39CardsChat, Your Guide To Pre-Flop Calling Ranges, https: The likelihood for every listed action sequence is calcu- //www.cardschat.com/preflop-calling-hand-ranges.php, ac- lated by consulting the similarity measures for the actions cessed on March 9th, 2020 40PartyPoker, How To Play Poker In ‘Early’ Position, derived from the situational spaces. If an opponent https://www.partypoker.com/en/how-to-play/school/a raises after the agent called or raised, the agent is due dvanced/early-position, accessed on March 10th, 2020 to act multiple times. This needs to be handled in a 18
You can also read