Computers aren’t limited to playing simple games like tic-tac-toe, checkers and chess. With enough computing power and time, machines can use math and game theory to comprehend the most difficult games — even poker.
A game is considered solved when a computer has determined the Game Theory Optimal (GTO) strategy. The GTO strategy breaks even when the opponent plays the same strategy and cannot lose in the long run to any other strategy; it wins whenever the opponent makes mistakes. Computers solved simple poker games like limit hold’em a few years ago, and they’re close to solving no-limit hold’em.
In 2017, Libratus, a bot developed at Carnegie Mellon University, crushed four of the best heads-up, no-limit hold’em players for more than 14 big blinds per 100 hands. (See “BUT CAN LIBRATUS DO THE THUMB FLIP?”) With $1/$2 blinds, that would be $28 per 100 hands, which is an incredibly high win rate.
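The dollar figure is a straight conversion from big blinds. A minimal Python sketch, with a function name chosen here purely for illustration:

```python
def win_rate_in_dollars(bb_per_100: float, big_blind: float) -> float:
    """Convert a win rate in big blinds per 100 hands to dollars per 100 hands."""
    return bb_per_100 * big_blind


# 14 big blinds per 100 hands at $1/$2 blinds (big blind = $2)
print(win_rate_in_dollars(14, 2))
```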
Learning to bluff
Some players may not understand how a bot could possibly know how and when to bluff intelligently, but it turns out bluffing’s a common tactic discussed in game theory classes. For example, quite often players find themselves on the river with a polarized range, which consists of premium hands and junky hands, while an opponent has a condensed range, consisting of medium-strength hands that lose to premium hands and beat junk. In that situation, the polarized player can win the pot on average simply by betting an amount whose pot odds require the opponent to win exactly as often as the polarized range is bluffing.
So, if a player’s range consists of 67% premium hands and 33% bluffs (using in-depth range analysis taught at pokercoaching.com), bet an amount that requires an opponent to win 33% of the time. In that case, a pot-sized bet gives opponents 2:1 pot odds, meaning they need to win 33% of the time to break even. With that polarized range, a pot-sized bet will win the pot on average, no matter what an opponent does. With a range of 83% premium hands and 17% bluffs, betting one-fourth of the size of the pot, which gives an opponent 5:1 pot odds, wins the pot on average. As players have more bluffs in their ranges, they can use a larger bet sizing. If the range is perfectly polarized with 51% premium made hands and 49% bluffs, a player could actually bet 24.5 times the size of the pot, which almost no one does.
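The arithmetic behind these bet sizes can be checked with a short Python sketch. The function names are illustrative, and the model assumes a simplified river spot: the bettor’s premium hands always win at showdown, the bluffs always lose, and the opponent can only call or fold.

```python
def required_equity(bet: float, pot: float = 1.0) -> float:
    """Share of the time a caller must win to break even (their pot odds)."""
    # The caller risks `bet` to win the pot plus the bettor's bet.
    return bet / (pot + 2 * bet)


def bet_for_bluff_share(bluff_share: float, pot: float = 1.0) -> float:
    """Bet size whose pot odds match the share of bluffs in a polarized range."""
    if not 0 <= bluff_share < 0.5:
        raise ValueError("bluff_share must be at least 0 and below 0.5")
    # Solve bluff_share = bet / (pot + 2 * bet) for bet.
    return pot * bluff_share / (1 - 2 * bluff_share)


def bettor_ev(bluff_share: float, bet: float, call_freq: float, pot: float = 1.0) -> float:
    """Bettor's average winnings when the opponent calls `call_freq` of the time."""
    ev_when_called = (1 - bluff_share) * (pot + bet) - bluff_share * bet
    return (1 - call_freq) * pot + call_freq * ev_when_called


# Bluff shares from the examples in the text (1/6 is roughly 17%).
for share in (1 / 3, 1 / 6, 0.49):
    size = bet_for_bluff_share(share)
    print(f"{share:.0%} bluffs -> bet {size:.2f}x pot")
```

With the pot as the unit, the three bluff shares come out to bets of about 1, 0.25 and 24.5 pots. Calling `bettor_ev(1 / 3, 1.0, call_freq)` with any call frequency between 0 and 1 returns one pot, which is the sense in which the bet wins the pot on average no matter what the opponent does.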
If players study using the two main GTO solvers available today (PioSolver and MonkerSolver), they’ll find patterns that come up all the time. For example, when determining which hands to continuation bet on the flop against an opponent, the main concern is how the player’s range fares against the opponent’s range. If players have the equity advantage (meaning their equity with their entire range on the flop is higher than the opponent’s equity with an entire range), bet with a large portion of the range using a small bet. Without the equity advantage, bet infrequently using a large size with a polarized range consisting of premium made hands and some draws, while checking marginal made hands and junk, plus a few traps. Using this knowledge, develop an implementable system, like the one taught at pokercoaching.com, to determine roughly the ideal betting and checking strategy in any situation.
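The flop rule of thumb can be written down as a tiny Python sketch. The function name and its inputs are hypothetical: it assumes the range-versus-range equities have already been estimated elsewhere (for example, with an equity calculator), and it encodes only the heuristic described here, not actual solver output.

```python
def flop_betting_plan(hero_equity: float, villain_equity: float) -> str:
    """Return the rule-of-thumb flop plan for the given range equities."""
    if hero_equity > villain_equity:
        # Equity advantage: bet often, bet small.
        return "bet a large portion of the range with a small size"
    # No equity advantage: bet rarely, bet big, with a polarized range.
    return "bet infrequently with a large size and a polarized range; check the rest"


print(flop_betting_plan(0.56, 0.44))
print(flop_betting_plan(0.47, 0.53))
```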
While the GTO strategy is powerful, it’s usually the ideal strategy against only the best players in the world. As opponents play worse, adjust to take advantage of whatever they do incorrectly. Sticking strictly to the GTO strategy against weaker opponents would leave a ton of money on the table.
Passive exploitation happens when a player simply plays GTO and profits from whatever an opponent does wrong, while active exploitation means deviating from the GTO strategy to take further advantage of an opponent’s mistakes. The maximally exploitative strategy is the deviation from GTO that maximizes profit against a specific opponent.
Detecting opponents’ errors
While it’s sometimes difficult to know what a specific opponent does incorrectly, many times it’s obvious. For example, many small-stakes players simply never bluff on the river. So, if a player gets to the river and an opponent with that tendency bets, fold all but the best made hands. Other players bluff far too much, allowing a player to easily call down with all sorts of marginal made hands. Those are both examples of active exploitation.
The major problem with using the maximally exploitative strategy is that an assessment of an opponent’s strategy could simply be wrong. If a player thinks an opponent never bluffs, and thus folds to most bets, but that opponent actually bluffs a lot, the player will get demolished. If a player thinks an opponent bluffs a lot, and thus calls down with lots of marginal made hands, but it turns out the opponent essentially never bluffs, the player will also get demolished. And if an opponent quickly and correctly counter-adjusts to combat a maximally exploitative strategy, the player will lose much more than could have potentially been won by making the initial adjustment.
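The cost of a wrong read can be put in numbers with a short Python sketch. The names and the bluffing frequencies below are illustrative assumptions, and the model is simplified: the caller holds a marginal made hand that beats every bluff and loses to every value bet, and folding is worth zero.

```python
def call_ev(bluff_freq: float, bet: float, pot: float = 1.0) -> float:
    """EV of calling a river bet with a hand that beats only bluffs (folding = 0)."""
    return bluff_freq * (pot + bet) - (1 - bluff_freq) * bet


def best_response(bluff_freq: float, bet: float, pot: float = 1.0) -> str:
    """Maximally exploitative action against a known bluffing frequency."""
    return "call" if call_ev(bluff_freq, bet, pot) > 0 else "fold"


# Facing a pot-sized bet, calling breaks even when the bettor bluffs 1/3 of the time.
assumed, actual = 0.60, 0.05  # the read says "bluffs a lot"; the opponent rarely does
action = best_response(assumed, bet=1.0)
realized = call_ev(actual, bet=1.0) if action == "call" else 0.0
print(action, round(realized, 2))
```

Under these assumed numbers, the read says to call, but each call against the opponent’s real frequency loses most of a pot on average. Swapping the two frequencies shows the opposite mistake: folding a hand that would have been a profitable call.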
Playing the GTO strategy sidesteps this dilemma, but will win less money in the long run than an exploitative strategy would, assuming your assessments are generally correct. So, until you are fairly certain about what your specific opponent does incorrectly against you, it is wise to play a fundamentally sound strategy.
But can Libratus do the thumb flip?
Libratus, a poker bot created by Noam Brown and others at Carnegie Mellon University, beat four human professional players in 120,000 hands of heads-up, no-limit hold’em in early 2017. Each of the four distinguished professionals played 30,000 hands, and each was beaten by Libratus.
To reduce the luck factor, special rules ensured no party could just run hot over the course of the challenge. After 20 days, Libratus had convincingly beaten every pro, with an overall win rate of 14.7 big blinds per 100 hands.
Despite the roughly 316,000,000,000,000,000 possible game situations, John Nash, the winner of the 1994 Nobel Memorial Prize in Economic Sciences, would deem heads-up, no-limit hold’em a game with a finite number of situations. Consequently, a Nash equilibrium exists, which ensures that players using a Nash equilibrium strategy cannot lose against any other player in the long run.
A human poker player could never accurately recall, compute or apply the Nash equilibrium strategy to quadrillions of scenarios, but Libratus could. Nash equilibrium means that guts, bluffs, tells, reads and other differentiating strategies employed by the top pros really don’t matter in the end.
That has implications for the online poker industry, which posted more than $1 billion in revenue last year. The poker sites have the challenge of ensuring that no online player is using AI, while convincing players of a level playing field.
For a deeper discussion of AI achievements at poker, listen to the 12.28.18 episode of MIT’s Artificial Intelligence podcast, “Tuomas Sandholm: Poker and Game Theory,” which features an interview with the Libratus co-creator. (See podcast review in Arts & Media.)
Jonathan Little, a professional poker player and WPT Player of the Year, has amassed more than $7 million in live tournament winnings and written 14 best-selling books. He teaches at PokerCoaching.com. @jonathanlittle