January 25, 2022
Minimax search is a fundamental depth-first search algorithm used in Artificial Intelligence, decision theory, game theory, and statistics. The purpose of minimax search is to minimize the maximum loss in a worst-case scenario - or, simply put, figure out the best action to pursue in a worst-case scenario.
This algorithm can be implemented in n-player perfect information games but is most commonly implemented in 2-player zero-sum games such as Checkers, Chess, Connect 4, Mancala, and Tic-Tac-Toe. Zero-sum games are, simply put, games where whatever value/utility one player wins, the other player loses.
Perfect information games are games where every player knows the results of all previous moves. In games of perfect information, there is at least one "best" way to play for each player. The "best" way to play can be found by searching future moves and evaluating the resulting game states with search algorithms such as minimax. Moreover, the best strategy does not necessarily allow one to win, but it will minimize the losses³.
In this article, we'll discuss what minimax search is, the related vocabulary, search tree structure, search process, alpha-beta pruning, and, finally, implement minimax with and without alpha-beta pruning in mancala.
Let's familiarize ourselves with minimax-related variables and keywords before we implement minimax searches.
Keyword | Variable | Definition |
---|---|---|
Player/Maximizer | pmax¹ | Maximizes their action's value - or minimizes the maximum loss - in a worst-case scenario |
Opponent/Minimizer | pmin | Minimizes the value of the player's (pmax's) maximum gain |
Heuristic value | v² | Value of the player's action - or current game state - obtained by an evaluation function. Larger numbers are more beneficial for pmax, whereas smaller numbers are more beneficial for pmin. |
Branching factor | b | How many actions the player has available |
Depth | d | How deep into the tree - or how many moves ahead - the algorithm will search |
Suppose we're playing a 2-player turn-based game where each player has a choice between two actions per turn.
The branching factor, b, will be equal to 2. We'll configure the algorithm to search to a depth of 3 - that is, we'll consider the player's possible moves, the opponent's responses, and the player's responses to the opponent's responses - which will set the variable d equal to 3.
Given the variables b=2 and d=3, the algorithm will generate the following tree structure:
The number of heuristic evaluations - also called terminal nodes or leaf nodes, seen in the figure as v1-v8 - completed in a minimax search tree is denoted by bᵈ. It's important to notice that the size of the search tree increases exponentially based on how deep the algorithm searches and how large the branching factor is.
We have a mere 2³ (8) heuristic evaluations in the tree above when looking 3 moves ahead. However, the tree represents an arbitrary game in which there are only two possible actions per turn.
Consider chess, where the branching factor (possible moves) is exactly 20 for opening moves and about 35 for the remainder of the game⁴. If we re-configured our minimax algorithm's branching factor (b) to reflect the number of possible moves in chess, we would have to perform roughly 35³ (42,875) heuristic evaluations. And that's only looking 3 moves ahead!
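Here's a quick way to reproduce those leaf counts in Python - just raw arithmetic, nothing game-specific:

```python
# Leaf (terminal) nodes a full minimax search must evaluate: b ** d
for b, d in [(2, 3), (35, 3)]:
    print(f"b={b}, d={d}: {b ** d:,} heuristic evaluations")

# b=2, d=3: 8 heuristic evaluations
# b=35, d=3: 42,875 heuristic evaluations
```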
The last thing to consider about these trees is that the branching factor is not always uniform. Just because there are two possible moves this turn, it doesn't mean there will be two possible moves next turn. Perhaps there is only one possible move next turn or you may have reached a terminal state where there are no possible moves; you've won or lost.
Looking at chess again... there are 20 possible opening moves. After the first move, and depending on which piece is moved, there can be anywhere from 20 to 29 moves. The opening move below allows for 29 possible moves on white's next turn - or 28 possible moves if black moves their pawn to d5.
Recall that minimax search is a depth-first search algorithm; it starts at the root node and travels as deep as possible through each branch before backtracking to the top of the tree. The depth-first search is complete once all nodes have been visited.
Depth-first search animation
Minimax evaluates all terminal nodes at the maximum search depth d and then begins the search process. The non-terminal nodes' values are determined by the minimizer's and maximizer's choices among their descendant leaf nodes.
Referring back to the minimax tree, the search process begins as follows for a tree with depth 3:
1. max2 picks the maximum value between v1 and v2
2. The search backtracks through min1 and calculates the terminal node values of v3 and v4
3. max3 picks the maximum value between v3 and v4
4. min1 picks the minimum value between max2 and max3
5. The same process repeats on the right-hand side of the tree: max4 picks max(v5, v6), max5 picks max(v7, v8), and min2 picks min(max4, max5)
6. max1 (the root node) picks the maximum value between min1 and min2
The gist of minimax is that the minimizer and maximizer alternate picking their most optimal value for every level of the tree. Who picks first depends on how deep the tree is, but it's guaranteed that the maximizer picks last because they are the root node and the main player.
When the search completes, the root node will contain the player's maximum gain in a worst-case scenario,
assuming the opponent is always playing their best move.
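If you prefer reading code over trees, here's a minimal, game-agnostic sketch of plain minimax. The get_moves, apply_move, and evaluate callbacks are placeholders assumed for illustration; they aren't part of the Mancala simulator discussed later.

```python
def minimax(state, depth, maximizing, get_moves, apply_move, evaluate):
    """Return the minimax value of `state`, searching `depth` moves ahead.

    `get_moves`, `apply_move`, and `evaluate` are hypothetical game-specific
    callbacks: the legal moves, the state after a move, and the heuristic value.
    """
    moves = get_moves(state)
    if depth == 0 or not moves:        # reached max depth or a terminal state
        return evaluate(state)

    child_values = (
        minimax(apply_move(state, move), depth - 1, not maximizing,
                get_moves, apply_move, evaluate)
        for move in moves
    )
    # The maximizer and minimizer alternate picking their best value
    return max(child_values) if maximizing else min(child_values)
```

At the root, the player loops over its own moves, calls minimax on each resulting state, and plays the move with the highest value.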
Minimax with a deep search (large d value) is not optimal against opponents making random moves because the algorithm assumes the opponent is competent and playing optimal moves.
One last thing to consider is that the algorithm's search time drastically increases for complex games with a large branching factor b - also called a large action space - such as chess.
The algorithm visits every node - every possible action - before choosing an optimal action.
So the larger and deeper the tree, the longer the search algorithm takes to find the most optimal value.
The search time of large trees can be dramatically reduced by applying an optimization called alpha-beta pruning.
Alpha-beta pruning is an optimization technique for the minimax search algorithm. The optimization greatly reduces search times by preventing minimax from visiting and evaluating unnecessary nodes.
Recall that the number of nodes visited by minimax is exponential (bᵈ). In the best case, alpha-beta pruning effectively cuts the exponent in half, thus considerably improving search times. But how does alpha-beta pruning optimize search times?
The pruning process cuts off leaves - or entire sub-trees - in the game tree which need not be searched because a better move already exists. It's easy to think that the algorithm may prune the most optimal action; however, alpha-beta pruning will return the same action as minimax, only much faster.
The optimization technique adds two new terms to our minimax vocabulary: alpha (α) and beta (β).
Keyword | Variable | Definition |
---|---|---|
Alpha | α | The best action (highest value) pmax can guarantee at the current tree depth or above. Initial value is -∞. |
Beta | β | The best action (lowest value) pmin can guarantee at the current tree depth or above. Initial value is +∞. |
Before diving into the pruning process, we need to understand the principles of alpha-beta:

- Only the maximizer updates α
- Only the minimizer updates β
- α and β values are passed down only to the child nodes; not to the parent nodes
- A node stops evaluating (prunes) its remaining children once α >= β
Let's now walk through the alpha-beta pruning process. In the following figures, Nodes A, C, D, F, G are the maximizers, and Nodes B, E are the minimizers.
The default α and β values are propagated from A to its children B and C.
Node C evaluates its left-hand terminal node (2) and compares it to the current α value. If the terminal node's value is greater than the current α value - e.g. max(2, -∞) - then Node C updates α with the terminal node's value. Node C's node and α values are now equal to 2 because 2 is greater than -∞.
Node C then evaluates its right-hand terminal node. The right-hand node's value (1) is less than α (2) and less than Node C's current value (2), so nothing is updated.
Node C's value is backtracked upwards; Node B's value is set to 2.
Node B then compares its β value (+∞) to its current node value (2) and updates β if the current node value is less than β - e.g. min(2, +∞). Node B's β value is now set to 2 and propagated downwards to Node D's β value.
Node D evaluates the left-hand node and updates its node and α values to 9. Here's the important part: Node D's α value is now greater than its β value, so it no longer has to evaluate the remaining nodes. The right-hand node is pruned, thus saving the algorithm from evaluating the heuristic value of one node so far.
Node D backtracks its node value to Node B.
Node B selects the minimum value from C and D with min(2, 9), and selects 2.
Node B then backtracks its node value to Node A, and Node A updates its node and α values to 2 after evaluating max(2, -∞).
Node A propagates its alpha-beta values (α=2, β=+∞) down to Nodes E and F.
Node F evaluates its left-hand terminal node (0) and selects 0 for its node value, but does not update the α value because max(0, 2) is still 2. Node F then evaluates the right-hand node but doesn't update its node or α value because -3 is less than 0 and 2.
Node F backtracks its node value up to Node E, where E will update its β value with 0 after evaluating min(0, +∞). Now E's alpha-beta values are (α=2, β=0).
Here's another important part: Node E's α value is greater than its β value, so it prunes the entire sub-tree of Node G, saving the algorithm from two more node evaluations - a total of three fewer heuristic evaluations overall.
Node E backtracks its node value of 0 up to Node A where Node A will take the max(B, E) or max(2, 0).
Of course Node A will select 2. And that's how we can find the player's most optimal action in a worst-case scenario using minimax with alpha-beta pruning!
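Here's the same sketch from earlier with alpha-beta bookkeeping added - again just an illustration using the same hypothetical game callbacks, not the Agent's exact implementation:

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing,
              get_moves, apply_move, evaluate):
    """Minimax with alpha-beta pruning; start with alpha=-inf and beta=+inf."""
    moves = get_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)

    if maximizing:
        value = -math.inf
        for move in moves:
            value = max(value, alphabeta(apply_move(state, move), depth - 1,
                                         alpha, beta, False,
                                         get_moves, apply_move, evaluate))
            alpha = max(alpha, value)   # only the maximizer raises alpha
            if alpha >= beta:           # the minimizer above will never pick this branch
                break                   # prune the remaining children
        return value
    else:
        value = math.inf
        for move in moves:
            value = min(value, alphabeta(apply_move(state, move), depth - 1,
                                         alpha, beta, True,
                                         get_moves, apply_move, evaluate))
            beta = min(beta, value)     # only the minimizer lowers beta
            if alpha >= beta:
                break
        return value
```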
The order in which nodes are examined directly affects alpha-beta pruning's effectiveness. There are two types of node orderings to be conscious of before creating a minimax game tree:

- Worst ordering: the best moves sit in the right-most branches, so little or nothing can be pruned and the search behaves like plain minimax.
- Ideal ordering: the best moves sit in the left-most branches, so the maximum number of nodes and sub-trees can be pruned.
How can we prevent the worst ordering?
Order moves from best to worst using domain knowledge so the best move resides on/near the left-most terminal node.
In Chess, for instance, certain actions are more valuable than others: winning captures, promotions, equal captures, non-captures⁶. If we wanted to maximize the alpha-beta pruning process on a Chess game tree, we'd order the nodes from most valuable to least valuable - aka an ideal ordering.
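As a sketch of that idea, here's one way to sort moves best-first before searching them. The weights and the classify callback are made up for illustration; a real engine would plug in its own domain knowledge:

```python
# Hypothetical move values: higher-valued moves are searched first so
# alpha-beta reaches strong lines early and prunes the rest sooner.
MOVE_WEIGHTS = {"winning_capture": 4, "promotion": 3, "equal_capture": 2, "non_capture": 1}

def order_moves(moves, classify):
    """Sort moves best-first; `classify` maps a move to one of the keys above."""
    return sorted(moves, key=lambda move: MOVE_WEIGHTS[classify(move)], reverse=True)
```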
Mancala is a 2-player turn-based board game. Each player has 6 pits with 4 seeds/stones along with 1 mancala (store) at the end of the board.
Players take turns picking up all the seeds from one of their 6 pits and placing them one by one until they're holding no seeds. Stones are placed counterclockwise into pits and in the player's mancala at the end of the board. Players must not place seeds in their opponent's mancala.
There are two exceptions for when a player can go again, or re-turn:
Lastly, there is a capture rule: If the last stone in the player's hand lands in an empty pit on their side of the board, and the adjacent pit on the opponent's side contains 1+ seeds, the player may capture all seeds from both pits and place them in their mancala.
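To make the capture rule concrete, here's a rough sketch of the check. The board representation is an assumption for illustration only - each side's pits in a 6-element list, with pit i facing the opponent's pit 5 - i - and may not match the simulator's actual layout:

```python
def try_capture(own_pits, opp_pits, last_pit, own_store):
    """Apply the capture rule after sowing, if it triggers; return the new store count.

    Assumed layout: `own_pits` and `opp_pits` are 6-element lists and the pit
    opposite `own_pits[i]` is `opp_pits[5 - i]`.
    """
    # The last seed landed in a previously empty pit on our side (it now holds 1)
    # and the opposite pit on the opponent's side holds at least one seed.
    if own_pits[last_pit] == 1 and opp_pits[5 - last_pit] >= 1:
        own_store += own_pits[last_pit] + opp_pits[5 - last_pit]
        own_pits[last_pit] = 0
        opp_pits[5 - last_pit] = 0
    return own_store
```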
The player's goal is to have more seeds in their mancala than their opponent.
The game ends on either of two conditions:
Please watch this 3-minute video if the explanation above wasn't clear.
I wrote a simple, CLI Mancala game simulator in Python. The code can be found on my GitHub repo. An example of the simulator can be seen below.
The top-most row is the indices of Player 1's pits. The following row is Player 1's pits initialized with 4 seeds. The left-most number is Player 1's mancala index, 13. The number immediately to the right of Player 1's mancala is his/her score. Everything else is Player 2's indices, pits, or mancala.
```
         0   1   2   3   4   5
   |===============================|
   |---| 4 | 4 | 4 | 4 | 4 | 4 |---|
13 | 0 |=======================| 0 | 6
   |---| 4 | 4 | 4 | 4 | 4 | 4 |---|
   |===============================|
        12  11  10   9   8   7

Player 1's Turn! Choose a number in range [0, 1, 2, 3, 4, 5]
Player 1 selected pit 1: 4 seeds (+1, 0, 0)

         0   1   2   3   4   5
   |===============================|
   |---| 5 | 0 | 4 | 4 | 4 | 4 |---|
13 | 1 |=======================| 0 | 6
   |---| 5 | 5 | 4 | 4 | 4 | 4 |---|
   |===============================|
        12  11  10   9   8   7
```
Each player takes turns selecting a pit. The simulator will then output which pit the player selected along with four additional outputs:
In addition to the game simulator, I wrote an Agent class so I could play against bots using various Mancala strategies. So far, the bots' strategies are random, minimax, and minimax with alpha-beta pruning.
Additional strategies could also be implemented - such as maximizing re-turns, prioritizing captures, or preventing captures - but we're focusing on minimax in this article. In the future, I plan on using OpenAI Gym to simulate a bot tournament and find the strongest Mancala strategy.
The minimax agent creates game trees by iterating over all its possible moves, simulating the opponent's move in response, and repeating the process until it reaches its maximum search depth.
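Conceptually, the Agent's top-level decision looks something like the sketch below. The game.legal_moves() and game.simulate() calls are stand-ins assumed for whatever the simulator actually exposes, and minimax_value stands in for a function that returns the minimax value of a state searched to the given depth:

```python
import random

def choose_move(game, depth, minimax_value):
    """Pick the move with the best minimax value; break ties randomly."""
    scored = []
    for move in game.legal_moves():        # assumed simulator API, for illustration
        child = game.simulate(move)        # a copy of the game after playing `move`
        scored.append((minimax_value(child, depth - 1), move))

    best = max(value for value, _ in scored)
    return random.choice([move for value, move in scored if value == best])
```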
Large game trees are created throughout the game - especially when each player's pits are full of seeds, such as at the beginning of the game - due to Mancala's branching factor of 6.
Recall that the number of terminal nodes requiring evaluations is equal to bᵈ, where b is the branching factor (the number of possible actions in that game state) and d is the algorithm's search depth. Each player begins with 6 possible moves, or a branching factor, b, of 6. The minimax Agent's default depth variable, d, is set to 3. Therefore, at the beginning of the game, there are a total of 6³, or 216, terminal nodes requiring heuristic evaluations before the bot makes its first move.
If we increase the depth of the search to 8, we have 6⁸ - or 1,679,616 - heuristic evaluations. Bumping the depth up once again to 9 requires 6⁹, or 10,077,696, evaluations - roughly a sixfold jump just to look ahead one more move. In case you're wondering, looking ahead 10 moves requires 60,466,176 heuristic evaluations. Exponential numbers are impressive.
The terminal nodes' heuristic values - the value of the players' moves - are evaluated by the Agent's evaluation method. The heuristic evaluation is rather simple: compare the Agent's score to the opponent's score after executing some move, and return the difference between the scores. A positive number means the Agent is winning; a negative number means the opponent is winning.
In cases where the Agent discovers multiple moves with the same optimal outcome, a random move will be selected.
In short, the agent calculates move values with a simple and weak evaluation method that uses no domain knowledge. It picks moves that increase its score; that's it.
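In code, that heuristic amounts to a one-liner. The score() accessor here is a placeholder name, not necessarily the simulator's actual attribute:

```python
def evaluate(game, agent_id, opponent_id):
    """Naive heuristic: the Agent's mancala total minus the opponent's.

    Positive values mean the Agent is ahead; negative values mean it's behind.
    `game.score(player)` is an assumed accessor for a player's mancala count.
    """
    return game.score(agent_id) - game.score(opponent_id)
```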
The evaluation method can be upgraded by utilizing domain knowledge. We could assign valuations/weights to each type of move, such as captures and re-turns. For example, we could weigh capture moves to be more valuable than simply scoring one seed. Additional valuations include winning moves, re-turns, defensive moves to prevent captures, re-turns that lead to captures, etc.
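A weighted version might look something like the following - the helper names and bonus values are invented for illustration and would need tuning:

```python
def evaluate_weighted(game, agent_id, opponent_id):
    """Hypothetical richer heuristic: score difference plus move-type bonuses."""
    value = game.score(agent_id) - game.score(opponent_id)
    if game.last_move_was_capture(agent_id):   # assumed helper, for illustration
        value += 3                             # a capture is worth more than one seed
    if game.last_move_was_return(agent_id):    # assumed helper, for illustration
        value += 1                             # extra turns are valuable too
    return value
```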
The weak evaluation method means many moves will share the same value - i.e., the player moves but the score doesn't change very much. Therefore, more nodes will need to be visited and calculated.
Stronger evaluation methods will reduce search times by increasing pruning activity with ideal ordering. If our Agent can have more granular output values, then it can prune more nodes or sub-trees. That means fewer nodes will be visited and calculated.
If you're curious about what stronger evaluation methods look like, I encourage you to check out Chess engines and their evaluation methods. Each engine's evaluation method is unique, but they share the following Chess knowledge:
Read more about Chess evaluations on the Chess Programming Wiki.
Take a peek at the Stockfish engine's source code here.
Lastly, watch this video from Sebastian Lague. This video solidified my understanding of heuristic evaluation.
The minimax agent plays naively due to its weak heuristic evaluation method. With a stronger evaluation method, the Agent could prioritize chaining re-turns or captures and win the game much quicker.
Moreover, Agents with shallow depths (small d) often play single-scoring moves instead of chaining re-turns to maximize their scoring outcome.
This is more-or-less expected behavior due to two things: the weak heuristic evaluation method and the shallow search depth, which hides the long-term value of chained re-turns.
Alpha-beta pruning proved to drastically speed up the Agent's decision-making. Without alpha-beta pruning, a minimax agent with a depth greater than 5 takes over 10 seconds to select a move. With alpha-beta pruning, the Agent can look as far as 8 moves ahead in less than 10 seconds.
Increasing the depth to 9 moves increases search times to over 1 minute, but the Agent's strategy does not change.
Agents with depths 7, 8, and 9 all play the same opening and mid-game strategy!
Agent's typical opening against my winning, stalling strategy
Agent's moves : [3, 1, 0, 5, 0, 1, 0, 4, 0, 5, 1, 0, 3, 1, 0, 2, 1, 0]
My moves : [11, 9, 8, 12, 9, 8, 11, 8, 12, 8, 9]
Players' moves : [3, 1, 11, 9, 0, 8, 5, 0, 1, 0, 4, 12, 0, 5, 9, 1, 8, 0, 3, 11, 1, 8, 0, 12, 2, 8, 1, 9, 0]
Overall, minimax is a simple, yet fundamental, search algorithm. The algorithm excels in making the safest moves in perfect information games when given a large enough search depth. Optimizations such as alpha-beta pruning drastically reduce the algorithm's search times.
Visit the references section below for more detailed articles explaining minimax⁷.
I look forward to learning more about search algorithms and pathfinding algorithms. The next step is OpenAI Gym and graph traversals!
Other papers or articles may refer to the player as pᵢ or aᵢ, and opponent(s) as p₋ᵢ or a₋ᵢ ↩
The heuristic value is often referred to as utility. The variable u represents utility, where uᵢ represents the player's utility. ↩
Minimax algorithm and alpha-beta pruning: Understanding minimax in detail from a beginner's perspective ↩