When implementing Monte Carlo Tree Search for TicTacToe, do I simulate won states?

Question

I'm trying to debug my MCTS implementation for TicTacToe (it doesn't block obvious wins for the opponent). I was wondering what the algorithm should do if it expands to a node which is a game over state. Should it continue to "simulate" that node and back-propogate the results up the tree or just ignore it if the node is chosen.

The game over states are the only states that give you a score in tic tac toe: win, lose, or draw. Back propogating that information is important. — candied_orange, Dec 10 '16 at 14:39
@CandiedOrange I understand that but I was wondering whether there was a difference between reaching a terminal state through random playout (not reflected in the game tree) vs through expansion of the tree. — Amja, Dec 10 '16 at 17:00

manlio · Accepted Answer · 2018-08-02T07:27:10.547

The basic variant of MCTS doesn't require a special management for terminal states.

It just updates the score / number of visits depending on the result (win / lose) and proceeds with back-propagation.

It's a sort of istantaneous playout/simulation and cannot be skipped (allowing MCTS to converge to the game-theoretical value).

The algorithm can be improved (e.g. MONTE-CARLO TREE SEARCH SOLVER) using large scores (say +∞ / -∞) and ad hoc backpropagation / selection schemes:

A special provision is then taken when backing such proven values up the tree. There are three cases to consider as shown in Fig. (we use the negamax formulation, alternating signs between levels).

First, when a simulation backs up a proven loss (−∞) from a child c to a parent p, the parent node p becomes, and is labelled as, a proven win (∞), that is, the position is won for the player at p because the move played leads to a win (left backup diagram in the figure).

When backing up a proven win (∞) from c to p, one must, however, also look at the other children of p to determine p’s value. In the second case, when all child nodes of p are also a proven win (∞), then the value of p becomes a proven loss (-∞), because all moves lead to a position lost for p (middle backup diagram in the figure).

However, the third case occurs if there exists at least one child with a value different value from a proven win. Then we cannot label p as a proven loss. Instead p gets updates as if a simulation win (instead of a proven win) were being backed up from node c (right backup diagram in the figure; v and u indicate non-proven values). Non-proven values are backed up as in regular MCTS.

1) Further details in Monte-Carlo Tree Search in Lines of Action by Mark H.M. Winands, Yngvi Bjornsson and Jahn-Takeshi Saito.

When implementing Monte Carlo Tree Search for TicTacToe, do I simulate won states?

1 Answers1