I'm trying to implement MCTS with UCT for a board game and I'm a bit stuck. The state space is quite large (about 3e15 states), and I'd like to compute a good move in under 2 seconds. I already have MCTS implemented in Java from here, and I noticed that the simulation phase takes a long time to actually reach a terminal node.
So, would it be possible to simulate games only up to a fixed depth?
Instead of playing each simulation out to the end and returning the winner, I could stop at the max depth and return an evaluation of the board (the board game is simple enough to write an evaluation function), which would then be backpropagated.
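To make the idea concrete, here is a minimal sketch of what I have in mind. It is not taken from the implementation I'm using: `GameState`, `Node`, `DepthLimitedMcts`, `rollout` and `backpropagate` are all hypothetical names. It also assumes a two-player, zero-sum game with strictly alternating turns, and an `evaluate()` that returns a score in [-1, 1] from the point of view of the player to move (with +1/-1 for a decided game). The `backpropagate` part is my best guess, and it's the part I'm unsure about.

```java
import java.util.List;
import java.util.Random;

// Hypothetical game interface; the names are placeholders, not from any library.
interface GameState {
    boolean isTerminal();
    List<GameState> successors();   // states reachable in one move (assumed non-empty if not terminal)
    double evaluate();              // assumed: score in [-1, 1] for the player to move in this state
}

// Minimal tree node holding only the statistics UCT needs.
final class Node {
    final Node parent;
    int visits;
    double totalReward;             // sum of rewards in [0, 1] for the player who moved INTO this node

    Node(Node parent) { this.parent = parent; }
}

final class DepthLimitedMcts {
    // Random playout that stops at a terminal state or after maxDepth plies.
    // Returns a reward in [0, 1] from the point of view of the player to move at 'start'.
    static double rollout(GameState start, int maxDepth, Random rng) {
        GameState s = start;
        int plies = 0;
        while (!s.isTerminal() && plies < maxDepth) {
            List<GameState> next = s.successors();
            s = next.get(rng.nextInt(next.size()));
            plies++;
        }
        double v = s.evaluate();        // point of view of the player to move at s
        if (plies % 2 == 1) {
            v = -v;                     // odd number of plies: flip back to the start player's view
        }
        return (v + 1.0) / 2.0;         // map [-1, 1] to [0, 1] so it looks like a win rate
    }

    // Walks from the expanded leaf to the root, flipping the reward at every level
    // because the players alternate.
    static void backpropagate(Node leaf, double rewardForPlayerToMoveAtLeaf) {
        // A node's statistics belong to the player who chose to move into it,
        // i.e. the opponent of the player to move at that node.
        double r = 1.0 - rewardForPlayerToMoveAtLeaf;
        for (Node n = leaf; n != null; n = n.parent) {
            n.visits++;
            n.totalReward += r;
            r = 1.0 - r;
        }
    }
}
```

With this, the UCT exploitation term would just be `totalReward / visits`, the same as with win/loss results, except that the rewards are fractional. I'm assuming the evaluation has to be scaled to the same range as the win/loss values so that the exploration constant keeps its meaning.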
The issue I'm having is with handling the backpropagation. I'm not sure how a heuristic score, rather than a win/loss result, should be propagated back up the tree. Any help/resources/guidance is appreciated!
 
     
    