
I'm trying to implement Monte Carlo Tree Search for (a simplified version of) the boardgame Commands and Colors -- I'm setting up a scenario where the AI side has overwhelming force: 6 units vs 3 units played by the human.

I would hope that MCTS moves the 6 units in for the kill; instead what happens is that some units attack, some move sideways, and some retreat.

I suspect that the units in front are already strong enough to make victory likely, so the unit in the back sees no difference in value from moving closer to the action, and chooses to move away from it. When evaluating the moves of the far-away unit, the "noise" caused by the actions of the front units, which make the value of the position swing heavily, makes it very difficult to detect the smaller contribution that moving the back unit makes to the quality of the position.

This is sad! A human player would move all units towards the enemy, because if the front units get damaged, they can be pulled back from the front and replaced by second-line units. Having units wander randomly away from the action makes no sense.

How do I fix it?

-- Edit:

The source code is here: https://github.com/xpmatteo/auto-cca. The case that does not work as expected can be observed with:

  • make server
  • (open another terminal)
  • make open
  • click "End Phase" twice

The lone brown (Carthaginian) unit should close in against the gray (Roman) ones, but it doesn't.

xpmatteo

3 Answers


One option is that the RL is simply not there yet: you may need to continue training, or add some tiny "hints" to the win/lose reward about what makes a good move (e.g. +0.001 when you eliminate an enemy unit, -0.001 when one of your units is eliminated, etc.).

I do assume that all of your units' moves are taken in one "action" of your army, and that you evaluate the reward for all the units as one single "army".
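The reward shaping suggested above can be sketched as follows. This is a minimal illustration, not code from the auto-cca repo; the function name and parameters are hypothetical:

```python
# Hedged sketch of reward shaping: the terminal win/lose reward is
# augmented with tiny intermediate "hints" so the search can tell
# good moves apart even when the final outcome barely changes.

WIN_REWARD = 1.0
LOSS_REWARD = 0.0
CAPTURE_HINT = 0.001      # tiny bonus per enemy unit eliminated
ELIMINATED_HINT = -0.001  # tiny penalty per own unit lost

def shaped_reward(game_won: bool, captures: int, losses: int) -> float:
    """Combine the terminal reward with small per-event hints."""
    base = WIN_REWARD if game_won else LOSS_REWARD
    return base + CAPTURE_HINT * captures + ELIMINATED_HINT * losses
```

With hints this small, the ordering between a win and a loss is never flipped; the hints only break ties between otherwise equally winning lines of play.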

Oren

It is very nice, and I recommend that others play with your code. Really nice. Did I get it right that the Carthaginians are the side with extra power, and that you refer to them as the AI? It seems that the Carthaginians always win, so why should they learn to win this way rather than another way? If how fast they win matters to you, then maybe add some penalty on the time taken?

Oren

Monte Carlo Tree Search searches for the action with the best expected reward. If the reward is determined only by whether the game is won or lost, then all winning moves are equal and one will be selected arbitrarily. If you want the AI to win quickly, then you can adjust the reward to favour winning quickly.

Supposing your rewards lie in [0, 1], I would keep some constant reward for winning and add a component that decays as the time to win increases. For example, the reward for a win could be 0.5 + 0.5 / T, where T is the time taken to achieve victory.
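The decaying-reward idea can be sketched like this; the function name is hypothetical and `turns` stands in for the time T above:

```python
# Sketch of a time-decayed terminal reward: a constant base of 0.5 for
# winning, plus a component that shrinks as the game takes longer.
# A loss is 0, so all rewards stay in [0, 1].

def terminal_reward(won: bool, turns: int) -> float:
    """Reward at the end of a playout; `turns` is the time T to win."""
    if not won:
        return 0.0
    return 0.5 + 0.5 / turns
```

A win on turn 1 scores 1.0 while a win on turn 10 scores 0.55, so the search now prefers lines of play that finish the enemy off sooner, instead of treating every eventual win as equivalent.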