
I have a working hex-map, turn-based wargame featuring WWII carrier battles.

On a given turn, a player may choose to perform a large number of actions. Actions can be of many different types; some can be performed independently of each other, while others have dependencies. For example, a player may decide to move one or two naval units, then assign a mission to an air unit (or not), then adjust some battle parameters (or not), and then reorganize a naval task force (or not).


Usually, board games allow players to perform only one action per turn (e.g. Go or Chess) or a few very similar actions (e.g. Backgammon).

Here, on each turn, the player may select:

  • several actions,
  • actions of different natures,
  • actions with parameters that the player must set (e.g. strength, payload, destination); see the sketch after this list for one possible encoding.
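
For concreteness, one possible encoding of such a compound turn is a structured ("dictionary") action space, where each sub-decision gets its own discrete or continuous component. Below is a minimal sketch using Gymnasium's composite spaces; all unit counts, mission types, and map sizes are invented for illustration and are not taken from the game.

    # Sketch: one "turn" encoded as a composite action space.
    # All sizes (unit counts, mission types, hex indices) are illustrative only.
    from gymnasium import spaces

    turn_action_space = spaces.Dict({
        # which naval units to move (index 0 meaning "no move") and where
        "naval_moves": spaces.MultiDiscrete([17, 17]),           # 16 units + "none", two move slots
        "naval_destinations": spaces.MultiDiscrete([900, 900]),  # hex index on a 30x30 map
        # optional air mission: type (including "none"), assigned unit, payload, target hex
        "air_mission": spaces.MultiDiscrete([4, 9, 3, 900]),
        # continuous battle parameters, e.g. committed strength
        "battle_params": spaces.Box(low=0.0, high=1.0, shape=(2,)),
        # reorganize a task force or not
        "reorganize": spaces.Discrete(2),
    })

    sample_turn = turn_action_space.sample()  # one random compound action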

How could I approach this problem with reinforcement learning? How would I specify a model or train it effectively to play such a game?

Here is a screenshot of the game.

[screenshot]

Here's another.

[screenshot]

nbro

2 Answers


The master's thesis Action space representation in combinatorial multi-armed bandits (2015) seems to provide an answer to my question.

Several algorithms can be used:

  • Naive Monte-Carlo Sampling (NMC)
  • Linear Side Information (LSI)
  • Monte Carlo Tree Search with Hierarchical Expansion (MCTS-HE)
  • MCTS with Dimensional Expansion

The common idea is divide and conquer: instead of treating every possible combination of sub-actions as one atomic action, the combined action is split into its components, which are explored and evaluated separately (a minimal sketch follows).
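
As an illustration of that idea, here is a small sketch in the spirit of Naive Monte-Carlo sampling: each sub-decision (dimension) keeps its own value estimates, a full turn is assembled by choosing one option per dimension, the combination is evaluated by a rollout, and the joint reward is credited back to every chosen sub-action. The dimensions and the evaluate_turn function are placeholders, not the thesis's exact formulation.

    # Sketch of Naive Monte-Carlo sampling over a factored action space.
    import random
    from collections import defaultdict

    dimensions = {            # illustrative sub-decisions and their option counts
        "move_unit": 5,
        "air_mission": 4,
        "reorganize": 2,
    }

    counts = {d: defaultdict(int) for d in dimensions}
    values = {d: defaultdict(float) for d in dimensions}

    def evaluate_turn(turn):
        """Placeholder for a game simulation; returns a scalar outcome for the turn."""
        return random.random()

    def select_turn(epsilon=0.2):
        """Pick one option per dimension: explore uniformly or exploit local estimates."""
        turn = {}
        for d, n in dimensions.items():
            if random.random() < epsilon or not counts[d]:
                turn[d] = random.randrange(n)
            else:
                turn[d] = max(range(n), key=lambda a: values[d][a])
        return turn

    def update(turn, reward):
        """Credit the joint reward to every chosen sub-action (the 'naive' assumption)."""
        for d, a in turn.items():
            counts[d][a] += 1
            values[d][a] += (reward - values[d][a]) / counts[d][a]

    for _ in range(1000):
        turn = select_turn()
        update(turn, evaluate_turn(turn))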

Here's a screenshot of the HE algorithm from section 4.2.

[screenshot]

nbro

One very popular RL algorithm that can predict multiple action outputs concurrently is Proximal Policy Optimization (PPO). There, the policy network can predict $n$ tuples $(\mu, \sigma)$ at once (i.e. $2n$ output nodes), where each tuple parameterizes a Gaussian distribution from which the corresponding action value is sampled. The sampled action values are then applied in the simulation/game. With a slight modification (e.g. predicting the logits of categorical distributions instead), the same procedure applies equally well to discrete action spaces.
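
As a rough sketch of the output layer described above (PyTorch, with made-up sizes): the network emits $n$ means and $n$ log-standard-deviations, and one value per action dimension is sampled from the corresponding Gaussian. A real PPO agent would additionally need a value head, the clipped surrogate loss, and so on.

    # Sketch of a policy head that outputs n (mu, sigma) pairs, as described above.
    import torch
    import torch.nn as nn

    class GaussianPolicy(nn.Module):
        def __init__(self, obs_dim, n_actions, hidden=128):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
            self.head = nn.Linear(hidden, 2 * n_actions)   # n means + n log-stds

        def forward(self, obs):
            mu, log_sigma = self.head(self.body(obs)).chunk(2, dim=-1)
            dist = torch.distributions.Normal(mu, log_sigma.exp())
            action = dist.sample()                     # one value per action dimension
            log_prob = dist.log_prob(action).sum(-1)   # needed for the PPO objective
            return action, log_prob

    policy = GaussianPolicy(obs_dim=32, n_actions=4)
    action, log_prob = policy(torch.randn(1, 32))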

To get you started, multiple high-quality implementations of PPO are available for rapid prototyping, e.g. OpenAI Baselines or Stable Baselines.
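
For instance, with stable-baselines3 (the maintained successor of the stable-baselines package), training PPO on a Gymnasium-compatible environment takes only a few lines; the environment below is a standard toy task, not the wargame, which would first have to be wrapped as an Env.

    # Quick-start sketch with stable-baselines3's PPO on a standard toy environment.
    import gymnasium as gym
    from stable_baselines3 import PPO

    env = gym.make("CartPole-v1")
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=50_000)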

Daniel B.