
I have a case where my state consists of a relatively large number of features, e.g. 50, whereas my action size is 1. I wonder whether my state features dominate the action in my critic network. In theory it eventually shouldn't matter, but given the sequential nature of RL training I am afraid the state features will outweigh the action and its effect will be negligible.

Here is what I have already tried:

[Figure: critic architecture in which the state is processed by its own layers and the output is then concatenated with the action]

At the point where the state branch output and the action are combined, I use a tanh activation because my action lies in [-1, 1]. This led to almost flat performance from the very beginning, with no improvement at all. I understand this might be due to vanishing gradients caused by the tanh. I also tried a linear activation instead of the tanh; this time the average episode return fluctuated around some value with no signs of learning.
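Roughly, the part in question looks like this in PyTorch (the layer sizes here are only illustrative, since the diagram above is what defines the actual network; the point is the merge with the tanh):

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    def __init__(self, state_dim=50, action_dim=1, hidden=64):
        super().__init__()
        # The state is processed on its own first.
        self.state_branch = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
        )
        # The state-branch output and the raw action are concatenated here;
        # the tanh at this merge is the activation mentioned above.
        self.merge = nn.Sequential(
            nn.Linear(hidden + action_dim, hidden),
            nn.Tanh(),             # swap for nn.Identity() to get the "linear" variant
            nn.Linear(hidden, 1),  # Q(s, a)
        )

    def forward(self, state, action):
        s = self.state_branch(state)
        return self.merge(torch.cat([s, action], dim=-1))
```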

What I am currently testing is stacking the action, say 50 times, to match the number of state features.
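For example, something like this (the batch size, the factor of 50, and the tensor shapes are only illustrative):

```python
import torch

state_features = torch.randn(32, 50)  # (batch, state_dim) -- dummy data
action = torch.randn(32, 1)           # (batch, action_dim = 1)

# Repeat the scalar action 50 times so it has as many columns as the state,
# then feed the concatenation to the critic's input layer.
action_tiled = action.repeat(1, 50)                               # (batch, 50)
critic_input = torch.cat([state_features, action_tiled], dim=-1)  # (batch, 100)
```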

Any other ideas on how to tackle this issue?

Mika

1 Answer


It is certainly possible for the state features to dominate the action features in the critic.

There are several strategies you can use:

  1. Replace the action input with a high-dimensional learned embedding vector. This way you can scale up its importance (methods 1 and 2 are sketched after this list).

  2. Introduce the action at a deeper stage of the network. This way, the action is combined with the state only after the state has been compressed by the earlier layers.

  3. Simply have many action inputs that are all identical. This is similar to method 1 but easier.

I have used all 3 of these methods. They all worked well in different cases.
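As an illustration, here is a minimal PyTorch sketch combining methods 1 and 2: the scalar action is passed through a small learned embedding and only injected after the state has been compressed. All layer and embedding sizes are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    def __init__(self, state_dim=50, action_dim=1, hidden=64, action_embed=32):
        super().__init__()
        # Method 2: compress the state on its own first.
        self.state_branch = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Method 1: a learned embedding that scales the single action up to
        # `action_embed` features, so it carries more weight at the merge.
        self.action_branch = nn.Sequential(
            nn.Linear(action_dim, action_embed), nn.ReLU(),
        )
        # The action only enters here, at a deeper stage of the network.
        self.head = nn.Sequential(
            nn.Linear(hidden + action_embed, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # Q(s, a)
        )

    def forward(self, state, action):
        s = self.state_branch(state)
        a = self.action_branch(action)
        return self.head(torch.cat([s, a], dim=-1))
```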

chessprogrammer