I have a question about how clipping affects the training of RL agents.

In particular, I have come across code for training DDPG agents; the pseudo-code is as follows:

for i in training iterations:
    action = clip(ddpg.prediction(state) * a + b, x, y)
    state, reward = environment(action)
    store action, state, and reward
    if the number of experiences is larger than L:
        update the parameters of the agent
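For reference, here is a runnable sketch of how I understand this loop, with dummy stand-ins for the environment and the agent; all names and constants below are placeholders, not the actual code I found:

```python
import numpy as np

class DummyEnv:
    """Stand-in environment: 3-dim state, 1-dim action, random dynamics."""
    def reset(self):
        return np.random.randn(3)
    def step(self, action):
        return np.random.randn(3), -float(np.abs(action).sum())

class DummyAgent:
    """Stand-in for the DDPG agent; the actor output is already in [-1, 1] via tanh."""
    def predict(self, state):
        return np.tanh(np.random.randn(1))
    def update(self, buffer):
        pass  # one DDPG update step would go here

a, b = 2.0, 1.0   # scale and shift applied to the actor output (placeholder values)
x, y = 0.0, 3.0   # clip bounds, i.e. the valid action range (placeholder values)
L = 64            # minimum number of stored experiences before updating

env, agent, buffer = DummyEnv(), DummyAgent(), []
state = env.reset()
for i in range(1000):
    # Clipping is applied outside the network, after scaling/shifting the tanh output.
    action = np.clip(agent.predict(state) * a + b, x, y)
    next_state, reward = env.step(action)
    buffer.append((state, action, reward, next_state))
    state = next_state
    if len(buffer) > L:
        agent.update(buffer)
```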

In this case, the actor network of the DDPG agent has a $\tanh$ activation in its output layer.

My question is: could we add the clipping to the output layer of the actor (replacing $\tanh(\cdot)$ with $\operatorname{clip}(a \cdot \tanh(\cdot) + b,\, x,\, y)$) rather than clipping in the training loop? Would the training still work in that case?
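For concreteness, this is roughly what I mean, sketched with a hypothetical PyTorch actor (the layer sizes and the constants are placeholders, and I write the clip bounds as `low`/`high` so they are not confused with the $\tanh$ input):

```python
import torch
import torch.nn as nn

class ClippedActor(nn.Module):
    """Actor whose output layer applies clip(a * tanh(.) + b, low, high) directly."""
    def __init__(self, state_dim=3, action_dim=1, a=2.0, b=1.0, low=0.0, high=3.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )
        self.a, self.b, self.low, self.high = a, b, low, high

    def forward(self, state):
        # Scale/shift the tanh output and clip inside the network itself.
        return torch.clamp(self.a * torch.tanh(self.net(state)) + self.b,
                           self.low, self.high)

actor = ClippedActor()
print(actor(torch.randn(4, 3)))  # batch of 4 random states -> clipped actions
```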
