
I am new to DRL and am trying to implement a custom environment. I want to know if normalization and regularization techniques are as important in RL as they are in deep learning.

In my custom environment, the state/observation values lie in very different ranges. For example, one observation is in the range $[1, 20]$, while another is in $[0, 50000]$. Should I apply normalization or not? I am confused. Any suggestions?

moyukh

2 Answers


Normalisation in neural networks, and in many other machine learning methods (but not all - decision trees are a notable exception), improves the shape of the parameter space with respect to the optimisers that will be applied to it.

If you are using a function approximator that benefits from normalisation in supervised learning scenarios, it will also benefit from it in reinforcement learning scenarios. That is definitely the case for neural networks, which are by far the most common approximator used in deep reinforcement learning.

Unlike in supervised learning, you will not have a definitive dataset from which to compute the mean and standard deviation needed to scale to the common $\mu = 0, \sigma = 1$. Instead, you will want to scale to a known range, such as $[-1, 1]$.
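A minimal sketch of that kind of rescaling, assuming you know fixed per-feature bounds (the bounds below are illustrative, taken from the ranges in the question):

```python
import numpy as np

# Illustrative per-feature bounds from the question: [1, 20] and [0, 50000].
OBS_LOW = np.array([1.0, 0.0])
OBS_HIGH = np.array([20.0, 50000.0])

def scale_observation(obs):
    """Linearly rescale each feature from [low, high] to [-1, 1]."""
    return 2.0 * (obs - OBS_LOW) / (OBS_HIGH - OBS_LOW) - 1.0

# e.g. scale_observation(np.array([10.5, 25000.0])) -> [0.0, 0.0]
```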

You may also want to perform some basic feature engineering first, such as taking the log of a value, or some power of it - anything that makes the distribution of values you expect to see closer to a Normal distribution. Again, this is easier to do in supervised learning, where you can inspect the dataset, but you may know enough about the feature to make a good guess.
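For instance, a non-negative feature with a heavy right tail, like the $[0, 50000]$ one above, could be log-compressed before rescaling. A hedged sketch, with the bounds assumed known:

```python
import numpy as np

def log_then_scale(x, low=0.0, high=50000.0):
    """Compress a heavy-tailed non-negative feature with log1p,
    then linearly rescale the result to [-1, 1].

    `low` and `high` are assumed known bounds; log1p keeps x = 0 valid.
    """
    lo, hi = np.log1p(low), np.log1p(high)
    return 2.0 * (np.log1p(x) - lo) / (hi - lo) - 1.0
```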

Neil Slater

On creating custom environments:

... always normalize your observation space when you can, i.e., when you know the boundaries (From stable-baselines)

You could normalize them as part of the environment's observation space, or before passing them as input to the policy. Depending on the agent's algorithm implementation, what works for you may vary.
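One way to do the former is with an observation wrapper; a sketch assuming a Gym `Box` observation space with finite, known bounds:

```python
import gym
import numpy as np


class NormalizeObservation(gym.ObservationWrapper):
    """Rescale Box observations to [-1, 1] using the space's own bounds."""

    def __init__(self, env):
        super().__init__(env)
        # Assumes the wrapped env declares finite low/high on its Box space.
        self._low = env.observation_space.low
        self._high = env.observation_space.high
        self.observation_space = gym.spaces.Box(
            low=-1.0, high=1.0, shape=env.observation_space.shape, dtype=np.float32
        )

    def observation(self, obs):
        scaled = 2.0 * (obs - self._low) / (self._high - self._low) - 1.0
        return scaled.astype(np.float32)
```

You would then wrap your environment before training, e.g. `env = NormalizeObservation(MyCustomEnv())`, where `MyCustomEnv` stands in for your own environment class.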

(See this answer from a related question)

mugoh