
I'm learning the DDPG algorithm by following this link: OpenAI Spinning Up document on DDPG, where it is written:

In order for the algorithm to have stable behavior, the replay buffer should be large enough to contain a wide range of experiences, but it may not always be good to keep everything.

What does this mean? Is it related to tuning the batch size parameter in the algorithm?

ycenycute

2 Answers


You need to read this 2020 paper by DeepMind: "Revisiting Fundamentals of Experience Replay". The authors explicitly test the size of the experience replay buffer, the replay ratio of each experience, and other parameters.

Also, to add to the answer by @nbro:

Assume you implement experience replay as a buffer where each new memory is stored in place of the oldest one. Then, if your buffer contains 100k entries, any memory will remain there for exactly 100k iterations.

Such a buffer is simply a way to "see" what happened up to 100k iterations ago. During the first 100k iterations you fill the buffer; after that it starts "moving", much like a sliding window, by overwriting the oldest memories with new ones.
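
For concreteness, here is a minimal sketch of such a sliding-window buffer implemented as a circular array. The class name, array layout, and sampling routine are my own illustration, not the Spinning Up implementation:

```python
import numpy as np

class ReplayBuffer:
    """Fixed-size FIFO experience replay: once full, new transitions
    overwrite the oldest ones, like a sliding window over the last
    `capacity` environment steps."""

    def __init__(self, capacity, obs_dim, act_dim):
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.act = np.zeros((capacity, act_dim), dtype=np.float32)
        self.rew = np.zeros(capacity, dtype=np.float32)
        self.next_obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.done = np.zeros(capacity, dtype=np.float32)
        self.capacity = capacity
        self.ptr = 0    # index where the next transition is written
        self.size = 0   # number of valid entries (<= capacity)

    def store(self, o, a, r, o2, d):
        self.obs[self.ptr] = o
        self.act[self.ptr] = a
        self.rew[self.ptr] = r
        self.next_obs[self.ptr] = o2
        self.done[self.ptr] = d
        self.ptr = (self.ptr + 1) % self.capacity   # wrap around: overwrite the oldest entry
        self.size = min(self.size + 1, self.capacity)

    def sample_batch(self, batch_size=100):
        # Uniform sampling over everything currently in the window
        idxs = np.random.randint(0, self.size, size=batch_size)
        return dict(obs=self.obs[idxs], act=self.act[idxs],
                    rew=self.rew[idxs], next_obs=self.next_obs[idxs],
                    done=self.done[idxs])
```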


The size of the buffer (relative to the total number of iterations you plan to ever train with) depends on "how much you believe your network architecture is susceptible to catastrophic forgetting".

A tiny buffer might force your network to only care about what it saw recently.

But an excessively large buffer might take a long time to "become refreshed" with good trajectories once they finally start to be discovered. The network would then be like a university student whose bookshelf is diluted with first-grade school books.

The student might have already decided to become a programmer, so re-reading those primary-school books has little benefit (the time could have been spent more productively on programming literature), and it takes a long time to replace them with relevant university books.

Kari

In order for the algorithm to have stable behavior, the replay buffer should be large enough to contain a wide range of experiences, but it may not always be good to keep everything.

The larger the experience replay buffer, the less likely you are to sample correlated elements, hence the more stable the training of the NN will be. However, a large experience replay buffer also requires a lot of memory and it might slow down training. So, there is a trade-off between training stability (of the NN) and memory requirements.

The authors of the linked article state (right after the sentence quoted above):

If you only use the very-most recent data, you will overfit to that and things will break; if you use too much experience, you may slow down your learning. This may take some tuning to get right.
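
To put rough numbers on the memory side of that trade-off, here is a small back-of-the-envelope calculation; the observation and action dimensions are assumed for illustration (roughly a MuJoCo-style task), not taken from the question or the article:

```python
# All numbers here are assumed for illustration, not taken from the question or the article.
obs_dim, act_dim = 17, 6
floats_per_transition = 2 * obs_dim + act_dim + 2   # obs, next_obs, act, rew, done
bytes_per_transition = floats_per_transition * 4    # stored as float32

for capacity in (10_000, 100_000, 1_000_000):
    print(f"{capacity:>9,} transitions ~ {capacity * bytes_per_transition / 1e6:,.0f} MB")
```

For low-dimensional observations this stays in the hundreds of megabytes even at 1M transitions, but with image observations the same buffer can easily grow to tens of gigabytes, which is when the memory side of the trade-off starts to bite.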

nbro