For questions related to the Lunar Lander environment used in reinforcement learning.
Questions tagged [lunar-lander]
1 question
3 votes
1 answer
Why is training longer not better in reinforcement learning?
I have trained an RL agent (PPO) for 6 million steps to solve the OpenAI Gym LunarLander-v2 environment. Surprisingly, the agent already performs best after 320K steps and gets worse after that.
In the TensorBoard log, I can see that the mean, min reward…
Martin S