Hi! I have just made my first model in stable-baselines3 using pygame in Python. The game is about a ball reaching the highest platform out of three placed in the sky.
After a few days of trying, I managed to get the model to learn how to get there.
But after reaching the third platform, the ball falls off the map.
I wanted to train a new model to fix this, but to my surprise, increasing total_timesteps to 500_000 made things much worse: the ball just jumps in place, whereas the model trained with far fewer timesteps (150_000) reached the highest platform!
Why is that?
Shouldn't more timesteps make the policy converge further toward staying on the highest platform and not falling off?
Here's my training call:
from stable_baselines3 import PPO
# env is the pygame environment described above
model = PPO("MlpPolicy", env, verbose=1, learning_rate=0.0002, ent_coef=0.2)
model.learn(total_timesteps=500_000)
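
One way to find where between 150_000 and 500_000 timesteps the behavior degrades is to train a single model in chunks and score it after each chunk, instead of comparing two separate runs. Below is a minimal sketch under stated assumptions, not a tested setup for this game: `make_env` is a hypothetical zero-argument function that returns a fresh instance of the environment, and the chunk size, episode count, and file name are placeholders. `PPO`, `evaluate_policy`, and the `reset_num_timesteps` argument come from stable-baselines3.

```python
def best_checkpoint(scores):
    """Return the (timesteps, mean_reward) pair with the highest mean reward."""
    return max(scores, key=lambda pair: pair[1])


def train_and_score(make_env, total_timesteps=500_000, chunk=50_000, n_eval_episodes=10):
    """Train PPO in chunks, saving and evaluating the model after each chunk."""
    # Imported here so best_checkpoint() can be used without SB3 installed.
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    train_env = make_env()  # hypothetical factory for the pygame environment
    eval_env = make_env()   # separate instance so evaluation does not disturb training

    # Same hyperparameters as in the call above.
    model = PPO("MlpPolicy", train_env, verbose=0, learning_rate=0.0002, ent_coef=0.2)
    scores = []
    while model.num_timesteps < total_timesteps:
        # reset_num_timesteps=False continues the same run instead of restarting the counter.
        model.learn(total_timesteps=chunk, reset_num_timesteps=False)
        mean_reward, _std = evaluate_policy(model, eval_env, n_eval_episodes=n_eval_episodes)
        model.save(f"ppo_ball_{model.num_timesteps}")  # placeholder file name
        scores.append((model.num_timesteps, mean_reward))
    return scores
```

Calling `best_checkpoint(train_and_score(make_env))` would then give the timestep count whose saved model scored highest. Note that PPO collects experience in whole rollouts, so the recorded timestep counts will not land exactly on multiples of `chunk`.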
