
I have a scheduling problem that I am trying to solve with RL (if you are interested in more details, you can read about it here: Reinforcement learning applicable to a scheduling problem?).

I have created my own environment (OpenAI Gym) and I have trained the model for one specific day of the simulation. There are 288 timesteps for one day (one every 5 minutes) and the simulation lasts until the end of the day, so the agent needs to make 288 decisions for a single control variable.

Now my question is whether it is possible to train an RL agent successively on the same environment for different days. The environment and reward function will stay the same, but the input data will change, as every day has different input data (temperature, heat demand, electricity price, etc.). So I would like to train the agent on one day and then continue training it on another day without it forgetting everything it has learned during the training on the first day. This way I can make sure that the agent is not overfitting to one particular set of input data but can also generalize and thus be applicable to different days.
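
Roughly what I have in mind is a loop like the following (just a sketch; `SchedulingEnv`, `agent`, `load_day_data` and `training_days` are placeholders for my own environment and algorithm):

```python
for day in training_days:
    day_data = load_day_data(day)      # temperature, heat demand, prices, ... for that day
    env = SchedulingEnv(day_data)      # same environment class and reward, new input data
    obs = env.reset()
    for t in range(288):               # one decision every 5 minutes
        action = agent.predict(obs)
        obs, reward, done, info = env.step(action)
        agent.update(obs, action, reward, done)
```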

Do you know if and how I can do this?

Reminder: Can anybody tell me more about this? I would highly appreciate any further comments, as I still don't know how to do this.

PeterBe

1 Answer


You can mitigate catastrophic forgetting by storing the trajectories generated by the actors during training in a replay buffer. Then, you sample trajectories from that replay buffer. This way, each mini-batch of experience will contain data from multiple days.

There are many strategies for this sampling, but you can start with uniform sampling. From what you're describing, storage does not seem to be an issue (288 data points per day is small), so you can keep all trajectories. If you cannot afford to store all trajectories, you should also design a strategy for deciding which ones to evict from the replay buffer.
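
To make this concrete, here is a minimal, framework-agnostic sketch of such a buffer with uniform sampling (pure Python/NumPy; the class name, transition layout, and the dummy data in the usage example are my own assumptions, not something specific to your problem):

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    def __init__(self, capacity=None):
        # capacity=None keeps everything; a finite capacity silently drops the
        # oldest transitions first, which is one simple removal strategy.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling: every stored transition is equally likely, so a
        # mini-batch will mix experience collected on different days.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones


# Usage sketch with dummy data: fill one simulated day (288 steps), then sample.
# In your case you would keep adding new days to the same buffer across training.
buffer = ReplayBuffer()                 # keep all trajectories; storage is tiny here
for step in range(288):
    s, s_next = np.zeros(4), np.zeros(4)
    buffer.add(s, 0.0, 1.0, s_next, step == 287)
states, actions, rewards, next_states, dones = buffer.sample(32)
```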

You can refer to this handy guide describing how to implement a replay buffer in TensorFlow.