I’m developing an RL agent. What follows isn’t exactly my case, but I’ll use a metaphor to explain my situation (gotta protect my research’s originality, haha).
Let’s say I’m working on a self-driving car algorithm using RL (I’m not a vehicle engineer, though). In the real world, drivers might override the self-driving mode. For example, if my agent is trained to minimize travel time, it might end up compromising the driver’s comfort with sudden acceleration or braking. As a result, the driver may intervene and override the system.
If my environment consists of (i) a car and (ii) a driver who can intervene, my RL agent might struggle to explore the full action space during training due to these overrides. I expect that, eventually, the agent will learn to interact with the driver and optimize its policy to maximize rewards, but… that could take a really long time.
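To make the setup concrete, here’s a minimal sketch of the kind of environment I have in mind (Gymnasium-style; names like `comfort_limit` and the clipping rule are placeholders I made up for illustration, not my actual override model). The agent proposes an action, but the “driver” substitutes a gentler one whenever the command is too aggressive, so the executed action can differ from what the agent chose:

```python
import gymnasium as gym
import numpy as np

class DriverOverrideWrapper(gym.Wrapper):
    """Sketch: a driver who overrides actions above a comfort threshold."""

    def __init__(self, env, comfort_limit=0.5):
        super().__init__(env)
        self.comfort_limit = comfort_limit  # assumed comfort threshold

    def step(self, action):
        # Driver intervenes if the commanded action is too aggressive
        # (e.g., too large an acceleration/braking magnitude).
        intervened = np.abs(action).max() > self.comfort_limit
        executed = (
            np.clip(action, -self.comfort_limit, self.comfort_limit)
            if intervened
            else action
        )
        obs, reward, terminated, truncated, info = self.env.step(executed)
        # Expose the intervention so the training loop can react to it.
        info["driver_intervened"] = intervened
        info["executed_action"] = executed
        return obs, reward, terminated, truncated, info
```

My real override logic is more complicated than a clip, but the key point is the same: the agent never observes the outcomes of the aggressive actions it proposes, so a whole region of the action space is effectively censored during training.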
I was wondering: are there any established approaches for dealing with situations like this? Has there been any discussion of handling cases where external interventions limit an RL agent’s exploration?