
If we have a working reward function that yields the desired behavior and an optimal policy in a continuous action/state-space problem, would adding another action dimension significantly affect the resulting optimal policy?

For example, assume you have an RL problem with a 1-dimensional action space (acceleration/deceleration) and a 2-dimensional state space (distance to the target position and velocity), where the agent is tasked with accelerating in a straight line from position a to position b.
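
For concreteness, a minimal sketch of such an environment using the Gymnasium API could look like the following; the class name, constants, and reward shaping are illustrative assumptions, not part of the question:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class StraightLineEnv(gym.Env):
    """1-D point mass: accelerate from position a to position b."""

    def __init__(self, a=0.0, b=10.0, dt=0.1, max_accel=1.0):
        self.a, self.b, self.dt, self.max_accel = a, b, dt, max_accel
        # Single continuous action: signed acceleration.
        self.action_space = spaces.Box(-max_accel, max_accel, shape=(1,), dtype=np.float32)
        # Two-dimensional state: distance to the target and velocity.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos, self.vel = self.a, 0.0
        return self._obs(), {}

    def step(self, action):
        self.vel += float(action[0]) * self.dt
        self.pos += self.vel * self.dt
        dist = self.b - self.pos
        reward = -abs(dist)              # hypothetical shaping: closer is better
        terminated = abs(dist) < 0.05    # reached the target
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        return np.array([self.b - self.pos, self.vel], dtype=np.float32)
```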

Now suppose the action space is extended with a steering action. Do you think the agent would behave very differently? My assumption is that, given enough exploration, there would be minimal change aside from a longer training time: the task is still to move in a straight line, but the agent now also has to account for the steering action.
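
To make the comparison concrete, a hypothetical extension of the sketch above with a second, steering-like action dimension might look as follows; the lateral dynamics and the extra reward term are assumptions. If the reward keeps penalizing deviation from the line, the optimal policy should stay essentially the same, with the steering component held near zero, but the extra dimension enlarges the space the agent has to explore:

```python
class SteeringEnv(StraightLineEnv):
    """Same task with an extra steering-like action and a lateral state.

    Hypothetical extension: if the reward also penalizes lateral drift,
    the optimal policy is still "accelerate along the line, steer ~ 0".
    """

    def __init__(self, max_steer=0.5, **kwargs):
        super().__init__(**kwargs)
        # Two continuous actions: acceleration and steering.
        self.action_space = spaces.Box(
            low=np.array([-self.max_accel, -max_steer], dtype=np.float32),
            high=np.array([self.max_accel, max_steer], dtype=np.float32),
        )
        # State gains the lateral offset from the straight line.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        self.lat = 0.0                         # lateral offset starts at zero
        return super().reset(seed=seed, options=options)

    def step(self, action):
        accel, steer = float(action[0]), float(action[1])
        self.vel += accel * self.dt
        self.pos += self.vel * self.dt
        self.lat += steer * self.dt            # simplified lateral dynamics
        dist = self.b - self.pos
        reward = -abs(dist) - abs(self.lat)    # also punish drifting off the line
        terminated = abs(dist) < 0.05
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        return np.array([self.b - self.pos, self.vel, self.lat], dtype=np.float32)
```

Under these assumptions, a continuous-control algorithm would face the same optimal behavior in both environments; the second one just has a larger action space to explore.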
