
I have recently posted a question here about a problem I have controlling a robotic arm. Basically, I have a dense reward for the arm's position and a sparse reward for the arm's stiffness: Reward shaping for dense and sparse rewards

I am currently using PPO in my attempts to solve it, but with little success. I have now learned about specialised multi-objective RL (MORL) algorithms and think they might be better suited.

Still, I am wondering why that would be, or why we would need a separate algorithm for multi-objective optimisation at all. Can we not always combine the rewards for multiple objectives into a single reward with some formula? Weighted sum, ratio, difference, whatever?

mavex857

1 Answer


Yes, we can do that (look at the examples from MO-Gymnasium). They use a linear function to weight the multiple rewards into a single scalar, and that scalar is then treated exactly like the reward in any single-policy, single-reward problem out there.
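As a rough illustration of that scalarisation, here is a minimal sketch assuming a Gymnasium-style environment whose `step()` returns a reward vector with one entry per objective (as MO-Gymnasium environments do). The wrapper, environment id, and weights below are placeholders of my own, not MO-Gymnasium's API:

```python
import numpy as np
import gymnasium as gym


class WeightedSumReward(gym.RewardWrapper):
    """Collapse a vector reward into one scalar via a fixed weighted sum."""

    def __init__(self, env, weights):
        super().__init__(env)
        self.weights = np.asarray(weights, dtype=np.float64)

    def reward(self, reward):
        # `reward` is a vector with one entry per objective,
        # e.g. [position_reward, stiffness_reward].
        return float(np.dot(self.weights, reward))


# Placeholder usage -- the environment id and weights are stand-ins:
# env = WeightedSumReward(gym.make("SomeMultiObjectiveEnv-v0"), weights=[0.8, 0.2])
# The wrapped env now behaves like an ordinary single-reward environment,
# so standard PPO can be trained on it unchanged.
```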

But...

But let's take the Franka Kitchen environment, where a robotic arm learns how to open a microwave, lift and tilt a kettle, set the right temperature on a stove, grasp a glass, and so on, all in the same environment. With conventional RL you can teach the robotic arm to perform only one of those tasks. But how do you train the robot to open the microwave and put a glass in it? Or to put some water in the kettle and then set the temperature? You need a "supervisor" policy, which learns the transitions between those tasks and the sequence of tasks to perform in order to maximise the overall reward.
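To make that hierarchy concrete, here is a purely illustrative sketch: the sub-task names, the dummy policies, the fixed supervisor schedule, and `run_episode` are all hypothetical stand-ins; in practice both the sub-policies and the supervisor would be trained with RL.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in sub-policies: in practice each would be trained separately
# on one kitchen sub-task and map an observation to an action.
def make_dummy_policy(action_dim=7):
    def policy(obs):
        return rng.uniform(-1.0, 1.0, size=action_dim)  # placeholder action
    return policy

SUBTASKS = ["open_microwave", "grasp_glass", "place_glass_in_microwave"]
sub_policies = {name: make_dummy_policy() for name in SUBTASKS}

# Stand-in supervisor: here it just cycles through the sub-tasks on a
# fixed schedule; in the setup described above it would itself be an RL
# policy that picks the next sub-task to maximise the overall reward.
def supervisor(obs, step_count, option_length=50):
    idx = (step_count // option_length) % len(SUBTASKS)
    return SUBTASKS[idx]

# Hierarchical control loop for any Gymnasium-style environment.
def run_episode(env, max_steps=500):
    obs, info = env.reset()
    for t in range(max_steps):
        task = supervisor(obs, t)          # high level: choose the sub-task
        action = sub_policies[task](obs)   # low level: act within that sub-task
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            break
```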

Look at the playlist I linked above. It will answer many questions.

Dave