
I'm interested in developing an RL application for a robotic arm in a virtual environment. But I'm stuck on the question of whether:

  1. The observation space should contain the target position + the actual (end-effector) position + the joint values of the robotic arm. The reward is then based on the Euclidean distance between the target and the actual position, or:
  2. The observation space should contain only the joint values of the robotic arm. The reward is still computed from the target and the actual position, but neither appears in the observation.

The reason for my question is the following: I would use the second approach, since the input size would only be the number of joints, so the agent would need fewer parameters, and it would learn a policy for moving the robotic arm to maximize the reward given a start and a target position. This is also more intuitive to me, since it seems "agnostic" in the sense that the agent learns independently of the start and target positions. But the first approach is what I see in many example codes on the internet. A rough sketch of both options is below.
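To make the two options concrete, here is a minimal Gymnasium-style sketch of what I mean. It assumes a hypothetical 3-joint arm and uses a placeholder `forward_kinematics()` helper; all names and dimensions are just illustrative, not taken from any particular simulator:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

N_JOINTS = 3  # illustrative; the real arm may differ

def forward_kinematics(joint_values):
    # Placeholder: in a real setup the simulator would return the
    # end-effector position for the given joint values.
    return np.zeros(3, dtype=np.float32)

class ArmEnv(gym.Env):
    def __init__(self, include_positions=True):
        super().__init__()
        self.include_positions = include_positions
        self.action_space = spaces.Box(-1.0, 1.0, shape=(N_JOINTS,), dtype=np.float32)
        if include_positions:
            # Option 1: joint values + actual end-effector position + target position
            obs_dim = N_JOINTS + 3 + 3
        else:
            # Option 2: joint values only; target is used solely for the reward
            obs_dim = N_JOINTS
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(obs_dim,), dtype=np.float32)
        self.joints = np.zeros(N_JOINTS, dtype=np.float32)
        self.target = np.zeros(3, dtype=np.float32)

    def _obs(self):
        actual = forward_kinematics(self.joints)
        if self.include_positions:
            return np.concatenate([self.joints, actual, self.target])
        return self.joints.copy()

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.joints = np.zeros(N_JOINTS, dtype=np.float32)
        self.target = self.np_random.uniform(-1.0, 1.0, size=3).astype(np.float32)
        return self._obs(), {}

    def step(self, action):
        # Simple joint-space integration of the action (illustrative dynamics)
        self.joints = np.clip(self.joints + 0.05 * np.asarray(action, dtype=np.float32),
                              -np.pi, np.pi)
        actual = forward_kinematics(self.joints)
        # Reward in both options: negative Euclidean distance to the target
        reward = -float(np.linalg.norm(actual - self.target))
        terminated = False
        truncated = False
        return self._obs(), reward, terminated, truncated, {}
```

The only difference between the two options in this sketch is the `include_positions` flag, i.e. whether the target and actual positions are part of the observation or only of the reward.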

Any help pointing me in the right direction would be appreciated.

Dave
