Imagine a pentagon-shaped room with a robot inside and no obstacles. The robot can only exit through the corners, and each corner carries a reward: e.g. 5, 4, 4, 3, 3. A reinforcement learning algorithm would find the values inside the pentagon. The prediction model between the robot's position and the optimal value could be obtained by a value neural network. The robot could use that model to move toward the position with the highest value at each step and reach the corner with the reward of 5. Question: since there are no obstacles inside the pentagon, could we obtain the optimal values by interpolating the rewards at the corners, without the need for a neural network? With this example I am trying to understand RL, although I know that other solutions could be used.

AnraT

1 Answer

RL typically assumes you don't know the reward function or the dynamics in advance: instead of being given the rewards at the corners, the agent must learn them through interaction with the (Markovian) environment in a model-free way. Interpolation without a NN is indeed a neat, analytic solution here, because you already know the corner rewards and the regular room geometry, and the interpolated values form a gradient that directs the robot toward the corner with the highest reward of 5. In fact, linear function approximators whose feature vectors include higher-order or interaction terms of the state can model fairly complex nonlinear value functions in some cases.
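
As a rough sketch of that analytic route (the corner coordinates on a unit circle, the inverse-distance weighting, and the greedy step size are my own illustrative assumptions, not something given in the question), the interpolation and a greedy move could look like:

```python
import numpy as np

# Illustrative setup: pentagon vertices on the unit circle, rewards from the question.
corner_rewards = np.array([5.0, 4.0, 4.0, 3.0, 3.0])
angles = 2 * np.pi * np.arange(5) / 5
corners = np.stack([np.cos(angles), np.sin(angles)], axis=1)

def value(pos, eps=1e-8):
    """Interpolated value: inverse-distance weighted average of the corner rewards."""
    d = np.linalg.norm(corners - pos, axis=1)
    w = 1.0 / (d + eps)
    return float(np.dot(w, corner_rewards) / w.sum())

def greedy_step(pos, step=0.05):
    """Move a small fraction of the way toward whichever corner raises the value most."""
    candidates = pos + step * (corners - pos)
    return max(candidates, key=value)

pos = np.zeros(2)          # robot starts at the centre of the room
for _ in range(100):
    pos = greedy_step(pos)
print(pos, value(pos))     # drifts toward the corner with reward 5
```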

However, in most real high-dimensional RL applications the environment is more complex, partially observable, and stochastic, with irregularities or obstacles, so a simple interpolation won't accurately capture the value or policy functions. There, nonlinear function approximators such as NNs are useful for generalizing from limited experience, possibly combined with experience replay as in the deep Q-network (DQN).
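
For contrast, here is a minimal sketch of a neural value-function approximator (PyTorch is assumed, and the network size, sampled positions, and synthetic targets are purely illustrative; in a real model-free setting the targets would come from interaction, e.g. bootstrapped TD targets, not a known formula):

```python
import torch
import torch.nn as nn

# A small MLP mapping a 2-D position to a scalar value estimate.
value_net = nn.Sequential(
    nn.Linear(2, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)

def synthetic_target(pos):
    # Illustrative stand-in target that peaks at a corner placed at (1, 0);
    # a real agent would instead build targets from sampled transitions.
    return 5.0 - torch.linalg.norm(pos - torch.tensor([1.0, 0.0]), dim=-1)

for _ in range(500):
    pos = 2 * torch.rand(64, 2) - 1                  # random positions in [-1, 1]^2
    target = synthetic_target(pos).unsqueeze(-1)
    loss = nn.functional.mse_loss(value_net(pos), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The trained network scores candidate moves much like the interpolation above,
# but it also generalizes when no closed-form value is available.
```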

cinch