
In the apprenticeship learning algorithm described by Abbeel and Ng in "Apprenticeship Learning via Inverse Reinforcement Learning", the expert trajectories come in the form $\{(s_0^{(i)}, s_1^{(i)}, \ldots)\}_{i=1}^m$. However, the paper also states that $s_0$ is drawn from a distribution $D$. Do all expert trajectories then have to share the same starting state? And why is it not possible to compute the feature expectations from a single trajectory?


1 Answer


All right, I figured it out. The trajectories need not have the same starting state, because each $s_0$ is drawn from the distribution $D$ (as stated in the paper). I had been confused because many of the code implementations on GitHub use trajectories that all start from the same state.

This also answers the second part of the question: the expert's feature expectations are defined as an expectation over that starting distribution, $\mu_E = \mathbb{E}_{s_0 \sim D}\left[\sum_{t=0}^{\infty} \gamma^t \phi(s_t)\right]$, and the paper estimates them empirically as $\hat{\mu}_E = \frac{1}{m} \sum_{i=1}^{m} \sum_{t=0}^{\infty} \gamma^t \phi(s_t^{(i)})$. A single trajectory only samples one draw of $s_0$ (and one realization of the dynamics), so you need to average over $m$ trajectories to get a reasonable estimate of the expectation.
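As a concrete illustration, here is a minimal sketch (not code from the paper) of that empirical estimate. It assumes `trajectories` is a list of $m$ expert trajectories, each given as a sequence of per-state feature vectors $\phi(s_t)$; the function name and signature are hypothetical:

```python
import numpy as np

def estimate_feature_expectations(trajectories, gamma=0.9):
    """Monte Carlo estimate of mu_E = E_{s_0 ~ D}[sum_t gamma^t phi(s_t)].

    trajectories: list of m trajectories, each a sequence of feature
    vectors phi(s_t), one per visited state. The starting states may
    all differ, since each s_0 is an independent draw from D.
    """
    m = len(trajectories)
    k = len(trajectories[0][0])              # feature dimension
    mu_hat = np.zeros(k)
    for traj in trajectories:                # average over the m trajectories
        for t, phi in enumerate(traj):       # discounted sum along one trajectory
            mu_hat += (gamma ** t) * np.asarray(phi, dtype=float)
    return mu_hat / m
```

With `m = 1` this reduces to a single-sample estimate of the expectation over $D$, which is exactly why one trajectory is generally not enough.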

Hope this helps everyone!
