In the apprenticeship learning algorithm described by Abbeel and Ng in Apprenticeship Learning via Inverse Reinforcement Learning, expert trajectories come in the form $\{s_0^i, s_1^i, \ldots\}_{i=1}^m$. However, the paper also states that $s_0$ is drawn from a distribution $D$. Do all expert trajectories then have to have the same starting state? And why is it not possible to compute the feature expectations from a single trajectory?
1 Answer
All right, I figured it out. The trajectories need not have the same starting state, because $s_0$ is drawn from a distribution $D$ (as mentioned in the paper), so different demonstrations can start in different states. I had been confused because many of the code implementations on GitHub focus on trajectories starting from the same state. As for the second question: the feature expectation $\mu_E = E[\sum_{t=0}^\infty \gamma^t \phi(s_t)]$ is an expectation over the start-state distribution $D$ (and over any stochasticity in the expert's policy and the dynamics), so a single trajectory only provides one sample of it. The paper therefore estimates it empirically by averaging over the $m$ trajectories: $\hat{\mu}_E = \frac{1}{m}\sum_{i=1}^m \sum_{t=0}^\infty \gamma^t \phi(s_t^i)$.
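For concreteness, here is a minimal sketch of that empirical estimate. The feature map `phi`, discount `gamma`, and toy trajectories below are made-up placeholders for illustration, not from the paper:

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.9):
    """Estimate expert feature expectations by averaging the
    discounted feature sums over all demonstrated trajectories.

    trajectories: list of trajectories, each a list of states
    phi:          feature map, state -> np.ndarray of shape (k,)
    gamma:        discount factor
    """
    mu = None
    for traj in trajectories:
        # Discounted sum of features along one trajectory.
        total = sum(gamma ** t * phi(s) for t, s in enumerate(traj))
        mu = total if mu is None else mu + total
    # Average over the m demonstrations.
    return mu / len(trajectories)

# Example: two trajectories with different starting states,
# using a toy one-hot feature map over 3 discrete states.
phi = lambda s: np.eye(3)[s]
trajs = [[0, 1, 2, 2], [2, 1, 0, 0]]
mu_E = feature_expectations(trajs, phi, gamma=0.9)
print(mu_E)
```

Note that the estimate only improves as more trajectories are averaged; with a single trajectory it is still well defined, just a high-variance estimate of the true expectation under $D$.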
Hope this helps everyone!
calveeen