In the apprenticeship learning algorithm described by Abbeel and Ng in Apprenticeship Learning via Inverse Reinforcement Learning, expert trajectories come in the form $\{s_0^i, s_1^i, \ldots\}_{i=1}^m$. However, the paper also states that $s_0$ is drawn from a distribution $D$. Do all expert trajectories then have to have the same starting state? And why is it not possible to compute the feature expectations from a single trajectory?
1 Answer
All right, I figured it out. The trajectories need not have the same starting state, because $s_0$ is drawn from a distribution $D$ (as mentioned in the paper), so different demonstrations can start in different states. I had been confused because many of the code implementations on GitHub focus on trajectories starting from the same state. As for the second question: the feature expectation $\mu_E = E[\sum_{t=0}^\infty \gamma^t \phi(s_t)]$ is an expectation over the start-state distribution $D$ (and over any stochasticity in the expert's policy and the dynamics), so a single trajectory only provides one sample of it. The paper therefore estimates it empirically by averaging over the $m$ trajectories: $\hat{\mu}_E = \frac{1}{m}\sum_{i=1}^m \sum_{t=0}^\infty \gamma^t \phi(s_t^i)$.
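For concreteness, here is a minimal sketch of that empirical estimate. The feature map `phi`, discount `gamma`, and toy trajectories below are made-up placeholders for illustration, not from the paper:

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.9):
    """Estimate expert feature expectations by averaging the
    discounted feature sums over all demonstrated trajectories.

    trajectories: list of trajectories, each a list of states
    phi:          feature map, state -> np.ndarray of shape (k,)
    gamma:        discount factor
    """
    mu = None
    for traj in trajectories:
        # Discounted sum of features along one trajectory.
        total = sum(gamma ** t * phi(s) for t, s in enumerate(traj))
        mu = total if mu is None else mu + total
    # Average over the m demonstrations.
    return mu / len(trajectories)

# Example: two trajectories with different starting states,
# using a toy one-hot feature map over 3 discrete states.
phi = lambda s: np.eye(3)[s]
trajs = [[0, 1, 2, 2], [2, 1, 0, 0]]
mu_E = feature_expectations(trajs, phi, gamma=0.9)
print(mu_E)
```

Note that the estimate only improves as more trajectories are averaged; with a single trajectory it is still well defined, just a high-variance estimate of the true expectation under $D$.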
Hope this helps everyone!
calveeen