1

I am thinking of applying apprenticeship learning on retrospective data. From looking at this paper by Ng https://ai.stanford.edu/~ang/papers/icml04-apprentice.pdf which talks about apprenticeship learning, it seems to me that at the 5th step of the algorithm,

  1. Compute (or estimate) $μ^{(i)}$ = $μ(π^{(i)})$, where $\mu^{(i)}$ = $E[\sum_{t=0}^{∞}\gamma^{t}$$\phi(s_{t})$ | $\pi^{(i)}]$, $\phi(s_{t})$ is the reward feature vector at state $s_t$.

From my understanding, a sequence of $s_0, s_1, s_2 ..$ trajectory would have to be generated at this step, following this policy $\pi^{(i)}$. Hence, applying this algorithm on retrospective data would not work?

nbro
  • 42,615
  • 12
  • 119
  • 217
calveeen
  • 1,311
  • 9
  • 18

0 Answers0