I just read the following points about the number of expert demonstrations required in imitation learning, and I'd like some clarification. For context, I assume a linear reward function throughout this post, i.e. the reward can be expressed as a weighted sum of the components of a state's feature vector, $R(s) = w^{\top}\phi(s)$.
The number of expert demonstrations required scales with the number of features in the reward function.
I don't think this is obvious at all - why is it true? Intuitively, I think that as the number of features rises, the complexity of the problem does too, so we may need more data to make a better estimate of the expert's reward function. Is there more to it?
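For concreteness, the linear-reward setup from the top of the post can be sketched as below. This is only an illustrative toy: the weights `w`, the two hand-written demonstrations, the discount `gamma`, and the function names are all assumptions made up for the example, not anything from the slides. With a linear reward, the discounted return of a trajectory is the weight vector dotted with the discounted sum of its feature vectors, so what gets estimated from the demonstrations is a vector with one entry per feature, which may be where the dependence on the number of features comes from.

```python
def linear_reward(w, phi):
    """Reward as a weighted sum of feature components: w . phi."""
    return sum(wi * fi for wi, fi in zip(w, phi))

def discounted_feature_expectation(trajectories, gamma):
    """Empirical estimate of mu = E[sum_t gamma^t * phi(s_t)].

    Each trajectory is a list of feature vectors, one per time step.
    The result has one entry per feature, regardless of how many
    distinct states the trajectories pass through.
    """
    k = len(trajectories[0][0])
    mu = [0.0] * k
    for traj in trajectories:
        for t, phi in enumerate(traj):
            for i in range(k):
                mu[i] += (gamma ** t) * phi[i]
    return [m / len(trajectories) for m in mu]

# Hypothetical data: 2 features, 2 expert demonstrations of length 2.
w = [1.0, -0.5]
demos = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [1.0, 0.0]],
]

mu_hat = discounted_feature_expectation(demos, gamma=0.5)
# Because the reward is linear, the estimated expert return is w . mu_hat.
value_estimate = linear_reward(w, mu_hat)
```

Adding more demonstrations only sharpens the estimate of `mu_hat`; its length stays equal to the number of features.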
The number of expert demonstrations required does not depend on:
- Complexity of the expert’s optimal policy $\pi^{*}$
- Size of the state space
I don't see how the complexity of the expert's optimal policy plays a role here - which is probably why it doesn't affect the number of expert demonstrations we need; but how do we quantify the complexity of a policy in the first place?
Also, I think that the number of expert demonstrations should depend on the size of the state space. For example, if the training and test distributions don't match, behavioral cloning runs into problems, in which case we can use the DAgger algorithm to repeatedly query the expert and take better actions. I feel that a larger state space means we'll have to query the expert more often, i.e. to figure out the expert's optimal action in more states.
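The DAgger-style loop mentioned above can be sketched on a toy problem. Everything here is a simplifying assumption for illustration: a 7-state chain, a hypothetical expert that walks toward a goal state, a tabular majority-vote learner, and no mixing between expert and learner during rollouts (the full algorithm usually mixes the two with a coefficient that decays over iterations). The point is only to show where the expert queries happen: on the states the learner itself visits.

```python
from collections import Counter, defaultdict

# Toy chain environment (all values are assumptions for this sketch).
N_STATES, GOAL, START, HORIZON = 7, 3, 0, 6

def expert_action(s):
    """Hypothetical expert: step toward GOAL, stay put once there."""
    return (s < GOAL) - (s > GOAL)  # +1, -1, or 0

def step(s, a):
    """Deterministic dynamics, clipped to the ends of the chain."""
    return min(max(s + a, 0), N_STATES - 1)

def fit(dataset):
    """Tabular learner: most frequent expert label seen in each state."""
    votes = defaultdict(Counter)
    for s, a in dataset:
        votes[s][a] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

def rollout(policy):
    """Run the learner; states with no label yet default to action +1."""
    s, visited = START, []
    for _ in range(HORIZON):
        visited.append(s)
        s = step(s, policy.get(s, 1))
    return visited

dataset, policy, expert_queries = [], {}, 0
for _ in range(3):                       # DAgger iterations
    for s in rollout(policy):            # states the learner visits
        dataset.append((s, expert_action(s)))  # ask the expert there
        expert_queries += 1
    policy = fit(dataset)                # retrain on the aggregate
```

In this toy run the expert is queried once per visited state per iteration, and the state at the far end of the chain is never labeled because the learner never reaches it. Whether that carries over to the bound on the slide is exactly the question.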
I'd love to know everyone's thoughts on this: how the number of expert demonstrations depends on the factors above and on any other factors. Thank you!
Source: Slide 20/75
 
    