Questions tagged [on-policy-distribution]

2 questions
5 votes, 1 answer

What is the difference between an on-policy distribution and state visitation frequency?

On-policy distribution is defined as follows in Sutton and Barto: On the other hand, state visitation frequency is defined as follows in Trust Region Policy Optimization: $$\rho_{\pi}(s) = \sum_{t=0}^{T} \gamma^t P(s_t=s|\pi)$$ Question: What is…
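For reference, the TRPO expression quoted above can be evaluated directly for a small finite chain. This is a minimal sketch, not either source's code. It assumes the policy has already been folded into a state-to-state transition matrix `P_pi`, with `P_pi[s][s2]` $= \sum_a \pi(a|s)\,p(s_2|s,a)$, and an initial-state distribution `d0`. The function name `discounted_visitation` and the toy chain are hypothetical.

```python
def discounted_visitation(d0, P_pi, gamma, T):
    """Return rho[s] = sum_{t=0}^{T} gamma**t * P(s_t = s | pi) for a finite chain.

    Assumption (hypothetical setup): P_pi[s][s2] is the policy-weighted
    transition probability from s to s2, and d0 is the start distribution.
    """
    n = len(d0)
    d = list(d0)          # d[s] = P(s_t = s | pi), starting at t = 0
    rho = [0.0] * n
    for t in range(T + 1):
        for s in range(n):
            rho[s] += gamma ** t * d[s]
        # Advance one step of the policy-induced chain:
        # d_{t+1}[s2] = sum_s d_t[s] * P_pi[s][s2]
        d = [sum(d[s] * P_pi[s][s2] for s in range(n)) for s2 in range(n)]
    return rho


# Hypothetical two-state chain: state 1 is absorbing, the agent starts in state 0.
P_pi = [[0.5, 0.5],
        [0.0, 1.0]]
rho = discounted_visitation([1.0, 0.0], P_pi, gamma=0.9, T=2)
print(rho, sum(rho))  # the sum is 1 + 0.9 + 0.81 = 2.71, up to float rounding
```

One difference this makes visible: $\rho_\pi$ is not normalized. When the chain never terminates, its entries sum to $\sum_{t=0}^{T}\gamma^t$ (2.71 in the toy example). It only becomes a probability distribution after dividing by that total.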
3 votes, 2 answers

In the on-policy state distribution for episodic tasks, why don't we take into account the length of the episode?

In Sutton & Barto's "Reinforcement Learning: An Introduction", 2nd edition, page 199, they describe the on-policy distribution for episodic tasks in the following box: I don't understand how this can be done without taking the length of the episode…
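For context, the box in the 2nd edition defines
$$\eta(s) = h(s) + \sum_{\bar s}\eta(\bar s)\sum_a \pi(a|\bar s)\,p(s|\bar s,a), \qquad \mu(s) = \frac{\eta(s)}{\sum_{s'}\eta(s')},$$
where $h$ is the start-state distribution and $\eta(s)$ is the expected number of time steps spent in $s$ during a single episode.

This is a minimal sketch, not the book's code. It assumes a small tabular task whose policy-weighted transitions among *nonterminal* states are given as `P_pi`. Rows may sum to less than 1, with the deficit being the probability of terminating. The function name and the toy example are hypothetical. The sketch suggests where episode length enters: the sum of $\eta$ is the expected episode length, and that sum is exactly the normalizer in $\mu$.

```python
def on_policy_distribution(h, P_pi, tol=1e-12, max_iter=100_000):
    """Fixed-point iteration for eta(s) = h(s) + sum_sb eta(sb) * P_pi[sb][s],
    followed by mu(s) = eta(s) / sum(eta).

    Assumption (hypothetical setup): P_pi covers nonterminal states only, and
    episodes terminate with probability 1, so the iteration converges.
    """
    n = len(h)
    eta = list(h)
    for _ in range(max_iter):
        new = [h[s] + sum(eta[sb] * P_pi[sb][s] for sb in range(n)) for s in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, eta)) < tol
        eta = new
        if converged:
            break
    total = sum(eta)  # expected number of nonterminal time steps per episode
    mu = [e / total for e in eta]
    return eta, mu


# Hypothetical episodic task: every episode starts in state 0, which always moves to
# state 1; from state 1 the agent returns to state 0 with prob. 0.5, else terminates.
h = [1.0, 0.0]
P_pi = [[0.0, 1.0],
        [0.5, 0.0]]
eta, mu = on_policy_distribution(h, P_pi)
print(eta, sum(eta), mu)  # eta ≈ [2, 2]; expected episode length ≈ 4; mu ≈ [0.5, 0.5]
```

In this toy task an episode visits (state 0, state 1) a geometric number of times with mean 2, so its expected length is 4 steps. Solving for $\eta$ recovers that value without ever enumerating episode lengths.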