For questions related to the concept of a stationary policy (in reinforcement learning and other AI sub-fields).
Questions tagged [stationary-policy]
6 questions
15 votes, 4 answers
What does "stationary" mean in the context of reinforcement learning?
I think I've seen the expressions "stationary data", "stationary dynamics" and "stationary policy", among others, in the context of reinforcement learning. What do they mean? I think a stationary policy means that the policy does not depend on time,…
Paula Vega
9 votes, 1 answer
What is the difference between a stationary and a non-stationary policy?
In reinforcement learning, there are deterministic and non-deterministic (or stochastic) policies, but there are also stationary and non-stationary policies.
What is the difference between a stationary and a non-stationary policy? How do you…
nbro
2 votes, 1 answer
Policy performance when the stationary state distribution is not unique in RL
Consider the chainworld above with two actions, move (in red) and stay (in blue). Moving in A is stochastic: the agent moves to B with probability $p$ and to C with probability $1-p$. Moving or staying in B and C is irrelevant.
Clearly, there…
Simon
2 votes, 0 answers
Should I use the discounted average reward as objective in a finite-horizon problem?
I am new to reinforcement learning, but, for a finite-horizon application, I am considering using the average reward instead of the sum of rewards as the objective. Specifically, there are at most $T$ time steps (e.g.,…
lll
1 vote, 0 answers
Why do bootstrapping methods produce nonstationary targets more than non-bootstrapping methods?
The following quote is taken from the beginning of the chapter on "Approximate Solution Methods" (p. 198) in "Reinforcement Learning" by Sutton & Barto (2018):
reinforcement learning generally requires function approximation methods able to handle…
Johan
1 vote, 1 answer
What is the difference between the definition of a stationary policy in reinforcement learning and in contextual bandits?
A stationary policy is a function that maps a state to a probability distribution over actions.
In a contextual bandit problem, a state itself does not include the history. But in a reinforcement learning problem, the history can be used to define a…
Hunnam