For questions related to the concept of "optimal policy" in reinforcement learning.
Questions tagged [optimal-policy]
12 questions
5
votes
2 answers
Given two optimal policies, is an affine combination of them also optimal?
If there are two different optimal policies $\pi_1, \pi_2$ in a reinforcement learning task, will the linear combination (or affine combination) of the two policies $\alpha \pi_1 + \beta \pi_2, \alpha + \beta = 1$ also be an optimal policy?
Here I…
yang liu
5
votes
1 answer
What's the optimal policy in the rock-paper-scissors game?
A deterministic policy in the rock-paper-scissors game can be easily exploited by the opponent - by doing just the right sequence of moves to defeat the agent. More often than not, I've heard that a random policy is the optimal policy in this case -…
stoic-santiago
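One way to make "cannot be exploited" concrete for the question above is to compute expected payoffs directly. The sketch below assumes a one-shot zero-sum game with payoffs of +1 for a win, 0 for a draw, and -1 for a loss; the names `PAYOFF` and `expected_payoff` are illustrative and not taken from the question.

```python
import numpy as np

# Our payoff: rows are our move, columns are the opponent's move,
# both ordered as (rock, paper, scissors). Assumed payoffs: win +1, draw 0, loss -1.
PAYOFF = np.array([
    [0, -1, 1],   # rock loses to paper, beats scissors
    [1, 0, -1],   # paper beats rock, loses to scissors
    [-1, 1, 0],   # scissors loses to rock, beats paper
])

def expected_payoff(ours, theirs):
    """Expected payoff when both players sample moves from the given distributions."""
    return float(np.asarray(ours) @ PAYOFF @ np.asarray(theirs))

uniform = np.ones(3) / 3
always_rock = np.array([1.0, 0.0, 0.0])
always_paper = np.array([0.0, 1.0, 0.0])

# A deterministic policy is fully exploited by its counter-move.
print(expected_payoff(always_rock, always_paper))
# The uniform policy earns the same expected payoff against every opponent policy.
print(expected_payoff(uniform, always_paper))
```

Every column of `PAYOFF` sums to zero, so the uniform policy has expected payoff 0 whatever the opponent plays, while "always rock" loses every round to "always paper".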
4
votes
1 answer
An example of a unique value function which is associated with multiple optimal policies
In the 4th paragraph of
http://www.incompleteideas.net/book/ebook/node37.html
it is mentioned:
Whereas the optimal value functions for states and state-action pairs are unique for a given MDP, there can be many optimal policies
Could you please…
Melanie A
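A minimal sketch of the kind of example asked for above, using a hypothetical two-state MDP that is not taken from the book: both actions in state 0 give reward 1 and lead to an absorbing state, so the optimal values are unique but every action choice is optimal. The discount `GAMMA`, the arrays `NEXT_STATE` and `REWARD`, and the helper `value_iteration` are assumptions made for illustration.

```python
import numpy as np

GAMMA = 0.9  # assumed discount factor

# Hypothetical deterministic MDP, indexed as [state, action].
# State 1 is absorbing with zero reward; in state 0 both actions pay 1 and move to state 1.
NEXT_STATE = np.array([[1, 1], [1, 1]])
REWARD = np.array([[1.0, 1.0], [0.0, 0.0]])

def value_iteration(tol=1e-10):
    """Iterate the Bellman optimality backup until the values stop changing."""
    v = np.zeros(2)
    while True:
        q = REWARD + GAMMA * v[NEXT_STATE]  # q[s, a] under the current value estimate
        new_v = q.max(axis=1)
        if np.max(np.abs(new_v - v)) < tol:
            return new_v, q
        v = new_v

v_star, q_star = value_iteration()

# Every action whose value equals v_star[s] is optimal in state s.
optimal_actions = [
    np.flatnonzero(np.isclose(q_star[s], v_star[s])).tolist() for s in range(2)
]
print(v_star, optimal_actions)
```

Here there is a single optimal value function, yet both actions are optimal in each state, so any deterministic choice among them (and any random mixture of them) is an optimal policy.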
4
votes
1 answer
Understanding the optimal value function in RL
The definition of the optimal policy (Section 3.6 of Sutton and Barto) states that $\pi > \pi'$ iff $v_{\pi}(s) > v_{\pi'}(s)$ for all $s \in S$.
I have difficulty understanding why the value (under the optimal policy) should be higher for every state.…
ahron
3
votes
1 answer
What is the difference between a greedy policy and an optimal policy?
I am struggling to understand the difference between an optimal policy and a greedy policy.
Let $F(r_{t+1},s_{t+1}| s_t,a_t)$ be the probability distribution according to which, given action $a_t$ in state $s_t$, reward $r_{t+1}$ realizes…
fennel
3
votes
1 answer
Can an optimal policy have a value function that has a smaller value for a state than a non-optimal policy?
I'm starting to learn about the Bellman equation and a question came to my mind.
A policy $\pi$ is optimal if the value $v_\pi(s)$ is greater than or equal to the value $v_{\pi'}(s)$ for all states $s \in S$.
Why does this work?
Can't it be that the…
raphael_mav
3
votes
1 answer
How is $v_*(s) = \max_{\pi} v_\pi(s)$ also applicable in the case of stochastic policies?
I am reading Sutton & Barto's book "Reinforcement Learning: An Introduction". In this book, they define the optimal value function as:
$$v_*(s) = \max_{\pi} v_\pi(s),$$ for all $s \in \mathcal{S}$.
Do we take the max over all deterministic policies,…
Tamar
2
votes
1 answer
Which community does using a Bayesian regression model as a reward function with exploration vs. exploitation challenges fall under?
I am trying to find research papers addressing a problem that, in my opinion, deserves significant attention. However, I am having difficulty locating relevant information.
To illustrate the problem at hand, consider a multivariate Bayesian…
paul
2
votes
1 answer
What does $v(S_{t+1})$ mean in the optimal state-action value function?
In Sutton & Barto's Reinforcement Learning: An Introduction (page 63), the authors introduce the optimal state value function in the expression of the optimal action-value function as follows: $q_{*}(s,a)=\mathbb{E}[R_{t+1}+\gamma…
Daviiid
2
votes
2 answers
Why is the optimal policy for an infinite horizon MDP deterministic?
Could someone please help me gain some intuition as to why the optimal policy for a Markov Decision Process in the infinite horizon case (agent acts forever) is deterministic?
stoic-santiago
0
votes
1 answer
How is policy iteration capable of improving on a deterministic policy?
Given a policy $\pi$ and its improved version $\pi'$ obtained by policy iteration, we have $v_{\pi'}(s)\geq v_{\pi}(s)$ for all $s \in S$.
I think the way we choose $\pi'$ makes it deterministic (unless there is a tie, but let's not consider…
Daviiid
0
votes
0 answers
Determine Gridworld values
I am learning reinforcement learning for games by following Gridworld examples. Apologies in advance if this is a basic question; I am very new to reinforcement learning.
I am slightly confused in scenarios where the probability of moving up, down, left and…
Krellex