Questions tagged [optimal-policy]

For questions related to the concept of "optimal policy" in reinforcement learning.

12 questions
5 votes, 2 answers

Given two optimal policies, is an affine combination of them also optimal?

If there are two different optimal policies $\pi_1, \pi_2$ in a reinforcement learning task, will the linear combination (or affine combination) of the two policies, $\alpha \pi_1 + \beta \pi_2$ with $\alpha + \beta = 1$, also be an optimal policy? Here I…
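A minimal sketch, assuming a hypothetical two-state episodic MDP with $\gamma = 1$ (the table `P` and the helpers `evaluate` and `mix` are illustrative, not from the question). It evaluates two optimal deterministic policies and their state-wise mixture with $\alpha = 0.3$, $\beta = 0.7$; all three have the same value in every state. This only covers the convex case ($\alpha, \beta \ge 0$): with a negative coefficient, $\alpha \pi_1(a \mid s) + \beta \pi_2(a \mid s)$ can be negative and is then not a valid policy at all.

```python
# Hypothetical episodic MDP with gamma = 1 (illustration only).
# P[state][action] -> list of (probability, next_state, reward); None = terminal.
P = {
    0: {"a": [(1.0, None, 1.0)], "b": [(1.0, 1, 0.0)]},
    1: {"a": [(1.0, None, 1.0)], "b": [(1.0, None, 0.0)]},
}
GAMMA = 1.0


def evaluate(policy, sweeps=100):
    """Iterative policy evaluation; policy[s] maps each action to its probability."""
    v = {s: 0.0 for s in P}
    for _ in range(sweeps):
        for s in P:
            v[s] = sum(
                prob_a * p * (r + GAMMA * (v[s2] if s2 is not None else 0.0))
                for a, prob_a in policy[s].items()
                for p, s2, r in P[s][a]
            )
    return v


def mix(pi1, pi2, alpha):
    """State-wise combination alpha * pi1 + (1 - alpha) * pi2 (convex if 0 <= alpha <= 1)."""
    return {
        s: {a: alpha * pi1[s].get(a, 0.0) + (1 - alpha) * pi2[s].get(a, 0.0) for a in P[s]}
        for s in P
    }


pi1 = {0: {"a": 1.0}, 1: {"a": 1.0}}  # finish immediately from state 0
pi2 = {0: {"b": 1.0}, 1: {"a": 1.0}}  # detour through state 1, same total reward
mixed = mix(pi1, pi2, alpha=0.3)

for name, pi in [("pi1", pi1), ("pi2", pi2), ("0.3*pi1 + 0.7*pi2", mixed)]:
    print(name, evaluate(pi))
```

The mixture stays optimal here because, in every state, it only puts probability on actions that are already optimal there; that is the property to check in general.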
5 votes, 1 answer

What's the optimal policy in the rock-paper-scissors game?

A deterministic policy in the rock-paper-scissors game can easily be exploited by the opponent, who only has to play the right sequence of moves to defeat the agent. More often than not, I've heard that a random policy is the optimal policy in this case…
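A sketch of why randomizing helps, assuming "optimal" means maximizing the expected payoff against an opponent who knows the agent's strategy and best-responds to it (the game-theoretic minimax view). The helper names `payoff`, `expected_payoff` and `worst_case` are illustrative; exact fractions avoid rounding noise.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a draw, from the agent's point of view."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_payoff(strategy, reply):
    """Expected payoff of a mixed strategy against one fixed opponent move."""
    return sum(p * payoff(m, reply) for m, p in strategy.items())


def worst_case(strategy):
    """Payoff when the opponent picks the reply that is worst for the agent."""
    return min(expected_payoff(strategy, r) for r in MOVES)


uniform = {m: Fraction(1, 3) for m in MOVES}
always_rock = {"rock": Fraction(1)}
print(worst_case(uniform))      # 0: no reply beats the uniform strategy on average
print(worst_case(always_rock))  # -1: answering "paper" wins every time
```

Any deterministic strategy has a worst case of $-1$, while the uniform one guarantees $0$ whatever the opponent does.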
4 votes, 1 answer

An example of a unique value function which is associated with multiple optimal policies

In the 4th paragraph of http://www.incompleteideas.net/book/ebook/node37.html, it is mentioned: "Whereas the optimal value functions for states and state-action pairs are unique for a given MDP, there can be many optimal policies." Could you please…
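A minimal sketch on a hypothetical MDP in which two routes lead to the same reward (the MDP and the helpers `value_iteration` and `optimal_actions` are assumptions for illustration). Value iteration returns a single $v_*$, but both `up` and `down` attain $\max_a q_*(\text{start}, a)$, so each choice gives a different optimal deterministic policy with that same $v_*$.

```python
# Hypothetical deterministic MDP: P[s][a] = (next_state, reward); None = terminal.
P = {
    "start": {"up": ("A", 0.0), "down": ("B", 0.0)},
    "A": {"go": (None, 1.0)},
    "B": {"go": (None, 1.0)},
}
GAMMA = 0.9


def q_value(v, s, a):
    """One-step lookahead r + gamma * v(s') for a deterministic transition."""
    s2, r = P[s][a]
    return r + GAMMA * (v[s2] if s2 is not None else 0.0)


def value_iteration(tol=1e-10):
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(v, s, a) for a in P[s])
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v


def optimal_actions(v, eps=1e-9):
    """Every action whose q*(s, a) ties with the maximum is an optimal choice in s."""
    result = {}
    for s in P:
        qs = {a: q_value(v, s, a) for a in P[s]}
        best = max(qs.values())
        result[s] = [a for a, q in qs.items() if best - q < eps]
    return result


v_star = value_iteration()
print(v_star)                   # {'start': 0.9, 'A': 1.0, 'B': 1.0}
print(optimal_actions(v_star))  # {'start': ['up', 'down'], 'A': ['go'], 'B': ['go']}
```

The number of optimal deterministic policies is the product of the lengths of those lists, here $2 \times 1 \times 1 = 2$.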
4 votes, 1 answer

Understanding the optimal value function in RL

The ordering used to define the optimal policy (Sutton & Barto, section 3.6) states that $\pi \geq \pi'$ iff $v_{\pi}(s) \geq v_{\pi'}(s)$ for all $s \in S$. I have difficulty understanding why the value (under the optimal policy) should be at least as high in every state.…
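Sutton & Barto's ordering is $\pi \geq \pi'$ if and only if $v_\pi(s) \geq v_{\pi'}(s)$ for all $s \in \mathcal{S}$, which is only a partial order: two policies can each be better in a different state. The sketch below uses made-up value functions (purely illustrative, not computed from any MDP) to show such an incomparable pair. The non-obvious part, stated in the same section of the book for finite MDPs, is that there is always at least one policy that is better than or equal to all other policies in every state at once.

```python
def at_least_as_good(v_pi, v_other):
    """pi >= pi' iff v_pi(s) >= v_pi'(s) for every state s."""
    return all(v_pi[s] >= v_other[s] for s in v_pi)


# Made-up value functions of three policies over two states (illustration only).
v_a = {"s1": 5.0, "s2": 1.0}
v_b = {"s1": 2.0, "s2": 3.0}
v_opt = {"s1": 5.0, "s2": 3.0}  # assumed to be the optimal policy's values

print(at_least_as_good(v_a, v_b), at_least_as_good(v_b, v_a))      # False False: incomparable
print(at_least_as_good(v_opt, v_a), at_least_as_good(v_opt, v_b))  # True True: dominates both
```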
ahron • 265 • 2 • 7
3 votes, 1 answer

What is the difference between a greedy policy and an optimal policy?

I am struggling to understand the difference between an optimal policy and a greedy policy. Let $F(r_{t+1},s_{t+1} \mid s_t,a_t)$ be the probability distribution according to which, given action $a_t$ in state $s_t$, reward $r_{t+1}$ realizes…
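A minimal sketch, assuming "greedy" means greedy with respect to some value estimate $v$: in each state, pick the action maximizing the one-step lookahead $\sum_{s', r} F(r, s' \mid s, a)\,[r + \gamma v(s')]$. The MDP table `F` (standing in for the distribution above), `lookahead` and `greedy` are hypothetical. Greedy with respect to an uninformed $v$ (all zeros, which reduces to maximizing the immediate reward) picks the myopic action, while greedy with respect to $v_*$ is optimal.

```python
# Hypothetical MDP: F[s][a] -> list of (probability, next_state, reward); None = terminal.
F = {
    "s": {"quick": [(1.0, None, 1.0)], "wait": [(1.0, "t", 0.0)]},
    "t": {"collect": [(1.0, None, 10.0)]},
}
GAMMA = 0.9


def lookahead(v, s, a):
    """Expected one-step return: sum of F(r, s' | s, a) * (r + gamma * v(s'))."""
    return sum(p * (r + GAMMA * (v[s2] if s2 is not None else 0.0)) for p, s2, r in F[s][a])


def greedy(v):
    """Deterministic policy that is greedy with respect to the value estimate v."""
    return {s: max(F[s], key=lambda a: lookahead(v, s, a)) for s in F}


v_zero = {"s": 0.0, "t": 0.0}   # uninformed estimate
v_star = {"s": 9.0, "t": 10.0}  # optimal values here: v*(t) = 10, v*(s) = max(1, 0.9 * 10)
print(greedy(v_zero))  # {'s': 'quick', 't': 'collect'}: myopic, not optimal
print(greedy(v_star))  # {'s': 'wait', 't': 'collect'}: greedy w.r.t. v* is optimal
```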
3 votes, 1 answer

Can an optimal policy have a value function that has a smaller value for a state than a non-optimal policy?

I'm starting to learn about the Bellman equation, and a question came to mind. A policy $\pi$ is optimal if its value $v_\pi(s)$ is greater than or equal to the value $v_{\pi'}(s)$ of every other policy $\pi'$ for all states $s \in S$. Why does this work? Can't it be that the…
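One way to check this on a small example is brute force: enumerate every deterministic policy of a hypothetical two-state MDP, evaluate each one, and compare with $v_*$ from value iteration. The sketch assumes $\gamma = 0.9$ and the made-up dynamics in `P`; it checks that no policy beats $v_*$ in any state and that a single policy attains $v_*$ in both states at once. It is a numerical illustration, not a proof.

```python
import itertools

# Hypothetical continuing MDP: P[s][a] = list of (probability, next_state, reward).
P = {
    0: {"stay": [(1.0, 0, 1.0)], "move": [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "move": [(1.0, 0, 5.0)]},
}
GAMMA = 0.9
STATES = list(P)


def backup(v, s, a):
    """Expected r + gamma * v(s') for taking action a in state s."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in P[s][a])


def evaluate(policy, sweeps=2000):
    v = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        v = {s: backup(v, s, policy[s]) for s in STATES}
    return v


def value_iteration(sweeps=2000):
    v = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        v = {s: max(backup(v, s, a) for a in P[s]) for s in STATES}
    return v


v_star = value_iteration()
all_policies = [dict(zip(STATES, choice))
                for choice in itertools.product(*(list(P[s]) for s in STATES))]
values = [evaluate(pi) for pi in all_policies]

# v* is at least as large as every policy's value, in every state simultaneously.
assert all(v_star[s] >= v[s] - 1e-9 for v in values for s in STATES)

# Policies whose value matches v* everywhere.
optimal_policies = [pi for pi, v in zip(all_policies, values)
                    if all(abs(v_star[s] - v[s]) < 1e-6 for s in STATES)]
print(optimal_policies)
```

In this example, `{0: 'stay', 1: 'stay'}` has a higher value in state 1 than some other policies, but the optimal policy still beats it there.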
3 votes, 1 answer

How is $v_*(s) = \max_{\pi} v_\pi(s)$ also applicable in the case of stochastic policies?

I am reading Sutton & Barto's book "Reinforcement Learning: An Introduction". In this book, the authors define the optimal value function as $$v_*(s) = \max_{\pi} v_\pi(s),$$ for all $s \in \mathcal{S}$. Do we take the max over all deterministic policies,…
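The key step can be sketched at a single state, assuming the action values $q_*(s, a)$ are already known (the numbers in `q` are made up). Acting with a distribution $\pi(\cdot \mid s)$ and then optimally yields $\sum_a \pi(a \mid s)\, q_*(s, a)$, a weighted average, which can never exceed $\max_a q_*(s, a)$. So taking the max over all policies, stochastic ones included, gives the same $v_*$, and a deterministic policy attains it. The random search below is only a numerical check of that inequality, not a proof.

```python
import random

# Hypothetical action values q*(s, a) at a single state s.
q = {"left": 1.0, "right": 3.0, "stay": 2.0}


def mixed_value(pi):
    """Value of sampling the action from pi at s, then acting optimally: sum_a pi(a) q(s, a)."""
    return sum(pi[a] * q[a] for a in q)


def random_distribution(rng):
    """A random probability distribution over the actions in q."""
    w = [rng.random() for _ in q]
    total = sum(w)
    return {a: wi / total for a, wi in zip(q, w)}


rng = random.Random(0)
best_deterministic = max(q.values())
best_stochastic = max(mixed_value(random_distribution(rng)) for _ in range(10_000))
# The stochastic maximum never exceeds 3.0 (up to floating-point rounding).
print(best_deterministic, best_stochastic)
```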
2 votes, 1 answer

Which research community covers using a Bayesian regression model as a reward function with exploration vs. exploitation challenges?

I am trying to find research papers addressing a problem that, in my opinion, deserves significant attention. However, I am having difficulty locating relevant information. To illustrate the problem at hand, consider a multivariate Bayesian…
2 votes, 1 answer

What does $v(S_{t+1})$ mean in the optimal state-action value function?

In Sutton & Barto's "Reinforcement Learning: An Introduction" (page 63), the authors introduce the optimal state-value function in the expression for the optimal action-value function as follows: $q_{*}(s,a)=\mathbb{E}[R_{t+1}+\gamma…
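Assuming the truncated expression is the standard Bellman relation $q_*(s,a)=\mathbb{E}[R_{t+1}+\gamma v_*(S_{t+1}) \mid S_t=s, A_t=a]$, $S_{t+1}$ is a random variable: the next state is drawn from the dynamics, and $v_*(S_{t+1})$ is the optimal value of whichever state that turns out to be, averaged by the expectation. A minimal sketch with made-up one-step dynamics and values (`dynamics`, `v_star` and `q_from_v` are hypothetical):

```python
# Hypothetical one-step dynamics from a fixed (s, a): list of (probability, next_state, reward).
dynamics = [(0.7, "s1", 1.0), (0.3, "s2", -1.0)]
v_star = {"s1": 4.0, "s2": 10.0}  # assumed optimal state values
GAMMA = 0.9


def q_from_v(dynamics, v, gamma):
    """q*(s, a) = E[R_{t+1} + gamma * v*(S_{t+1}) | S_t = s, A_t = a]."""
    return sum(p * (r + gamma * v[s_next]) for p, s_next, r in dynamics)


# 0.7 * (1 + 0.9 * 4) + 0.3 * (-1 + 0.9 * 10) = 3.22 + 2.4, about 5.62
print(q_from_v(dynamics, v_star, GAMMA))
```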
Daviiid • 585 • 5 • 17
2 votes, 2 answers

Why is the optimal policy for an infinite horizon MDP deterministic?

Could someone please help me gain some intuition as to why the optimal policy for a Markov Decision Process in the infinite horizon case (agent acts forever) is deterministic?
0 votes, 1 answer

How is policy iteration capable of improving on a deterministic policy?

Given a policy $\pi$ and its improved version $\pi'$ obtained by policy iteration, we have $v_{\pi'}(s)\geq v_{\pi}(s)$ for all $s \in S$. I think the way we choose $\pi'$ makes it deterministic (unless there is a tie but let's not consider…
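A minimal policy-iteration sketch on a hypothetical three-state chain (the MDP and the helpers `evaluate` and `improve` are assumptions for illustration). Starting from the deterministic policy "always `left`", each greedy improvement step is deterministic by construction (ties go to the first action listed), yet the values never decrease, and the policy keeps changing until it reaches "always `right`". The improvement comes from switching the action in some states, not from randomizing.

```python
# Hypothetical deterministic continuing MDP: P[s][a] = (next_state, reward).
P = {
    0: {"left": (0, 0.0), "right": (1, 0.0)},
    1: {"left": (0, 0.0), "right": (2, 0.0)},
    2: {"left": (1, 0.0), "right": (2, 1.0)},
}
GAMMA = 0.9


def q(v, s, a):
    s2, r = P[s][a]
    return r + GAMMA * v[s2]


def evaluate(policy, sweeps=1000):
    """Iterative policy evaluation for a deterministic policy."""
    v = {s: 0.0 for s in P}
    for _ in range(sweeps):
        v = {s: q(v, s, policy[s]) for s in P}
    return v


def improve(v):
    """Greedy step: one action per state (max keeps the first action on ties)."""
    return {s: max(P[s], key=lambda a: q(v, s, a)) for s in P}


policy = {s: "left" for s in P}  # deterministic starting policy
history = []
while True:
    v = evaluate(policy)
    history.append((policy, v))
    new_policy = improve(v)
    if new_policy == policy:
        break
    policy = new_policy

# Each improvement step never lowers the value of any state.
for (_, v_old), (_, v_new) in zip(history, history[1:]):
    assert all(v_new[s] >= v_old[s] - 1e-9 for s in P)

for i, (pi, v) in enumerate(history):
    print(i, pi, {s: round(x, 2) for s, x in v.items()})
```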
0 votes, 0 answers

Determine Gridworld values

I am learning reinforcement learning for games by following Gridworld examples. Apologies in advance if this is a basic question; I am very new to reinforcement learning. I am slightly confused in scenarios where the probability of moving up, down, left and…
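A minimal iterative policy evaluation sketch, assuming the classic 4×4 gridworld from Sutton & Barto (Example 4.1), which may differ from the grid in the question: the top-left and bottom-right corners are terminal, every move costs $-1$, $\gamma = 1$, and the agent moves up, down, left or right with probability 0.25 each; moving off the grid leaves it in place. The dictionary `MOVE_PROBS` (a hypothetical name) is where different move probabilities would go. Each sweep replaces $v(s)$ with the probability-weighted sum of $-1 + \gamma v(s')$ over the four moves.

```python
# Assumed setup (Sutton & Barto, Example 4.1): 4x4 grid, terminal top-left and
# bottom-right corners, reward -1 per move, gamma = 1, equiprobable random policy.
N = 4
TERMINAL = {(0, 0), (N - 1, N - 1)}
# Probability of each move (up, down, left, right); change these for other scenarios.
MOVE_PROBS = {(-1, 0): 0.25, (1, 0): 0.25, (0, -1): 0.25, (0, 1): 0.25}


def step(cell, move):
    """Deterministic move; bumping into the edge leaves the agent in place."""
    r, c = cell[0] + move[0], cell[1] + move[1]
    return (r, c) if 0 <= r < N and 0 <= c < N else cell


def evaluate(theta=1e-9, gamma=1.0):
    """In-place iterative policy evaluation until the largest change is below theta."""
    v = {(r, c): 0.0 for r in range(N) for c in range(N)}
    while True:
        delta = 0.0
        for cell in v:
            if cell in TERMINAL:
                continue  # terminal states keep value 0
            new = sum(p * (-1.0 + gamma * v[step(cell, m)]) for m, p in MOVE_PROBS.items())
            delta = max(delta, abs(new - v[cell]))
            v[cell] = new
        if delta < theta:
            return v


v = evaluate()
for r in range(N):
    print(" ".join(f"{v[(r, c)]:6.1f}" for c in range(N)))
```

With these assumptions the values converge to those shown in the book, e.g. a first row of 0, -14, -20, -22.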