Questions tagged [stochastic-policy]

For questions related to the concept of a stochastic policy (as defined in reinforcement learning), which is a function from a state to a probability distribution over actions (from that state).
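In the standard notation used in reinforcement learning texts (e.g. Sutton & Barto), such a policy can be written as $$\pi(a \mid s) \doteq \Pr(A_t = a \mid S_t = s), \qquad \sum_{a} \pi(a \mid s) = 1 \text{ for every state } s,$$ whereas a deterministic policy instead maps each state to a single action, $a = \pi(s)$.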

13 questions
16
votes
3 answers

Is the optimal policy always stochastic if the environment is also stochastic?

Is the optimal policy always stochastic (that is, a map from states to a probability distribution over actions) if the environment is also stochastic? Intuitively, if the environment is deterministic (that is, if the agent is in a state $s$ and…
8
votes
3 answers

What is the difference between a stochastic and a deterministic policy?

In reinforcement learning, there are the concepts of stochastic (or probabilistic) and deterministic policies. What is the difference between them?
4
votes
2 answers

How do we estimate the value of a stochastic policy?

I'm learning about reinforcement learning, particularly policy gradient methods and actor-critic methods. I've noticed that many algorithms use stochastic policies during training (i.e. they select actions from a probability distribution). I…
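For reference, the quantity such methods estimate is, in the standard notation, the expected discounted return obtained when actions are sampled from the policy itself: $$v_\pi(s) = \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s\right],$$ which Monte Carlo rollouts approximate by averaging the returns of sampled trajectories.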
3
votes
1 answer

How is $v_*(s) = \max_{\pi} v_\pi(s)$ also applicable in the case of stochastic policies?

I am reading Sutton & Barto's book "Reinforcement Learning: An Introduction". In this book, they define the optimal value function as: $$v_*(s) = \max_{\pi} v_\pi(s),$$ for all $s \in \mathcal{S}$. Do we take the max over all deterministic policies,…
3
votes
1 answer

In the policy gradient equation, is $\pi(a_{t} | s_{t}, \theta)$ a distribution or a function?

I am learning about policy gradient methods from the Deep RL Bootcamp by Pieter Abbeel and I am a bit stumped by the math presented. In the lecture, he derives the gradient of the log-likelihood of a trajectory to be $$\nabla \log P(\tau^{i};\theta) =…
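In the usual derivation, the environment's transition probabilities do not depend on $\theta$, so they drop out of the gradient and only the policy terms remain: $$\nabla_\theta \log P(\tau^{i};\theta) = \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t).$$ Here $\pi_\theta(\cdot \mid s_t)$ is a distribution over actions, and $\pi_\theta(a_t \mid s_t)$ is the probability (or density) it assigns to the sampled action $a_t$.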
3
votes
1 answer

What's the value of making the RL agent's output stochastic opposed to deterministic?

I have a question about a reinforcement learning problem. I'm training an agent to add or delete pixels in a [12 x 12] 2D space (going to be 3D in the future). Its action space consists of two discrete outputs: x[0-12] and y[0-12]. What would be…
3
votes
1 answer

Is it possible for value-based methods to learn stochastic policies?

Is it possible for value-based methods to learn stochastic policies? I'm trying to get a clear picture of the different categories for RL algorithms, and while doing so I started to think about settings where the optimal policy is stochastic…
3
votes
1 answer

Can Q-learning be used to derive a stochastic policy?

In my understanding, Q-learning gives you a deterministic policy. However, can we use some technique to build a meaningful stochastic policy from the learned Q values? I think that simply using a softmax won't work.
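For concreteness, the softmax construction the question refers to is the Boltzmann policy, which turns the learned Q-values for a state into action probabilities. A minimal sketch (the temperature parameter tau here is illustrative, not taken from the question):

    import numpy as np

    def boltzmann_policy(q_values, tau=1.0):
        # Softmax over Q-values with temperature tau: higher tau gives a more
        # uniform distribution, lower tau approaches the greedy (deterministic) policy.
        prefs = np.asarray(q_values, dtype=float) / tau
        prefs -= prefs.max()  # subtract the max for numerical stability
        probs = np.exp(prefs)
        return probs / probs.sum()

    # Example: sample an action for one state from the induced stochastic policy
    q_s = np.array([1.0, 2.0, 0.5])
    action = np.random.choice(len(q_s), p=boltzmann_policy(q_s, tau=0.5))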
2
votes
1 answer

Can a policy with gaussian distribution allow two distinct optimal actions to have distinctively high probabilities?

As an example of the benefits of a stochastic policy, I have often seen the grid world example below. Five blocks in a row: the first, third, and fifth are white (distinguishable states), and the second and fourth are gray (for the agent, these two…
2
votes
1 answer

Is a learned policy, for a deterministic problem, trained in a supervised process, a stochastic policy?

Suppose I trained a neural network with 4 outputs (one for each action: move down, up, left, and right) to move an agent through a grid (a deterministic problem). The output of the neural network is a probability distribution over the 4 actions, due to the…
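As an illustration of the distinction the question is getting at, the same softmax output can induce either a stochastic or a deterministic policy, depending on how the action is chosen from it. A minimal sketch with made-up output probabilities:

    import numpy as np

    # Hypothetical softmax output of the network for one grid state:
    # a distribution over the 4 actions (down, up, left, right).
    probs = np.array([0.70, 0.10, 0.15, 0.05])

    # Stochastic policy: sample the action from the output distribution.
    stochastic_action = np.random.choice(4, p=probs)

    # Deterministic policy: always take the most probable action.
    deterministic_action = int(np.argmax(probs))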
2
votes
1 answer

Did AlphaGo Zero actually beat AlphaGo 100 games to 0?

tl;dr Did AlphaGo Zero and AlphaGo play 100 repetitions of the same sequence of boards, or were there 100 different games? Background: AlphaGo was the first superhuman Go player, but it had human tuning and training. AlphaGo Zero learned to be more…
0
votes
0 answers

Grayscale to RGB888 vs RGB332 to RGB888 in same colorization training between two universes

Suppose there are two parallel universes that train deep learning models for color resolution. The first universe uses grayscale images as input with dimension (640, 480, 1), while the second universe uses RGB332 images as input with the same dimension…
0
votes
1 answer

Consequence of Dvoretzky Stochastic Approximation Theorem

I am trying to understand all the steps needed to prove convergence of the TD(0) algorithm, and I am following a proof which uses a theorem of Tommi Jaakkola, Michael I. Jordan, and Satinder P. Singh, from the paper: On the Convergence of Stochastic Iterative Dynamic…