For questions about value-based reinforcement learning (RL) methods, which first learn a value function and then derive a policy from it. Q-learning is an example of a value-based RL algorithm.
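For illustration, here is a minimal sketch of what "derive a policy from the value function" means in the tabular case; the grid-world sizes and the `Q` table below are hypothetical stand-ins, not taken from any question on this page:

```python
import numpy as np

# Hypothetical tabular action-value function Q[state, action],
# e.g. the result of running tabular Q-learning on a small grid world.
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))

def greedy_policy(state: int) -> int:
    """Derive the policy from the value function: in each state,
    pick the action with the highest estimated action value."""
    return int(np.argmax(Q[state]))
```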
Questions tagged [value-based-methods]
10 questions
5 votes · 1 answer
Is reinforcement learning only about determining the value function?
I started reading some reinforcement learning literature, and it seems to me that all approaches to solving reinforcement learning problems are about finding the value function (the state-value function or the state-action value function).
Are there any…
Felix P.
4 votes · 1 answer
Why are policy gradient methods more effective in high-dimensional action spaces?
In his Reinforcement Learning course, David Silver argues that policy-based reinforcement learning (RL) is more effective than value-based RL in high-dimensional action spaces. He points out that the implicit policy (e.g., $\epsilon$-greedy) in…
Saucy Goat
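A minimal sketch of the kind of implicit policy the excerpt refers to, assuming a small discrete action space; note the argmax over every action, which is the step that becomes expensive or intractable when the action space is high-dimensional (all names below are illustrative):

```python
import numpy as np

def epsilon_greedy(Q_row: np.ndarray, epsilon: float,
                   rng: np.random.Generator) -> int:
    """Q_row holds Q(s, a) for every action a in the current state s.
    With probability epsilon, act uniformly at random; otherwise act
    greedily. The argmax enumerates all actions, which scales poorly
    when the action space is large or continuous."""
    if rng.random() < epsilon:
        return int(rng.integers(len(Q_row)))
    return int(np.argmax(Q_row))
```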
4 votes · 1 answer
What is the advantage of using MCTS with value-based methods over value-based methods alone?
I have been trying to understand why MCTS is so important to the performance of RL agents, and the best description I found comes from the paper Bootstrapping from Game Tree Search, which states:
Deterministic, two-player games such as chess provide an…
Hossam
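The benefit the excerpt hints at can be illustrated in its simplest form, one-step lookahead: rather than trusting the learned values directly, the agent simulates each move with a game model and evaluates the successors with the value function; MCTS extends this idea to a selectively grown, deeper tree. The `step` model and `V` estimate below are assumed, illustrative interfaces:

```python
def one_step_lookahead(state, actions, step, V, gamma=0.99):
    """Pick the action whose simulated successor looks best.

    step(state, action) -> (reward, next_state) is an assumed
    deterministic game model (as in chess or Go); V(next_state) is a
    learned state-value estimate. Full MCTS repeats this idea down a
    selectively expanded tree instead of stopping at depth one."""
    best_action, best_value = None, float("-inf")
    for a in actions:
        reward, next_state = step(state, a)
        value = reward + gamma * V(next_state)
        if value > best_value:
            best_action, best_value = a, value
    return best_action
```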
3 votes · 1 answer
Is it possible for value-based methods to learn stochastic policies?
Is it possible for value-based methods to learn stochastic policies? I'm trying to get a clear picture of the different categories of RL algorithms, and while doing so I started to think about settings where the optimal policy is stochastic…
Krrrl
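One construction relevant to this question: a value-based agent can behave stochastically by sampling actions from a softmax (Boltzmann) distribution over its Q-values instead of taking the greedy action. Whether such a policy counts as "learned" is part of what the question asks; the sketch below shows only the mechanics, with illustrative names:

```python
import numpy as np

def boltzmann_policy(Q_row: np.ndarray, temperature: float,
                     rng: np.random.Generator) -> int:
    """Sample an action with probability proportional to exp(Q(s,a)/T).
    High temperature gives a near-uniform policy; low temperature gives
    a near-greedy one. The policy is stochastic, but it is still derived
    from Q rather than optimized directly as a distribution."""
    logits = (Q_row - Q_row.max()) / temperature  # shift by max for stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(Q_row), p=probs))
```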
2 votes · 0 answers
What kind of reinforcement learning method does DeepMind's AlphaGo use to beat the best human Go player?
In reinforcement learning, there are model-based and model-free methods; within model-free methods, there is a further split into policy-based and value-based methods.
DeepMind's AlphaGo RL model has beaten the best human Go player. What kind of reinforcement learning model does…
user781486
1 vote · 0 answers
Is it possible to combine two policy-based RL agents?
I am developing an RL agent for a game environment. I have found that there are two strategies for doing well in the game, so I have trained two RL agents using neural networks with distinct reward functions. Each reward function corresponds to one…
BlackBrain
1 vote · 1 answer
Why do we need two heads in D3QN to obtain value and advantage separately, if V is the average of the Q-values?
I have two questions on the Dueling DQN paper. First, I have an issue understanding the identifiability problem that the Dueling DQN paper mentions:
Here is my question: if we are given the Q-values $Q(s, a; \theta)$ for all actions, I assume we can get the value…
Afshin Oroojlooy
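For context, the aggregation used in the Dueling DQN paper (Wang et al., 2016) addresses exactly this identifiability issue: since adding a constant to $V$ and subtracting it from $A$ leaves $Q$ unchanged, the advantage stream is constrained to have zero mean over actions:

$$Q(s, a; \theta, \alpha, \beta) = V(s; \theta, \beta) + \left( A(s, a; \theta, \alpha) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a'; \theta, \alpha) \right)$$

Averaging both sides over actions makes the advantage term vanish, so $V(s)$ equals the mean of the $Q$-values, which is precisely the relation the question title alludes to.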
1 vote · 0 answers
What are the disadvantages of actor-only methods with respect to value-based ones?
While the advantages of actor-only algorithms, the ones that search the policy space directly without using a value function, are clear (the possibility of a continuous action space, a stochastic policy, etc.), I can't figure out the…
unter_983
1 vote · 0 answers
Are policy-based methods better than value-based methods only for large action spaces?
In different books on reinforcement learning, policy-based methods are motivated by their ability to handle large (continuous) action spaces. Is this the only motivation for policy-based methods? What if the action space is tiny (say, only 9…
tmaric
0 votes · 1 answer
Can Q-learning rewards and next states be non-deterministic?
I am working on a team developing a Q-learning-based approach to hyperparameter tuning. I have a disagreement with one of my teammates about how they defined the problem. They defined it as follows:
The states are the values of the metric we are…
Ahmed Mokhtar
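For reference, the standard tabular Q-learning update already accommodates stochastic rewards and transitions: with a step size $\alpha < 1$, each update moves the estimate only partway toward the noisy target, so repeated visits average the noise out (convergence holds under the usual step-size and exploration conditions). A minimal sketch with hypothetical names:

```python
import numpy as np

def q_update(Q: np.ndarray, s: int, a: int, r: float, s_next: int,
             alpha: float = 0.1, gamma: float = 0.99) -> None:
    """One tabular Q-learning step. r and s_next may be samples from
    stochastic reward and transition distributions; because alpha < 1,
    each update moves Q[s, a] only partway toward the noisy target,
    so the estimate converges toward the expected return."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```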