Questions tagged [q-learning]

For questions related to the Q-learning algorithm, a model-free, temporal-difference reinforcement learning algorithm that attempts to approximate the optimal Q function, i.e. the function that, given a state s and an action a, returns the expected return (value) of taking action a in state s and behaving optimally thereafter. Q-learning was introduced in Watkins' PhD thesis "Learning from Delayed Rewards" (1989).

For more info, see e.g. the book Reinforcement Learning: An Introduction (2nd edition) by Sutton and Barto, the related Wikipedia article, or http://artint.info/html/ArtInt_265.html
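
For reference, a minimal sketch of the tabular algorithm (assuming a small environment with discrete states and actions exposed through the classic, pre-0.26 Gym API; the hyperparameters and episode count are illustrative, not prescriptive):

```python
import numpy as np

def tabular_q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy behaviour policy.

    Assumes the classic Gym interface: env.reset() -> state and
    env.step(a) -> (next_state, reward, done, info), with discrete
    observation and action spaces.
    """
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy exploration.
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done, _ = env.step(action)
            # Off-policy TD target: bootstrap with the greedy value of the next state.
            target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q
```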

401 questions
48 votes · 2 answers

What is the relation between Q-learning and policy gradients methods?

As far as I understand, Q-learning and policy gradients (PG) are the two major approaches used to solve RL problems. While Q-learning aims to predict the reward of a certain action taken in a certain state, policy gradients directly predict the…
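
Stated very compactly, the two approaches optimise different objects: Q-learning regresses an action-value estimate toward a bootstrapped target, while policy-gradient methods adjust the parameters of the policy itself. The standard forms of the two updates, for reference:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \big( r + \gamma \max_{a'} Q(s',a') - Q(s,a) \big)$$

$$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s,a) \right]$$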
23 votes · 2 answers

Can Q-learning be used for continuous (state or action) spaces?

Many examples work with a table-based method for Q-learning. This may be suitable for a discrete state (observation) or action space, like a robot in a grid world, but is there a way to use Q-learning for continuous spaces like the control of a…
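
A common answer to the continuous-state part of this question is to replace the Q-table with a function approximator. A minimal sketch, assuming a PyTorch MLP over a continuous state vector with a small discrete action set (the dimensions and layer sizes are made up); continuous action spaces are a different story, since the max over actions is no longer a table lookup:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Q(s, ·): maps a continuous state vector to one Q-value per discrete action."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Greedy action for a (hypothetical) 4-dimensional continuous state and 2 actions.
q_net = QNetwork(state_dim=4, n_actions=2)
state = torch.randn(1, 4)
action = q_net(state).argmax(dim=1).item()
```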
22 votes · 3 answers

Why doesn't Q-learning converge when using function approximation?

The tabular Q-learning algorithm is guaranteed to find the optimal $Q$ function, $Q^*$, provided the following conditions (the Robbins-Monro conditions) regarding the learning rate are satisfied $\sum_{t} \alpha_t(s, a) = \infty$ $\sum_{t}…
nbro · 42,615
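
For context, the learning-rate conditions the excerpt refers to are usually written as follows, alongside the requirement that every state-action pair is visited infinitely often:

$$\sum_{t} \alpha_t(s, a) = \infty \qquad \text{and} \qquad \sum_{t} \alpha_t^2(s, a) < \infty \quad \text{for all } (s, a).$$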
19 votes · 2 answers

Why does DQN require two different networks?

I was going through this implementation of DQN and I see that on lines 124 and 125 two different Q networks have been initialized. From my understanding, I think one network predicts the appropriate action and the second network predicts the target Q…
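
A minimal sketch of the online/target split the question is about (the architecture, sizes, and sync interval below are illustrative, not the ones in the linked implementation):

```python
import copy
import torch
import torch.nn as nn

# Online network: updated on every gradient step.
online_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
# Target network: a frozen copy, only refreshed periodically.
target_net = copy.deepcopy(online_net)
for p in target_net.parameters():
    p.requires_grad_(False)

def dqn_target(reward, next_state, done, gamma=0.99):
    """TD target computed with the target network, so the regression target
    does not shift with every update of the online network."""
    with torch.no_grad():
        next_q = target_net(next_state).max(dim=1).values
    return reward + gamma * next_q * (1.0 - done)

# Every C gradient steps (C is a hyperparameter), sync the weights:
# target_net.load_state_dict(online_net.state_dict())
```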
17 votes · 3 answers

What is the difference between Q-learning, Deep Q-learning and Deep Q-network?

Q-learning uses a table to store the values of all state-action pairs. Q-learning is a model-free RL algorithm, so how can there be one called Deep Q-learning, since "deep" means using a DNN; or maybe the state-action table (Q-table) is still there but the DNN is…
Dan D · 1,318
12 votes · 1 answer

What exactly is the advantage of double DQN over DQN?

I started looking into the double DQN (DDQN). Apparently, the difference between DDQN and DQN is that in DDQN we use the main value network for action selection and the target network for outputting the Q values. However, I don't understand why…
Chukwudi · 369
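
For reference, the usual way the two targets are written: DQN both selects and evaluates the next action with the target network, while double DQN selects with the online network and evaluates with the target network:

$$y^{\text{DQN}} = r + \gamma \max_{a'} Q_{\text{target}}(s', a')$$

$$y^{\text{DDQN}} = r + \gamma\, Q_{\text{target}}\big(s', \arg\max_{a'} Q_{\text{online}}(s', a')\big)$$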
11 votes · 2 answers

How do we prove the n-step return error reduction property?

In section 7.1 (about n-step bootstrapping) of the book Reinforcement Learning: An Introduction (2nd edition), by Richard S. Sutton and Andrew G. Barto, the authors write about what they call the "n-step return error reduction property": But they…
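
For context, the property in question states that the worst-case error of the expected n-step return is at most $\gamma^n$ times the worst-case error of the value estimate it bootstraps from:

$$\max_s \Big| \mathbb{E}_\pi\big[ G_{t:t+n} \mid S_t = s \big] - v_\pi(s) \Big| \;\le\; \gamma^n \max_s \Big| V_{t+n-1}(s) - v_\pi(s) \Big|$$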
11 votes · 1 answer

How does Q-learning work in stochastic environments?

The Q function uses the (current and future) states to determine the action that gets the highest reward. However, in a stochastic environment, the current action (at the current state) does not determine the next state. How does Q-learning handle…
redlum · 111
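
One way to see what the question is getting at: the sampled target $r + \gamma \max_{a'} Q(s', a')$ used in each update is a sample of the expectation below, which the (decaying) learning rate averages over many visits to $(s, a)$, so the randomness of the transitions is averaged out:

$$\mathbb{E}\big[ R_{t+1} + \gamma \max_{a'} Q(S_{t+1}, a') \mid S_t = s, A_t = a \big] = \sum_{s', r} p(s', r \mid s, a)\,\big[ r + \gamma \max_{a'} Q(s', a') \big]$$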
11 votes · 2 answers

Are Q-learning and SARSA the same when action selection is greedy?

I'm currently studying reinforcement learning and I'm having difficulties with question 6.12 in Sutton and Barto's book. Suppose action selection is greedy. Is Q-learning then exactly the same algorithm as SARSA? Will they make exactly the same…
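
For reference, the two updates being compared, which differ only in how the bootstrap action in the next state is chosen:

$$\text{SARSA:}\quad Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \big[ R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \big]$$

$$\text{Q-learning:}\quad Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \big[ R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \big]$$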
10 votes · 1 answer

Can Q-learning be used in a POMDP?

Can Q-learning (and SARSA) be directly used in a Partially Observable Markov Decision Process (POMDP)? If not, why not? My intuition is that the policies learned will be terrible because of partial observability. Are there ways to transform these…
9 votes · 1 answer

What are other ways of handling invalid actions in scenarios where all rewards are either 0 (best reward) or negative?

I created an OpenAI Gym environment, and I would like to check the performance of the agent from OpenAI Baselines DQN approach on it. In my environment, the best possible outcome for the agent is 0 - the robot needs zero non-necessary resources to…
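
One workaround that often comes up for this kind of setup (not necessarily the accepted answer to this question) is to mask invalid actions at selection time rather than punishing them through the reward. A sketch, where the validity mask is a hypothetical helper whose construction depends on the environment:

```python
import numpy as np

def masked_greedy_action(q_values: np.ndarray, valid_mask: np.ndarray) -> int:
    """Pick the greedy action among valid actions only.

    q_values:   Q(s, ·) for the current state, shape (n_actions,)
    valid_mask: boolean array, True where the action is allowed
                (how the mask is produced depends on the environment).
    """
    masked = np.where(valid_mask, q_values, -np.inf)
    return int(np.argmax(masked))

# Example: 4 actions, action 2 is invalid in the current state -> picks action 3.
print(masked_greedy_action(np.array([0.1, -0.5, 2.0, 0.3]),
                           np.array([True, True, False, True])))
```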
9 votes · 3 answers

What is the target Q-value in DQNs?

I understand that in DQNs, the loss is measured by taking the MSE of the outputted Q-values and the target Q-values. What do the target Q-values represent? And how are they obtained/calculated by the DQN?
9 votes · 1 answer

Does AlphaZero use Q-Learning?

I was reading the AlphaZero paper Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, and it seems they don't mention Q-Learning anywhere. So does AZ use Q-Learning on the results of self-play or just a Supervised…
8 votes · 3 answers

How should I model all available actions of a chess game in deep Q-learning?

I just read about deep Q-learning, which is using a neural network for the value function instead of a table. I saw the example here: Using Keras and Deep Q-Network to Play FlappyBird and he used a CNN to get the Q-value. My confusion is on the last…
malioboro · 2,859
8 votes · 2 answers

Deep Q-Learning "catastrophic drop" reasons?

I am implementing some "classical" papers in model-free RL like DQN, Double DQN, and Double DQN with Prioritized Replay. Through the various models I'm running on CartPole-v1 using the same underlying NN, I am noticing all of the above 3 exhibit a…
Virus · 81