Questions tagged [double-dqn]

For questions about the double DQN (DDQN) algorithm introduced in the paper "Deep Reinforcement Learning with Double Q-learning" (2015) by Hado van Hasselt et al.

23 questions
12
votes
1 answer

What exactly is the advantage of double DQN over DQN?

I started looking into the double DQN (DDQN). Apparently, the difference between DDQN and DQN is that in DDQN we use the main value network for action selection and the target network for outputting the Q values. However, I don't understand why…
Chukwudi
  • 369
  • 2
  • 8
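
For readers comparing the two update rules, here is a minimal sketch of the difference, assuming hypothetical PyTorch-style modules `online_net` and `target_net` that map a batch of states to per-action Q-values (names and shapes are illustrative, not taken from the question):

```python
import torch

def dqn_target(reward, next_state, done, target_net, gamma=0.99):
    # Standard DQN: the target network both selects and evaluates
    # the next action via a single max over its own Q-values.
    # reward and done are assumed to be float tensors of shape (batch,).
    with torch.no_grad():
        next_q = target_net(next_state).max(dim=1).values
    return reward + gamma * (1.0 - done) * next_q

def double_dqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    # Double DQN: the online (main) network selects the action,
    # the target network evaluates it.
    with torch.no_grad():
        best_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, best_action).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q
```

Decoupling action selection from action evaluation is what reduces the upward bias introduced by the single max in the standard DQN target.
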
8
votes
2 answers

Can DQN perform better than Double DQN?

I'm training both DQN and double DQN in the same environment, but DQN performs significantly better than double DQN. As I've seen in the double DQN paper, double DQN should perform better than DQN. Am I doing something wrong, or is this possible?
Angelo
  • 211
  • 2
  • 17
5
votes
1 answer

Why does regular Q-learning (and DQN) overestimate the Q values?

The motivation for the introduction of double DQN (and double Q-learning) is that regular Q-learning (or DQN) can overestimate the Q-values, but is there a brief explanation of why they are overestimated?
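
As a rough pointer to why the bias appears: if the estimates $\hat{Q}(s,a) = Q(s,a) + \epsilon_a$ carry zero-mean noise $\epsilon_a$, then by the convexity of the max,

$$\mathbb{E}\Big[\max_a \hat{Q}(s,a)\Big] \;\ge\; \max_a \mathbb{E}\Big[\hat{Q}(s,a)\Big] = \max_a Q(s,a),$$

so the max over noisy estimates used in the target is biased upwards; and because the same max also picks the action, the error tends to compound through bootstrapping.
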
4
votes
1 answer

How can I ensure convergence of DDQN, if the true Q-values for different actions in the same state are very close?

I am applying a Double DQN algorithm to a highly stochastic environment where some of the actions in the agent's action space have very similar "true" Q-values (i.e. the expected future reward from either of these actions in the current state is…
4
votes
1 answer

Finding the true Q-values in gymnasium

I'm very interested in the true Q-values of state-action pairs in the classic control environments in gymnasium. Contrary to the usual goal, the ordering of the Q-values itself is irrelevant; a very close to accurate estimation of the Q-values is…
Mark B
  • 43
  • 3
3
votes
1 answer

Does the DoubleDQN algorithm use a target network or two separate policies?

I've been looking for ways to improve my DQN. That is when I found the Double DQN algorithm. After looking at explanatory videos and posts, I've seen conflicting information: The Double DQN algorithm has two separate policies Q1 and Q2 with…
3
votes
1 answer

Why do we minimise the loss between the target Q values and 'local' Q values?

I have a question regarding the loss function of target networks and current (online) networks. I understand the action value function. What I am unsure about is why we seek to minimise the loss between the qVal for the next state in our target…
user9317212
  • 181
  • 2
  • 13
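
In the usual formulation, only the online ("local") network is trained; it is regressed toward a bootstrapped target that is held fixed during the update:

$$\mathcal{L}(\theta) = \mathbb{E}_{(s,a,r,s')}\Big[\big(y - Q_\theta(s,a)\big)^2\Big],$$

where $y$ is built from the target network's Q-values for the next state (in the DQN or double-DQN form), no gradient flows through $y$, and the target network is only refreshed periodically from the online weights.
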
3
votes
1 answer

How to compute the target for double Q-learning update step?

I've already read the original double DQN paper, but I couldn't find a clear and practical explanation of how the target $y$ is computed, so here's how I interpreted the method (let's say I have 3 possible actions (1,2,3)): For each experience…
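
For reference, the target in the van Hasselt et al. paper combines action selection by the online network $Q_\theta$ with evaluation by the target network $Q_{\theta^-}$; with three possible actions it reads

$$y = r + \gamma \, Q_{\theta^-}\!\Big(s',\ \underset{a \in \{1,2,3\}}{\arg\max}\; Q_\theta(s', a)\Big),$$

i.e. the online network picks which of the three actions looks best in $s'$, and the target network supplies the value of that action (with $y = r$ at terminal states).
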
2
votes
0 answers

Update Rule with Deep Q-Learning (DQN) for 2-player games

I am wondering how to correctly implement the DQN algorithm for two-player games such as Tic Tac Toe and Connect 4. While my algorithm is mastering Tic Tac Toe relatively quickly, I cannot get great results for Connect 4. The agent is learning to…
2
votes
0 answers

Can DQN outperform DoubleDQN?

I found a similar post about this issue, but unfortunately I did not find a proper answer. Are there any references where DQN is better than DoubleDQN, i.e. where DoubleDQN does not improve on DQN?
2
votes
1 answer

How does the target network in double DQNs find the maximum Q value for each action?

I understand that the neural network takes the states as inputs and outputs the Q-values for state-action pairs. However, in order to compute this and update its weights, we need to calculate the maximum Q-value for the next…
1
vote
0 answers

Resulting quantiles from Quantile Regression DQN

In my QR-DQN application, the resulting quantiles for a state s and action a take the form of the blue line in the figure. The method works well in expected values and trains effectively. However, I know that in my problem the return distribution…
1
vote
0 answers

Why does a slow-changing policy invalidate the Double DQN approach in the TD3 paper?

In the paper describing TD3 (https://arxiv.org/abs/1802.09477), the authors say that they could not effectively address the Q-learning overestimation bias by using different networks for maximizing and estimating the next state Q value when…
1
vote
0 answers

DDQN Snake keeps crashing into the wall

Edit: I managed to fix this by changing the optimizer to SGD. I am very new to reinforcement learning, and I attempted to create a DDQN for the game Snake, but for some reason it keeps learning to crash into the wall. I've tried changing the…