Questions tagged [double-q-learning]

For questions related to the (tabular) version of the double Q-learning algorithm, which was introduced in "Double Q-learning" (NeurIPS 2010) by Hado van Hasselt.
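For reference, a minimal sketch of the tabular update this tag refers to (variable names and hyperparameters are illustrative; the structure follows the paper, which selects the greedy action with one table and evaluates it with the other):

```python
import numpy as np

def double_q_update(Q1, Q2, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=np.random):
    """One tabular double Q-learning update.

    Q1 and Q2 are (n_states, n_actions) arrays. With probability 1/2 we
    update Q1, letting Q1 select the greedy next action but Q2 evaluate it
    (and vice versa); decoupling selection from evaluation removes the
    upward bias of the single-estimator max.
    """
    if rng.random() < 0.5:
        a_star = np.argmax(Q1[s_next])               # Q1 selects
        target = r + gamma * Q2[s_next, a_star]      # Q2 evaluates
        Q1[s, a] += alpha * (target - Q1[s, a])
    else:
        b_star = np.argmax(Q2[s_next])               # Q2 selects
        target = r + gamma * Q1[s_next, b_star]      # Q1 evaluates
        Q2[s, a] += alpha * (target - Q2[s, a])
```

Behaviour actions are typically chosen $\epsilon$-greedily with respect to $Q_1 + Q_2$.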

9 questions
8
votes
2 answers

Deep Q-Learning "catastrophic drop" reasons?

I am implementing some "classical" papers in model-free RL, like DQN, Double DQN, and Double DQN with Prioritized Replay. Across the various models I'm running on CartPole-v1 with the same underlying NN, I am noticing that all three of the above exhibit a…
Virus
  • 81
  • 1
  • 5
5
votes
2 answers

Does value iteration still return the true Q-values in a stochastic environment?

I'm working with the FrozenLake environment (8x8) from Gymnasium. In the deterministic case (is_slippery=False), I understand that value iteration converges to the true Q-values, since the environment is fully observable and transitions are…
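Value iteration does still converge in the stochastic case, provided the update takes an expectation over the transition model rather than a single sampled outcome. A minimal Q-value-iteration sketch against Gymnasium's FrozenLake, assuming access to the transition table `env.unwrapped.P` that the toy-text environments expose:

```python
import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake8x8-v1", is_slippery=True)
P = env.unwrapped.P            # P[s][a] -> list of (prob, s', reward, terminated)
nS, nA = env.observation_space.n, env.action_space.n

Q = np.zeros((nS, nA))
gamma, tol = 0.99, 1e-10
while True:
    Q_new = np.zeros_like(Q)
    for s in range(nS):
        for a in range(nA):
            # Expected Bellman optimality backup over all stochastic outcomes
            Q_new[s, a] = sum(p * (r + gamma * (0.0 if done else Q[s2].max()))
                              for p, s2, r, done in P[s][a])
    if np.abs(Q_new - Q).max() < tol:
        break
    Q = Q_new
```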
5
votes
1 answer

Why does regular Q-learning (and DQN) overestimate the Q values?

The motivation for the introduction of double DQN (and double Q-learning) is that regular Q-learning (or DQN) can overestimate the Q-values, but is there a brief explanation of why this overestimation occurs?
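The short answer is the max operator in the target: even when every individual estimate is unbiased, the maximum over noisy estimates is biased upward, since $\mathbb{E}[\max_a \hat{Q}(s,a)] \geq \max_a \mathbb{E}[\hat{Q}(s,a)]$. A tiny simulation (all numbers illustrative) makes this visible:

```python
import numpy as np

rng = np.random.default_rng(0)
true_q = np.zeros(10)                                  # every action is truly worth 0
noisy = true_q + rng.normal(0, 1, size=(100_000, 10))  # unbiased noisy estimates
print(noisy.max(axis=1).mean())                        # ~1.54, not 0: upward bias
```

Double Q-learning avoids this by letting one estimator pick the argmax and an independent estimator evaluate it.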
2
votes
2 answers

How to embed a game grid state with walls as input to a neural network

I've read most of the posts on here regarding this subject; however, most of them deal with game boards where there are only two categories of single pieces and no walls. My game board has walls, and multiple instances of food.…
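A common approach is to encode each category (walls, food, agent, and so on) as its own binary channel, giving a $C \times H \times W$ tensor that a convolutional network can consume directly (or that can be flattened for a dense network). A sketch, assuming the board is stored as a 2-D array of integer cell codes; the codes themselves are illustrative:

```python
import numpy as np

WALL, FOOD, AGENT = 1, 2, 3    # illustrative cell codes; 0 = empty

def encode_grid(grid, codes=(WALL, FOOD, AGENT)):
    """Turn an (H, W) integer grid into a (C, H, W) stack of binary masks."""
    return np.stack([(grid == c).astype(np.float32) for c in codes])

grid = np.array([[1, 1, 1, 1],
                 [1, 3, 0, 1],
                 [1, 2, 2, 1],
                 [1, 1, 1, 1]])
obs = encode_grid(grid)        # shape (3, 4, 4); handles many food cells naturally
```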
1
vote
1 answer

How is estimation bias quantified in reinforcement learning?

In various estimation problems, especially in RL domains where we are currently looking into Q-learning and its variants, we often encounter the term estimation bias, which refers to the systematic deviation of an estimator’s expected value from the…
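In this setting the bias is usually made precise as the expected deviation of the estimate from the true optimal value,

$$\operatorname{Bias}\big[\hat{Q}(s,a)\big] = \mathbb{E}\big[\hat{Q}(s,a)\big] - Q^{*}(s,a),$$

measured empirically by averaging $\hat{Q}(s,a) - Q^{*}(s,a)$ over visited states and independent runs; overestimation studies often report the related quantity $\mathbb{E}[\max_a \hat{Q}(s,a)] - \max_a Q^{*}(s,a)$.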
1
vote
1 answer

Q-learning achieves small reward in simple dice game

I am trying to train a Q-learning agent on the following game: the states are parametrised by an integer $S \geq 0$ (representing the sum of the previous die rolls). In each step the player can choose to roll a die or quit the game. Whenever the…
deepfloe
  • 111
  • 2
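The excerpt cuts off before the full rules, but for one common variant of this game (quitting banks the current sum as reward, while rolling a designated bust face ends the episode with nothing), a tabular Q-learning loop looks like the sketch below. The bust rule, the state cap `MAX_S`, and the reward scheme are assumptions for illustration, not necessarily the asker's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)
MAX_S, ROLL, QUIT = 60, 0, 1          # assumed state cap and action codes
Q = np.zeros((MAX_S + 1, 2))
alpha, gamma, eps = 0.1, 1.0, 0.1

for episode in range(50_000):
    s = 0
    while True:
        a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
        if a == QUIT:
            r, s2, done = s, s, True         # assumed: quitting banks the sum
        else:
            d = rng.integers(1, 7)           # fair six-sided die
            if d == 1:                       # assumed bust face: lose everything
                r, s2, done = 0, 0, True
            else:
                r, s2, done = 0, min(s + d, MAX_S), False
        Q[s, a] += alpha * ((r if done else r + gamma * Q[s2].max()) - Q[s, a])
        s = s2
        if done:
            break
```

If the learned policy quits too early or too late, the usual suspects are insufficient exploration of the larger sums and a learning rate that never decays.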
0
votes
0 answers

What are the update equations for Double Expected Sarsa with an $\epsilon$-greedy target policy?

This is Exercise 6.13 in Sutton & Barto, page 136. What are the update equations for Double Expected Sarsa with an $\epsilon$-greedy target policy? The answer is given as follows: Let $Q_1$ and $Q_2$ be the two action-value functions and let…
DSPinfinity
  • 1,223
  • 4
  • 10
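For reference, one standard way to write the pair of updates (whether the $\epsilon$-greedy target policy is computed from the table being updated or from the sum of the two is a design choice; here $\pi_1$ is $\epsilon$-greedy with respect to $Q_1$): with probability $0.5$,

$$Q_1(S_t, A_t) \leftarrow Q_1(S_t, A_t) + \alpha \Big[ R_{t+1} + \gamma \sum_a \pi_1(a \mid S_{t+1})\, Q_2(S_{t+1}, a) - Q_1(S_t, A_t) \Big],$$

and otherwise the symmetric update with the roles of $Q_1$ and $Q_2$ swapped.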
0
votes
1 answer

Does "number of actions" refer to the number of actions taken or size of the action space?

In the original DDQN article (https://arxiv.org/pdf/1509.06461.pdf), the phrase "number of actions" is used twice: first in the following context, and secondly in Theorem 1. I have a hard time understanding the way the phrase is being used or if it…
0
votes
0 answers

Is there any toy example that can demonstrate the performance of double Q-learning?

I recently tried to reproduce the results of double Q-learning. However, the results are not satisfying. I have also tried to compare double Q-learning with Q-learning in Taxi-v3, FrozenLake with is_slippery=False, Roulette-v0, etc. But Q-learning…
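The standard toy example is the maximization-bias MDP of Sutton & Barto (Example 6.7): from start state A, action right terminates with reward 0, while action left moves to B, from which every action terminates with reward drawn from $\mathcal{N}(-0.1, 1)$; left is therefore worse in expectation, yet Q-learning keeps preferring it far longer than double Q-learning does. A minimal environment sketch (state and action encodings are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, TERM = 0, 1, 2
RIGHT, LEFT = 0, 1
N_B_ACTIONS = 10                       # B offers many noisy actions

def step(s, a):
    """Return (next_state, reward, done) for the Example 6.7 MDP."""
    if s == A:
        if a == RIGHT:                 # right: terminate immediately, reward 0
            return TERM, 0.0, True
        return B, 0.0, False           # left: move to B, no reward yet
    # every action from B terminates with reward ~ N(-0.1, 1), so the
    # noisy max over B's actions looks positive to plain Q-learning
    return TERM, rng.normal(-0.1, 1.0), True
```

Plotting the fraction of episodes in which the agent takes left from A, averaged over many independent runs, reproduces the gap between the two algorithms.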