Questions tagged [deep-rl]

For questions related to deep reinforcement learning (DRL), that is, RL combined with deep learning. More precisely, deep neural networks are used to represent, for example, value functions or policies.

518 questions
26 votes · 2 answers

Are there other approaches to deal with variable action spaces?

This question is about reinforcement learning and action spaces that vary across some or all states. Variable action space: let's say you have an MDP where the number of available actions varies between states (as in Figure 1 or Figure 2, for example). We can…
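
A common approach here is invalid-action masking: keep a fixed maximal action set and set the Q-values of actions that are illegal in the current state to $-\infty$ before taking the argmax. A minimal sketch, assuming a fixed upper bound on the number of actions (all names and shapes are illustrative, not from the question):

```python
import numpy as np

def masked_greedy_action(q_values, valid_mask):
    """Greedy action selection that ignores actions invalid in this state.

    q_values:   array of shape (n_actions,) from the Q-function/network.
    valid_mask: boolean array of shape (n_actions,), True for legal actions.
    """
    masked_q = np.where(valid_mask, q_values, -np.inf)  # illegal actions can never win the argmax
    return int(np.argmax(masked_q))

# Example: 4 actions exist in the MDP overall, but only 0 and 2 are legal here.
q = np.array([0.3, 1.2, 0.9, -0.4])
mask = np.array([True, False, True, False])
print(masked_greedy_action(q, mask))  # -> 2
```
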
22 votes · 3 answers

Why doesn't Q-learning converge when using function approximation?

The tabular Q-learning algorithm is guaranteed to find the optimal $Q$ function, $Q^*$, provided the following conditions (the Robbins-Monro conditions) regarding the learning rate are satisfied: $\sum_{t} \alpha_t(s, a) = \infty$ $\sum_{t}…
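
For reference, the excerpt above is cut off mid-formula; the standard Robbins-Monro step-size conditions it refers to are
$$\sum_{t} \alpha_t(s, a) = \infty \qquad \text{and} \qquad \sum_{t} \alpha_t^2(s, a) < \infty,$$
i.e. the learning rates must not shrink so fast that learning stalls, yet must shrink fast enough for the noise to average out.
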
20 votes · 1 answer

How does LSTM in deep reinforcement learning differ from experience replay?

In the paper Deep Recurrent Q-Learning for Partially Observable MDPs, the authors processed the Atari game frames with an LSTM layer at the end. My questions are: How does this method differ from experience replay, as they both use past…
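
For intuition, the architectural difference is that a recurrent layer carries a hidden state forward through consecutive frames, while experience replay samples unordered past transitions for training. A minimal PyTorch sketch of the recurrent idea (an illustrative architecture, not the paper's exact one):

```python
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    """Q-network whose LSTM integrates information across a frame sequence
    (the DRQN idea), rather than conditioning on a single stacked observation."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, time, obs_dim); the LSTM's hidden state carries
        # information from earlier frames to later ones.
        x = torch.relu(self.encoder(obs_seq))
        x, hidden_state = self.lstm(x, hidden_state)
        return self.head(x), hidden_state  # Q-values at every timestep
```
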
19 votes · 2 answers

Why does DQN require two different networks?

I was going through this implementation of DQN and I see that on lines 124 and 125 two different Q-networks have been initialized. From my understanding, I think one network predicts the appropriate action and the second network predicts the target Q…
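
The second network is the target network: a periodically synchronized copy of the online network whose only job is to provide stable bootstrap targets, so that the regression target does not move with every gradient step. A minimal sketch of the pattern (the stand-in architecture and sync period are illustrative):

```python
import copy
import torch.nn as nn

def make_q_net(obs_dim: int, n_actions: int) -> nn.Module:
    # Stand-in Q-network; any architecture works for the point being made.
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

online_net = make_q_net(4, 2)            # updated by gradient descent every step
target_net = copy.deepcopy(online_net)   # frozen copy, used only to compute targets

def maybe_sync_target(step: int, period: int = 1000) -> None:
    # Copy the online weights into the target network every `period` steps,
    # so the bootstrap target changes in slow, discrete jumps.
    if step % period == 0:
        target_net.load_state_dict(online_net.state_dict())
```
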
17 votes · 3 answers

What is the difference between Q-learning, Deep Q-learning and Deep Q-network?

Q-learning uses a table to store the values of all state-action pairs. Q-learning is a model-free RL algorithm, so how can there be a variant called Deep Q-learning, given that "deep" means using a DNN? Or maybe the state-action table (Q-table) is still there, but the DNN is…
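
As a point of reference, tabular Q-learning really is just a lookup table updated in place; "Deep Q-learning" replaces that table with a parameterized function $Q(s, a; \theta)$ trained by gradient descent on bootstrapped targets. A minimal tabular sketch (names illustrative):

```python
from collections import defaultdict

Q = defaultdict(float)        # the Q-table: maps (state, action) -> value
alpha, gamma = 0.1, 0.99      # learning rate and discount factor

def q_update(s, a, r, s_next, actions):
    # Standard tabular Q-learning update toward the bootstrapped target.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```
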
15 votes · 2 answers

How large should the replay buffer be?

I'm learning the DDPG algorithm by following the OpenAI Spinning Up document on DDPG, where it is written: In order for the algorithm to have stable behavior, the replay buffer should be large enough to contain a wide range of…
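
For scale, common open-source DQN/DDPG implementations default to a capacity on the order of $10^6$ transitions, evicting the oldest ones first. A minimal sketch of such a FIFO buffer (the capacity value is illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """FIFO replay buffer: once `capacity` is reached, the oldest transitions
    are evicted, so capacity trades memory against experience diversity."""
    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)     # e.g. (s, a, r, s_next, done)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```
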
12 votes · 1 answer

What exactly is the advantage of double DQN over DQN?

I started looking into the double DQN (DDQN). Apparently, the difference between DDQN and DQN is that in DDQN we use the main value network for action selection and the target network for outputting the Q values. However, I don't understand why…
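
The mechanical difference is small enough to state in code: the online network selects the next action and the target network evaluates it, which decouples selection from evaluation and counteracts the overestimation caused by taking a max over noisy estimates. A sketch, assuming batched tensors and a float `done` flag (names illustrative):

```python
import torch

def ddqn_target(reward, next_obs, done, online_net, target_net, gamma=0.99):
    """Double-DQN target: the online net *selects* the action, the target net *evaluates* it."""
    with torch.no_grad():
        next_action = online_net(next_obs).argmax(dim=1, keepdim=True)    # selection
        next_q = target_net(next_obs).gather(1, next_action).squeeze(1)   # evaluation
        return reward + gamma * (1.0 - done) * next_q
```
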
10 votes · 3 answers

How can you represent the state and action spaces for a card game in the case of a variable number of cards and actions?

I know how a machine can learn to play Atari games (Breakout): Playing Atari with Deep Reinforcement Learning. With the same technique, it is even possible to play FPS games (Doom): Playing FPS Games with Deep Reinforcement Learning. Further studies even…
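
For the variable-size part, one standard trick is the same as with variable action spaces above: pad the representation to a fixed maximum and mask what isn't there. An illustrative encoding sketch (the bounds and feature layout are assumptions, not from the question):

```python
import numpy as np

MAX_HAND = 10       # assumed upper bound on cards held at once
CARD_FEATURES = 52  # one-hot over a standard deck

def encode_hand(card_ids):
    """Encode a variable-size hand as a fixed (MAX_HAND, CARD_FEATURES) matrix:
    real cards become one-hot rows, unused slots stay all-zero (padding)."""
    obs = np.zeros((MAX_HAND, CARD_FEATURES), dtype=np.float32)
    for slot, card in enumerate(card_ids[:MAX_HAND]):
        obs[slot, card] = 1.0
    return obs

print(encode_hand([3, 17, 41]).shape)  # (10, 52); rows 3..9 are zero padding
```
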
10 votes · 2 answers

Was DeepMind's DQN learning all the Atari games simultaneously?

DeepMind states that its deep Q-network (DQN) was able to continually adapt its behavior while learning to play 49 Atari games. After learning all games with the same neural net, was the agent able to play them all at 'superhuman' levels…
9 votes · 2 answers

What are the biggest barriers to getting RL into production?

I am studying the state of the art of reinforcement learning, and my point is that we see so many real-world applications of supervised and unsupervised learning algorithms in production, but I don't see the same thing with Reinforcement…
9 votes · 3 answers

What is the target Q-value in DQNs?

I understand that in DQNs, the loss is measured by taking the MSE of the outputted Q-values and the target Q-values. What do the target Q-values represent? And how are they obtained/calculated by the DQN?
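
For reference, in the original DQN the target for a transition $(s, a, r, s')$ is
$$y = r + \gamma \max_{a'} Q(s', a'; \theta^-)$$
(just $y = r$ at terminal states), where $\theta^-$ are the weights of the periodically frozen target network; the loss is the squared error between $Q(s, a; \theta)$ and $y$.
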
8 votes · 1 answer

Is Experience Replay like dreaming?

Drawing parallels between machine learning techniques and the human brain is a dangerous operation. When it is done successfully, it can be a powerful tool for popularization, but when it is done without precaution, it can lead to major…
8 votes · 0 answers

Is there a difference in the architecture of deep reinforcement learning when multiple actions are performed instead of a single action?

I've built a deep deterministic policy gradient reinforcement learning agent that can handle any game/task that has only one action. However, the agent seems to fail horribly when there are two or more actions. I tried to look online for…
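
For what it's worth, in standard DDPG the step from one to several continuous actions only changes the actor's output width, with one tanh-squashed output per action dimension. A sketch of that convention (architecture illustrative):

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """DDPG-style actor: for multiple continuous actions, the only structural
    change is the width of the output layer (action_dim > 1)."""
    def __init__(self, obs_dim, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # one output per action dimension
        )
        self.max_action = max_action

    def forward(self, obs):
        return self.max_action * self.net(obs)      # shape: (batch, action_dim)
```
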
8 votes · 3 answers

How should I model all available actions of a chess game in deep Q-learning?

I just read about deep Q-learning, which uses a neural network for the value function instead of a table. I saw the example here: Using Keras and Deep Q-Network to Play FlappyBird, where the author used a CNN to get the Q-values. My confusion is on the last…
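
One common way to get a fixed-size output layer for chess is to enumerate moves as (from-square, to-square) pairs, giving $64 \times 64 = 4096$ output units, and then mask the moves that are illegal in the current position (promotion piece choices need extra indices, omitted here for brevity). A minimal sketch of the indexing:

```python
def move_to_index(from_sq: int, to_sq: int) -> int:
    # Squares numbered 0..63 (a1 = 0, h8 = 63); every (from, to) pair
    # gets a unique output unit in the network's final layer.
    return from_sq * 64 + to_sq

def index_to_move(idx: int) -> tuple:
    return divmod(idx, 64)

assert index_to_move(move_to_index(12, 28)) == (12, 28)  # e2 -> e4
```
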
8 votes · 2 answers

What is experience replay in layman's terms?

I've been reading Google DeepMind's Atari paper and I'm trying to understand the concept of "experience replay". Experience replay comes up in a lot of other reinforcement learning papers (particularly the AlphaGo paper), so I want to understand…