
Q-learning uses a table to store the values of all state-action pairs. Q-learning is a model-free RL algorithm, so how can there be something called Deep Q-learning, given that "deep" means using a DNN? Or is the state-action table (Q-table) still there, with the DNN only used for input processing (e.g. turning images into vectors)?

Deep Q-network seems to be only the DNN part of the Deep Q-learning program, and Q-network seems to be short for Deep Q-network.

Q-learning, Deep Q-learning, and Deep Q-network: what are the differences? Could there be a comparison table between these 3 terms?

nbro
Dan D

3 Answers


Here is a table that attempts to systematically show the differences between tabular Q-learning (TQL), deep Q-learning (DQL), and deep Q-network (DQN).

| | Tabular Q-learning (TQL) | Deep Q-learning (DQL) | Deep Q-network (DQN) |
|---|---|---|---|
| Is it an RL algorithm? | Yes | Yes | No (unless you use DQN to refer to DQL, which is often done!) |
| Does it use neural networks? | No, it uses a table | Yes | No; the DQN *is* the neural network |
| Is it a model? | No | No | Yes (but usually not in the RL sense) |
| Can it deal with continuous state spaces? | No (unless you discretize them) | Yes | Yes (in the sense that it can take real-valued inputs for the states) |
| Can it deal with continuous action spaces? | Yes (but maybe not a good idea) | Yes (but maybe not a good idea) | Yes (but only in the sense that it can produce real-valued outputs for actions) |
| Does it converge? | Yes | Not necessarily | Not necessarily |
| Is it an online learning algorithm? | Yes | No, if you use experience replay | No, but it can be used in an online learning setting |
nbro

In Q-learning (and in value-based reinforcement learning more generally) we are typically interested in learning a Q-function, $Q(s, a)$. This is defined as $$Q(s, a) = \mathbb{E}_\pi\left[ G_t \mid S_t = s, A_t = a \right],$$ where $G_t$ is the return from time $t$.
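
As a quick reminder of where these estimates come from, the standard tabular Q-learning update is the temporal-difference rule $$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right],$$ where $\alpha$ is the learning rate and $\gamma$ the discount factor.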

For tabular Q-learning, where you have a finite state and action space, you can maintain a lookup table that holds your current estimate of the Q-value for each state-action pair. Note that in practice the spaces being finite might not be enough to avoid function approximation: if, e.g., your state space contains a very large number of states, say $10^{10000}$, then it might not be manageable to maintain a separate Q-value for each state-action pair.
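
As a concrete illustration, here is a minimal sketch of tabular Q-learning in Python. The environment interface (`env.reset()`, `env.step(action)`) is a hypothetical stand-in rather than any particular library, and the hyperparameters are arbitrary:

```python
import random
from collections import defaultdict

def tabular_q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    # The "table" is just a mapping from (state, action) to a Q-value estimate.
    Q = defaultdict(float)

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection over the finite action set.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # Q-learning (off-policy TD) update towards the bootstrapped target.
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```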

When you have an infinite state space (and/or action space), it becomes impossible to use a table, and so you need function approximation to generalise across states. This is typically done with a deep neural network, due to its expressive power -- i.e., a deep Q-network (DQN). As a technical aside, Q-networks don't usually take the state and action as input; they take a representation of the state (e.g. a $d$-dimensional vector, or an image) and output a real-valued vector of size $|\mathcal{A}|$, where $\mathcal{A}$ is the action space.
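
A minimal sketch of such a Q-network in PyTorch, assuming a discrete action space; the layer sizes are illustrative choices, not prescribed by anything above:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a d-dimensional state representation to |A| estimated Q-values."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),  # one Q-value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Usage: the greedy action is the index of the largest predicted Q-value.
q_net = QNetwork(state_dim=4, num_actions=2)
state = torch.randn(1, 4)               # a batch containing one 4-dimensional state
greedy_action = q_net(state).argmax(dim=1)
```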

Now, it seems from your question that you're confused as to why we use a model (the neural network) when Q-learning is, as you rightly say, model-free. The answer is that when we talk about RL algorithms being model-free, we are not talking about how the value functions or policy are parameterised; we are talking about whether the algorithm uses a model of the transition dynamics to help with its learning. That is, a model-free algorithm doesn't use any knowledge about $p(s' | s, a)$ (other than implicitly, through repeated interactions with the environment), whereas model-based methods use this transition function explicitly -- it can be known exactly, as in Atari environments, or it can be approximated/learnt -- to perform planning with the dynamics, to generate artificial data to learn from, etc.
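
To make the distinction concrete, here is an illustrative contrast in Python (the helpers `p` and `r` are hypothetical, not from any library): the model-free update touches only a single sampled transition, while the model-based backup explicitly uses the transition probabilities $p(s' \mid s, a)$ and the reward function.

```python
# Model-free: the update only ever uses a sampled transition (s, a, r, s').
def model_free_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.99):
    target = reward + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Model-based: the update uses explicit knowledge of p(s' | s, a) and r(s, a)
# to compute an expected backup over all possible next states (planning).
def model_based_update(Q, s, a, p, r, states, actions, gamma=0.99):
    Q[(s, a)] = r(s, a) + gamma * sum(
        p(s_next, s, a) * max(Q[(s_next, a2)] for a2 in actions)
        for s_next in states
    )
```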

David

**Q-learning**

Q-learning is a basic reinforcement learning algorithm. It uses a Q-table to store and update the value of each state-action pair. The algorithm updates the Q-values using the Bellman equation, based on the reward received and the estimated value of the next state. It's model-free, meaning it doesn't require a model of the environment and learns from interactions.

**Deep Q-learning**

Deep Q-learning is an extension of Q-learning that uses a deep neural network (DNN) instead of a Q-table. The Q-table becomes impractical in environments with large or continuous state spaces, because it would need to store a Q-value for every possible state-action pair. Deep Q-learning uses a DNN to approximate the Q-values, allowing it to handle more complex environments.
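
As a rough sketch of what "approximating the Q-values with a DNN" means in practice, a single deep Q-learning update step might look like the following. It assumes a PyTorch Q-network (e.g. the `QNetwork` sketch in the previous answer) and a hypothetical minibatch sampled from a replay buffer:

```python
import torch
import torch.nn.functional as F

def dql_update(q_net, optimizer, batch, gamma=0.99):
    # batch: states [B, d], actions [B] (long), rewards [B], next_states [B, d],
    # dones [B] (0.0 or 1.0), all as tensors sampled from a replay buffer.
    states, actions, rewards, next_states, dones = batch

    # Q(s, a) for the actions actually taken.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bootstrapped target r + gamma * max_a' Q(s', a'), with no gradient through
    # the target (in practice a separate target network is often used here).
    with torch.no_grad():
        target = rewards + gamma * (1 - dones) * q_net(next_states).max(dim=1).values

    # Regressing Q(s, a) towards the target replaces the tabular update rule.
    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```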

**Deep Q-network (DQN)**

Deep Q-network (DQN) refers to the specific neural network architecture used in Deep Q-learning to approximate the Q-values. The DQN receives the state (often processed by convolutional layers in the case of image inputs) and outputs Q-values for all possible actions in that state.
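
For image inputs, such a network typically has a convolutional front end. Below is a minimal PyTorch sketch loosely following the architecture popularised by the Atari DQN work (Mnih et al., 2015); the exact layer sizes and the input shape (a stack of four 84x84 grayscale frames) are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class ConvDQN(nn.Module):
    """Maps a stack of image frames to one Q-value per discrete action."""
    def __init__(self, in_channels: int, num_actions: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),  # 7x7 assumes 84x84 input frames
            nn.Linear(512, num_actions),            # one Q-value per action
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(frames))

# Example: a stack of 4 grayscale 84x84 frames -> Q-values for 6 actions.
q_values = ConvDQN(in_channels=4, num_actions=6)(torch.zeros(1, 4, 84, 84))
```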