
I'm reading this interesting blog post explaining diffusion probabilistic models and trying to understand the following.

In order to compute the reverse process, we need to consider the posterior distribution $q(\textbf{x}_{t-1} | \textbf{x}_t)$, which is said to be intractable because evaluating it would require the entire dataset; therefore, we need to learn a model $p_\theta$ to approximate these conditional probabilities in order to run the reverse diffusion process.

If we use Bayes' theorem, we have

$$q(\textbf{x}_{t-1} | \textbf{x}_t) = \frac{q(\textbf{x}_t |\textbf{x}_{t-1})q(\textbf{x}_{t-1})}{q(\textbf{x}_t)}$$

I understand that indeed we don't have any prior knowledge of $q(\textbf{x}_{t-1})$ or $q(\textbf{x}_t)$ since this would mean already having the distribution we are trying to estimate. Is this correct?
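Writing the marginal explicitly (with $q(\textbf{x}_0)$ denoting the unknown data distribution) seems to confirm this:

$$q(\textbf{x}_t) = \int q(\textbf{x}_t | \textbf{x}_0)\, q(\textbf{x}_0)\, d\textbf{x}_0 \, ,$$

and similarly for $q(\textbf{x}_{t-1})$, so neither marginal is available in closed form.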

The above posterior becomes tractable when conditioned on $\textbf{x}_0$, and we obtain

$$q(\textbf{x}_{t-1} | \textbf{x}_t , \textbf{x}_0) = \mathcal{N}(\textbf{x}_{t-1}; \, \tilde{\boldsymbol{\mu}}_t(\textbf{x}_t , \textbf{x}_0) \, , \, \tilde{\beta}_t \textbf{I})$$
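If I follow the DDPM paper correctly (with the usual notation $\alpha_t = 1-\beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^t \alpha_s$), the mean and variance of this posterior are

$$\tilde{\boldsymbol{\mu}}_t(\textbf{x}_t , \textbf{x}_0) = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\,\textbf{x}_0 + \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\,\textbf{x}_t \, , \qquad \tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t \, .$$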

So, apparently, we obtain a posterior that can be calculated in closed form when we condition on the original data $\textbf{x}_0$. At this point, I don't understand the role of the model $p_\theta$: why do we need to tune the parameters of a model if we can already obtain our posterior?

James Arten

2 Answers


I am also learning about diffusion models and would like to share some information.

At this point, I don't understand the role of the model $p_\theta$

To clarify a bit: $p_\theta$ is just another notation for the U-net. Its role is to receive $(x_t, t)$ (sometimes also a class label $y$) and predict $x_0$ or $x_{t-1}$ (or the noise $\epsilon$ that was added), depending on the paper. So, at the end of the day, to synthesize new data, you start from a noisy (usually Gaussian) image and the U-net iteratively improves the prediction of $x_0$ - check out Algorithm 2 in the DDPM paper (2020).
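For intuition, here is a minimal sketch of that sampling loop in the noise-prediction parametrization of DDPM's Algorithm 2 (my own illustration, not code from the paper); `eps_model` is a hypothetical stand-in for the trained U-net, and $\sigma_t^2 = \beta_t$ is one common choice of sampling variance:

```python
import numpy as np

def ddpm_sample(eps_model, shape, betas, rng=np.random.default_rng()):
    """Minimal DDPM sampling loop (Algorithm 2, Ho et al. 2020), eps-parametrization."""
    betas = np.asarray(betas, dtype=float)
    alphas = 1.0 - betas                        # alpha_t = 1 - beta_t
    alpha_bars = np.cumprod(alphas)             # bar{alpha}_t = prod_{s<=t} alpha_s
    T = len(betas)

    x = rng.standard_normal(shape)              # x_T ~ N(0, I)
    for t in reversed(range(T)):                # t = T-1, ..., 0 (0-indexed)
        eps = eps_model(x, t)                   # U-net predicts the noise in x_t
        # Posterior mean, rewritten in terms of the predicted noise
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            z = rng.standard_normal(shape)      # fresh Gaussian noise at each step
            x = mean + np.sqrt(betas[t]) * z    # sigma_t^2 = beta_t
        else:
            x = mean                            # no noise added at the last step
    return x                                    # approximate sample of x_0
```

So the model never needs the true posterior at sampling time: it only needs its own prediction of the noise (equivalently, of $x_0$) to take each step from $x_t$ to $x_{t-1}$.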

Your question about the posterior might be answered in more detail here: Diffusion Models | Paper Explanation | Math Explained - YouTube

Check the video at around 18:00, which explains in more detail how conditioning on $x_0$ enters the optimization of the (variational) lower bound.

lqi

You do not yet have $\mathbf{x}_0$ during sampling (as opposed to training). That's why you need to approximate $q(\mathbf{x}_{t-1}|\mathbf{x}_t, \mathbf{x}_0)$ with $p_{\theta}(\mathbf{x}_{t-1}|\mathbf{x}_t)$ via variational inference, i.e. by minimizing a KL divergence between the two. After training on good data, sampling with $p_\theta$ should produce an approximation of $\mathbf{x}_0$.
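For reference, here is a sketch of the training objective as I understand it from the DDPM derivation: the variational bound contains one KL term per step, each comparing the model to the tractable posterior,

$$L_{t-1} = D_{\mathrm{KL}}\big(q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) \,\|\, p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)\big),$$

which can be computed in closed form during training precisely because $\mathbf{x}_0$ is known there; at sampling time only $p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)$ is used.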

lkjsfkshd