I'm currently trying to understand how diffusion models work for a thesis I'm writing. I have an intuition for how they work, but I'm still trying to deepen my understanding. Something I keep stumbling over is a formula (shown below) from this paper, which uses a Gaussian distribution: $$p(x_t|x_{t-1}) := \mathcal{N}(x_t;\sqrt{1-\beta_t}x_{t-1},\beta_tI)$$ This is Formula 6 on page 3.
Now I know the following:
- the meaning of the Gaussian distribution
- the meaning of $p$ being a probability function
- $x_t$ is a sample after t diffusion steps
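
To make my current reading of the formula concrete, here is a minimal sketch of one step in plain Python. It assumes (and this is exactly the part I'm unsure about) that $x_{t-1}$ is simply a list of numbers, for example flattened pixel values. The names `forward_step` and `x0`, the 8x8 size, and the value `beta_t=0.02` are my own placeholders and do not come from the paper.

```python
import math
import random


def forward_step(x_prev, beta_t, rng):
    """Draw one sample x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t * I).

    Because the covariance is beta_t * I, every entry gets its own
    independent Gaussian noise with variance beta_t.
    """
    scale = math.sqrt(1.0 - beta_t)  # shrinks the previous sample (the mean)
    std = math.sqrt(beta_t)          # standard deviation of the added noise
    return [scale * v + std * rng.gauss(0.0, 1.0) for v in x_prev]


rng = random.Random(0)

# Assumption: x0 stands in for a flattened 8x8 grayscale image in [-1, 1].
x0 = [rng.uniform(-1.0, 1.0) for _ in range(64)]

# One step of the formula: x1 has the same length as x0, slightly noisier.
x1 = forward_step(x0, beta_t=0.02, rng=rng)
```

If this reading is right, $x_t$ would be the whole list of values at step $t$, and the formula would describe how likely each possible $x_t$ is given $x_{t-1}$.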
What I lack is an understanding of what exactly $x_t$ and $p(x_t)$ are. I know that $x_t$ is $x$ at time step $t$. What I don't understand is what $x_t$ and $p(x_t)$ are in practice. Is $x_t$ the color values over an image, treated as a distribution? Or is $x_t$ the pixel position and $p(x_t)$ the color value as a probability or density distribution? But if that were the case, why is $x$ indexed over time and not $p$?