
I'm studying Variational Autoencoders, and a lot of the literature says that the posterior is intractable because the marginal distribution $p(x)$ is intractable: the space of $z$ is so large that we cannot possibly integrate over all of it. To avoid this, they construct a lower bound on the log likelihood, the ELBO, which they then try to maximize. The expression for the ELBO is:

$$\mathbb{E}_q[\log p(z,x)] - \mathbb{E}_q[\log q(z)]$$

What I am trying to understand is how this is now tractable. The expectations in the ELBO are still taken with respect to $q$. Take the first term, for example:

$$\mathbb{E}_q[\log p(z,x)] = \int q(z)\,\log p(z,x)\,dz$$

Is this not still an integral over all z values? How did we make this problem any more tractable by finding the ELBO?

Additional question: another thing I was confused about is that we always say the posterior $p(z|x)$ is not computable because we don't have $p(x)$, but how exactly do we have the numerator, $p(x,z)$? $$p(z|x)=\frac{p(x,z)}{p(x)}$$

Is this because we assume a prior, and then also assume that we can model $p(x|z)$ with a decoder?


1 Answer

Though the posterior $p(z|x)$ is intractable, we can still compute $p(x,z)=p(z)\,p(x|z)$ using the product rule of probability. As you correctly guessed, $p(z)$ is assumed to be a simple, known prior distribution, such as a standard normal. Depending on the underlying generative process, $p(x|z)$ is often assumed to be a multivariate Gaussian with a diagonal covariance matrix; it is typically implemented by a neural network known as the decoder, which takes the latent variable $z$ as input and outputs the parameters (e.g., the mean) of $p_\theta(x|z)$, where $\theta$ denotes the decoder's weights.
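
For concreteness, here is a minimal PyTorch sketch of this factorization, assuming a standard normal prior and a unit-variance Gaussian likelihood (the `decoder` network is hypothetical):

```python
import torch

def log_joint(x, z, decoder):
    """log p(x, z) = log p(z) + log p(x|z), assuming a standard normal
    prior p(z) and a Gaussian likelihood p(x|z) = N(decoder(z), I)."""
    # log p(z): standard normal prior on the latent variable
    log_prior = torch.distributions.Normal(0.0, 1.0).log_prob(z).sum(dim=-1)
    # log p(x|z): Gaussian likelihood centered on the decoder's output
    log_lik = torch.distributions.Normal(decoder(z), 1.0).log_prob(x).sum(dim=-1)
    return log_prior + log_lik
```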

Your ELBO is not quite the standard form: in a VAE the variational distribution is conditioned on $x$. Compare with the reference: $$L_{\theta,\phi}(x) := \mathbb{E}_{z\sim q_{\phi}(\cdot|x)}\left[\ln\frac{p_{\theta}(x,z)}{q_{\phi}(z|x)}\right] = \ln p_{\theta}(x) - D_{KL}\left(q_{\phi}(\cdot|x)\,\|\,p_{\theta}(\cdot|x)\right)$$

The form given is not very convenient for maximization, but the following equivalent form is: $$L_{\theta,\phi}(x) = \mathbb{E}_{z\sim q_{\phi}(\cdot|x)}\left[\ln p_{\theta}(x|z)\right] - D_{KL}\left(q_{\phi}(\cdot|x)\,\|\,p_{\theta}(\cdot)\right)$$ where $\ln p_{\theta}(x|z)$ is implemented as $-\frac{1}{2}\|x-D_{\theta}(z)\|_{2}^{2}$, since that is, up to an additive constant, what $x\sim\mathcal{N}(D_{\theta}(z),I)$ yields. That is, we model the distribution of $x$ conditional on $z$ as a Gaussian centered on $D_{\theta}(z)$.
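
This is also where the tractability question in your post is resolved: the expectation over $q$ is not computed by integrating over all $z$; it is estimated by Monte Carlo, typically with one reparameterized sample per data point, which is what makes the ELBO usable as a training objective. A minimal sketch of that estimate (the `encoder`, returning the mean and log standard deviation of $q_\phi(z|x)$, and the `decoder`, playing the role of $D_\theta$, are hypothetical):

```python
import torch

def reconstruction_term(x, encoder, decoder, num_samples=1):
    """Monte Carlo estimate of E_{z ~ q(z|x)}[ln p(x|z)].
    Instead of integrating over all z, draw a few samples from
    q(z|x) via the reparameterization trick and average."""
    mu, log_sigma = encoder(x)              # parameters of q(z|x)
    total = 0.0
    for _ in range(num_samples):
        eps = torch.randn_like(mu)          # eps ~ N(0, I)
        z = mu + log_sigma.exp() * eps      # z ~ N(mu, sigma^2 I)
        # ln p(x|z) up to an additive constant: -1/2 ||x - D(z)||^2
        total = total - 0.5 * ((x - decoder(z)) ** 2).sum(dim=-1)
    return total / num_samples
```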

Since the approximate posterior $q_{\phi}(z|x)$ and the prior $p_{\theta}(z)$ are often chosen to be Gaussians, the KL term has a closed form, and the ELBO in the reference finally becomes tractable: $$L_{\theta,\phi}(x) = -\frac{1}{2}\mathbb{E}_{z\sim q_{\phi}(\cdot|x)}\left[\|x-D_{\theta}(z)\|_{2}^{2}\right] - \frac{1}{2}\left(N\sigma_{\phi}(x)^{2} + \|E_{\phi}(x)\|_{2}^{2} - 2N\ln\sigma_{\phi}(x)\right) + \text{Const}$$

Here $N$ is the dimension of $z$, and $E_{\phi}(x)$ and $\sigma_{\phi}(x)$ are the mean and (scalar) standard deviation of the encoder's output distribution $q_{\phi}(\cdot|x)=\mathcal{N}(E_{\phi}(x),\sigma_{\phi}(x)^{2}I)$.
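
Putting the pieces together, here is a sketch of this closed-form objective (the encoder `E` returning the mean $E_\phi(x)$, the decoder `D`, and a single scalar `log_sigma` standing in for $\ln\sigma_\phi(x)$ are all hypothetical names):

```python
import torch

def elbo_closed_form(x, E, D, log_sigma):
    """ELBO with the Gaussian KL penalty in closed form, matching the
    formula above up to the additive constant."""
    mu = E(x)                                # E_phi(x), mean of q(z|x)
    N = mu.shape[-1]                         # dimension of z
    sigma = log_sigma.exp()
    # One reparameterized sample estimates the reconstruction expectation
    z = mu + sigma * torch.randn_like(mu)
    recon = -0.5 * ((x - D(z)) ** 2).sum(dim=-1)
    # Closed-form penalty: (N*sigma^2 + ||E(x)||^2 - 2*N*ln(sigma)) / 2
    penalty = 0.5 * (N * sigma ** 2 + (mu ** 2).sum(dim=-1) - 2.0 * N * log_sigma)
    return recon - penalty
```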
