Questions tagged [variational-inference]

For questions related to variational inference (VI), an optimization-based approach to the inference problem, i.e. approximating the posterior, which Bayes' rule expresses in terms of the prior, the likelihood, and the marginal likelihood (evidence). VI is used, for example, in the context of variational auto-encoders (VAEs) and Bayesian neural networks (BNNs).
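
For reference, the core identity behind VI: the log evidence decomposes into the evidence lower bound (ELBO) plus the KL divergence to the true posterior, so maximizing the ELBO over a variational family $q$ is equivalent to minimizing that KL divergence,
$$ \log p(x) = \underbrace{\mathbb{E}_{q(z)}\big[\log p(x,z) - \log q(z)\big]}_{\text{ELBO}(q)} + \text{KL}\big(q(z)\,\|\,p(z\mid x)\big). $$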

For more info, you could read the paper Variational Inference: A Review for Statisticians (2018) by David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe.

12 questions
5
votes
1 answer

What is the intuition behind variational inference for Bayesian neural networks?

I'm trying to understand the concept of Variational Inference for BNNs. My source is this work. The aim is to minimize the divergence between the approximate distribution and the true posterior $$\text{KL}(q_{\theta}(w)\,\|\,p(w|D)) = \int q_{\theta}(w) \…
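
For context, a standard expansion of that objective shows it equals a negative ELBO over the weights plus the constant $\log p(D)$, which is why minimizing the KL is done by maximizing the ELBO:
$$ \text{KL}\big(q_{\theta}(w)\,\|\,p(w \mid D)\big) = \text{KL}\big(q_{\theta}(w)\,\|\,p(w)\big) - \mathbb{E}_{q_{\theta}(w)}\big[\log p(D \mid w)\big] + \log p(D). $$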
4
votes
1 answer

How does the VAE learn a joint distribution?

I found the following paragraph from An Introduction to Variational Autoencoders sounds relevant, but I am not fully understanding it. A VAE learns stochastic mappings between an observed $\mathbf{x}$-space, whose empirical distribution…
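
In that paper's notation, the joint distribution the VAE defines factorizes into a prior over the latents and a decoder likelihood, with the model's data distribution obtained by marginalizing the latents out:
$$ p_{\boldsymbol{\theta}}(\mathbf{x}, \mathbf{z}) = p_{\boldsymbol{\theta}}(\mathbf{z})\, p_{\boldsymbol{\theta}}(\mathbf{x} \mid \mathbf{z}), \qquad p_{\boldsymbol{\theta}}(\mathbf{x}) = \int p_{\boldsymbol{\theta}}(\mathbf{x}, \mathbf{z})\, d\mathbf{z}. $$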
3
votes
1 answer

Why don't we also need to approximate $p(x \mid z)$ in the VAE?

In the VAE, we approximate the probability distribution $p(z \mid x)$, where $z$ is the latent vector and $x$ is our data. The reason is that $p(z \mid x)$ becomes impossible to calculate for continuous data because of $p(x)$, which require…
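
In short, the asymmetry is that $p(x \mid z)$ is specified directly as the decoder's output distribution (e.g. a Gaussian or Bernoulli whose parameters a network computes from $z$), whereas the posterior needs the marginal, which integrates over the entire latent space:
$$ p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)}, \qquad p(x) = \int p(x \mid z)\, p(z)\, dz. $$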
3
votes
1 answer

What does the approximate posterior on latent variables, $q_\phi(z|x)$, tend to when optimising VAEs?

The ELBO objective is described as follows: $$ \text{ELBO}(\phi,\theta) = \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \text{KL}[q_\phi(z|x)\,\|\,p(z)] $$ This form of the ELBO includes a regularisation term in the form of the KL divergence, which drives $q_\phi(z|x)…
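
A standard rearrangement of the same objective answers the question in the title: the ELBO equals the log evidence minus the KL divergence to the true posterior, so maximizing it over $\phi$ drives $q_\phi(z|x)$ towards $p_\theta(z|x)$,
$$ \text{ELBO}(\phi,\theta) = \log p_\theta(x) - \text{KL}\big[q_\phi(z|x)\,\|\,p_\theta(z|x)\big]. $$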
2
votes
2 answers

What is the meaning of log p(x) in VAE math, and why is it constant?

I was reading an article on Medium where the author cites this equation for Variational Inference: \begin{align*} \text{KL}(q(z|x^{(i)})||p(z|x^{(i)})) &= \int_z q(z|x^{(i)})\log\frac{q(z|x^{(i)})}{p(z|x^{(i)})}\, dz \\ &=…
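
For reference, completing that derivation gives the usual resolution: $\log p(x^{(i)})$ is the sum of the ELBO and the KL term, and since it does not depend on $q$ at all, it is a constant with respect to the variational parameters being optimized,
$$ \log p(x^{(i)}) = \mathbb{E}_{q(z|x^{(i)})}\big[\log p(x^{(i)}, z) - \log q(z|x^{(i)})\big] + \text{KL}\big(q(z|x^{(i)})\,\|\,p(z|x^{(i)})\big). $$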
2
votes
1 answer

Why do we use $q_{\phi}(z \mid x^{(i)})$ in the objective function of amortized variational inference, while sometimes we use $q(z)$?

On page 21 here, it states: General Idea of Amortization: if same inference problem needs to be solved many times, can we parameterize a neural network to solve it? Our case: for all $x^{(i)}$ we want to solve: $$ \min _{q(z)} \mathrm{KL}\left(q(z)…
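
Roughly, the distinction is this: non-amortized VI fits a free distribution $q(z)$ separately for every datapoint, while amortized VI shares one inference network with parameters $\phi$ across all datapoints, so each $q_{\phi}(z \mid x^{(i)})$ is produced by a single learned mapping rather than a per-datapoint optimization,
$$ \min_{q(z)} \mathrm{KL}\big(q(z)\,\|\,p(z \mid x^{(i)})\big) \text{ for each } i \quad\longrightarrow\quad \min_{\phi} \sum_i \mathrm{KL}\big(q_{\phi}(z \mid x^{(i)})\,\|\,p(z \mid x^{(i)})\big). $$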
1
vote
1 answer

How does using the ELBO in VAEs make the problem tractable?

I'm studying Variational Autoencoders and a lot of the literature says that the posterior is intractable because the marginal distribution p(x) is intractable since the space of z is so large we cannot possibly integrate over it all. So to avoid…
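
A brief sketch of why the bound is computable even though $p(x)$ is not (assuming, as is typical, a Gaussian $q_\phi(z|x)$ and prior): every term involves only quantities the model defines directly; the expectation can be estimated by Monte Carlo with samples from $q_\phi(z|x)$, and the KL divergence between two Gaussians has a closed form, so no integral over the whole latent space is required:
$$ \log p(x) \ge \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big] - \text{KL}\big(q_\phi(z|x)\,\|\,p(z)\big). $$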
1
vote
1 answer

If we know the joint distribution, can we simply derive the evidence from it?

I'm struggling to understand one specific part of the formalism of the free energy principle. My understanding is that the free energy principle can be derived from considering statistical dynamics of a system that is coupled with its environment in…
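
For the specific sub-question, the relationship between the joint and the evidence is just marginalization; knowing the joint density as a function is not the same as being able to evaluate this integral, which is what makes the evidence hard in practice:
$$ p(x) = \int p(x, z)\, dz. $$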
1
vote
2 answers

Why optimise log p(x) rather than log p(x|z) in a Variational AutoEncoder?

Background: The loss function in a Variational AutoEncoder is the Evidence Lower Bound (ELBO): $\mathbb{E}_q[\log p(x|z)] - \text{KL}[q(z)\,\|\,p(z)]$, which satisfies the inequality $\log p(x) \ge \mathbb{E}_q[\log p(x|z)] - \text{KL}[q(z)\,\|\,p(z)]$. It is said in the…
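
For context, the inequality itself follows from Jensen's inequality applied to the marginal likelihood:
$$ \log p(x) = \log \int p(x|z)\, p(z)\, dz = \log \mathbb{E}_{q(z)}\!\left[\frac{p(x|z)\, p(z)}{q(z)}\right] \ge \mathbb{E}_{q(z)}\!\left[\log \frac{p(x|z)\, p(z)}{q(z)}\right] = \mathbb{E}_q[\log p(x|z)] - \text{KL}[q(z)\,\|\,p(z)]. $$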
1
vote
1 answer

Do we use two distinct layers to compute the mean and variance of a Gaussian encoder/decoder in the VAE?

I am looking at Appendix C of the VAE paper, which says: C.1 Bernoulli MLP as decoder. In this case let $p_{\boldsymbol{\theta}}(\mathbf{x} \mid \mathbf{z})$ be a multivariate Bernoulli whose probabilities are computed from $\mathrm{z}$ with a…
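
A minimal PyTorch sketch of the usual reading of that appendix (a shared hidden layer feeding two separate linear heads for the mean and log-variance); the class name, layer sizes, and nonlinearity below are illustrative choices, not taken verbatim from the paper:

```python
import torch
import torch.nn as nn

class GaussianMLP(nn.Module):
    """Maps an input to the mean and log-variance of a diagonal Gaussian;
    the same shape of network can serve as the encoder q(z|x) or a Gaussian decoder p(x|z)."""

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Tanh())
        self.mean_head = nn.Linear(hidden_dim, out_dim)    # one head for the mean
        self.logvar_head = nn.Linear(hidden_dim, out_dim)  # a distinct head for log sigma^2

    def forward(self, x: torch.Tensor):
        h = self.hidden(x)
        return self.mean_head(h), self.logvar_head(h)

# Example: an encoder for 784-dimensional inputs and a 20-dimensional latent.
encoder = GaussianMLP(784, 400, 20)
mu, logvar = encoder(torch.randn(8, 784))
z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
```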
0
votes
0 answers

Particle filtering versus variational inference in dynamic Bayesian networks

I am looking at a paper that references dynamic Bayesian networks, the simplest case being a hidden Markov model. The author uses a particle filter to model the posterior distribution for the current state, given the sensor observations. So I…
0
votes
1 answer

Why isn't the evidence $p(x) = 1$ if it's an observed variable?

Every explanation of variational inference starts with the same basic premise: given an observed variable $x$, and a latent variable $z$, $$ p(z|x)=\frac{p(x,z)}{p(x)} $$ and then proceeds to expand $p(x)$ as an expectation over $z$: $$ p(x) =…
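
Briefly, one way to see it: $p(x)$ denotes the marginal density of the model evaluated at the observed value, not the probability of an event that has already occurred, and it is obtained by integrating out the latent variable; in general it is neither 1 nor easy to compute:
$$ p(x) = \int p(x, z)\, dz = \mathbb{E}_{p(z)}\big[p(x \mid z)\big]. $$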