
What is the advantage of using a VAE over a deterministic auto-encoder?

For example, assuming we have just 2 labels, a deterministic auto-encoder will always map a given image to the same latent vector. However, one expects that, after training, the 2 classes will form separate clusters in the latent space.

In the case of the VAE, an image is mapped to an encoding vector probabilistically. However, one still ends up with 2 separate clusters. Now, if one passes a new image (at test time), in both cases the network should be able to place that new image in one of the 2 clusters.

How are the 2 clusters created by the VAE better than the ones from the deterministic case?

Alex Marshall

2 Answers


It seems that you think we want to perform classification with VAEs, or that the images we pass to the encoder fall into more than one category. The other answer already points out that VAEs are not typically used for classification but for generation tasks, so let me try to answer the main question.

The variational auto-encoder (VAE) and the (deterministic) auto-encoder both have an encoder and a decoder and they both convert the inputs to a latent representation, but their inner workings are different: a VAE is a generative statistical model, while the AE can be viewed just as a data compressor (and decompressor).

In an AE, given an input $\mathbf{x}$ (e.g. an image), the encoder produces one latent vector $\mathbf{z_x}$, which can be decoded into $\mathbf{\hat{x}}$ (another image, which should be similar or related to $\mathbf{x}$). Compactly, this can be written as $\mathbf{\hat{x}}=f(\mathbf{z_x}=g(\mathbf{x}))$, where $g$ is the encoder and $f$ is the decoder. This operation is deterministic: given the same $\mathbf{x}$, the same $\mathbf{z_x}$ and $\mathbf{\hat{x}}$ are produced.
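To make the determinism concrete, here is a minimal sketch of an AE forward pass (the use of PyTorch, the layer sizes and the input dimension are my own illustrative assumptions):

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # g: encoder, maps x to a single latent vector z_x
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        # f: decoder, maps z_x back to a reconstruction x_hat
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, input_dim), nn.Sigmoid())

    def forward(self, x):
        z_x = self.encoder(x)      # deterministic: same x -> same z_x
        return self.decoder(z_x)   # x_hat = f(g(x))

x = torch.rand(1, 784)             # e.g. a flattened 28x28 image
model = AutoEncoder()
print(torch.equal(model(x), model(x)))  # True: identical output for identical input
```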

In a VAE, given an input $\mathbf{x} \in X$ (e.g. an image), more than one latent vector $\mathbf{z_{x}}^i \in Z$ can be produced, because the encoder attempts to learn the probability distribution $q_\phi(z \mid x)$, which can be e.g. $\mathcal{N}(\mu, \sigma)$, where $\mu, \sigma = g_\phi(\mathbf{x})$. In practice, $g_\phi$ is a neural network with weights $\phi$. We can sample latent vectors $\mathbf{z_{x}}^i$ from $\mathcal{N}(\mu, \sigma)$, each of which should be a "good" representation of the given $\mathbf{x}$.
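As a minimal sketch of what sampling from $q_\phi(z \mid x)$ can look like in practice (the reparameterization trick below, the layer sizes and the use of PyTorch are my own illustrative assumptions, not something stated in the question):

```python
import torch
import torch.nn as nn

class VAEEncoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.mu_head = nn.Linear(256, latent_dim)       # mean of q(z | x)
        self.log_var_head = nn.Linear(256, latent_dim)  # log-variance of q(z | x)

    def forward(self, x):
        h = self.hidden(x)
        mu, log_var = self.mu_head(h), self.log_var_head(h)
        # sample z ~ N(mu, sigma) via z = mu + sigma * eps, with eps ~ N(0, I)
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * log_var) * eps
        return z, mu, log_var

x = torch.rand(1, 784)
encoder = VAEEncoder()
z1, _, _ = encoder(x)
z2, _, _ = encoder(x)
# z1 and z2 (almost surely) differ: the same image maps to different latent samples
```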

Why is it useful to learn $q_\phi(z \mid x)$? There are many use cases. For example, given multiple corrupted/noisy versions of an image, you can reconstruct the original uncorrupted image. However, note that you can also use the AE for denoising: here you have a TensorFlow example that illustrates this. The difference is that, again, given the same noisy image, the model will always produce the same reconstructed image. You can also use the VAE for drug design [1]. See also this post.
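A rough sketch of the denoising setup mentioned above, where the corrupted image is the input and the clean image is the training target (the noise level, the loss and the tiny stand-in network are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A tiny stand-in auto-encoder; any encoder/decoder pair would do here.
denoiser = nn.Sequential(nn.Linear(784, 64), nn.ReLU(),
                         nn.Linear(64, 784), nn.Sigmoid())

clean = torch.rand(16, 784)                    # a batch of clean images
noisy = clean + 0.2 * torch.randn_like(clean)  # corrupted copies of them

loss = F.mse_loss(denoiser(noisy), clean)      # the target is the *clean* image
loss.backward()
```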

nbro

VAEs are not used for classification. They are used for inference or as generative models, while AEs can be used as data reconstructors (as you described above), denoisers or classifiers. So, the difference is the generation of new data vs. the reconstruction of data.

VAEs map the inputs to a latent space, where each latent variable is encouraged (via the KL divergence term in the loss) to follow $\mathcal{N}(0,1)$, i.e. the standard normal distribution. Once we have trained a VAE, we use only the decoder part to generate new samples.
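A minimal sketch of that generation step, assuming an already trained decoder (the architecture below is a placeholder, not the one from the slides):

```python
import torch
import torch.nn as nn

latent_dim = 2
# pretend this decoder has already been trained as part of a VAE
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                        nn.Linear(256, 784), nn.Sigmoid())

z = torch.randn(10, latent_dim)  # 10 samples from the standard normal prior N(0, I)
new_images = decoder(z)          # 10 brand-new images; the encoder is not needed
```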

Example:

[Figure: a grid of face images generated by varying two latent variables. Source: Stanford University CS231n slides]

Assume there is an x-axis along the bottom and a y-axis on the left. Let the x-axis represent $x_1$ and the y-axis represent $x_2$, our latent variables. By varying $x_1$ and $x_2$, you can see what happens: increasing $x_1$ changes the face angle, while increasing $x_2$ changes the eye droop. Thus, we can generate new data by varying the features in the latent representation, as in the sketch below.
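Here is a rough sketch of such a traversal over a 2-dimensional latent space (the decoder, grid range and image size are illustrative placeholders; with a trained model, each grid cell would decode to a face like the ones in the slide):

```python
import torch
import torch.nn as nn

# placeholder decoder; in practice, the trained decoder of the VAE
decoder = nn.Sequential(nn.Linear(2, 256), nn.ReLU(),
                        nn.Linear(256, 784), nn.Sigmoid())

values = torch.linspace(-3.0, 3.0, steps=8)  # range swept for each latent variable
grid = torch.cartesian_prod(values, values)  # all (x1, x2) combinations, shape (64, 2)
images = decoder(grid).reshape(-1, 28, 28)   # one decoded image per grid cell
```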

For a better understanding, I highly recommend you check out these links:

Variational autoencoders.

Variational Autoencoders - Brian Keng

VAE - Ali Ghodsi

Generative Models - CS231n