
As you know, PCA usually uses the explained variance ratio to choose the reduced dimension. I am currently considering a VAE as a method for reducing the dimensionality of my data, but I am having a hard time choosing the dimensions for the model.

I was wondering: is there a criterion for choosing an appropriate dimension for the latent space?

Thank you in advance for your advice.

1 Answer


A VAE's latent space is learned as part of the model, and its dimension is a hyperparameter you must tune based on several factors.

Empirically, you can first run PCA on the data with, say, a 95% explained-variance threshold to get a heuristic estimate of the intrinsic dimension as a starting point. Then do a grid search: train multiple VAEs with latent dimensions around that heuristic, evaluate them on an independent validation set (or via cross-validation when you have limited data), and choose the dimension that gives a good trade-off between low reconstruction error and a reasonable KL divergence. For example, if the reconstruction MSE (for continuous data such as images) keeps dropping significantly as the dimension grows, the model is still underfitting; once it stabilizes, the dimension is sufficient. Likewise, a high KL value may signal that the latent capacity is inadequate, while very low values suggest redundant dimensions or posterior collapse. In summary, a plateau in ELBO improvement suggests diminishing returns from adding latent dimensions.
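Here is a minimal sketch of that procedure in PyTorch and scikit-learn, with placeholder data; the architecture, epoch count, and candidate grid are illustrative assumptions, not a definitive recipe:

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

def pca_heuristic(X, var_ratio=0.95):
    """Smallest number of principal components explaining `var_ratio` of the variance."""
    cum = np.cumsum(PCA().fit(X).explained_variance_ratio_)
    return int(np.searchsorted(cum, var_ratio)) + 1

class VAE(nn.Module):
    """Small MLP VAE; layer sizes are illustrative."""
    def __init__(self, input_dim, latent_dim, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(input_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, input_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar

def elbo_terms(x, recon, mu, logvar):
    """Per-sample reconstruction MSE and KL(q(z|x) || N(0, I)), averaged over the batch."""
    mse = ((recon - x) ** 2).sum(dim=1).mean()
    kl = (-0.5 * (1 + logvar - mu ** 2 - logvar.exp())).sum(dim=1).mean()
    return mse, kl

def evaluate_dim(X_train, X_val, latent_dim, epochs=200, lr=1e-3):
    """Train a VAE with the given latent size and return validation (MSE, KL)."""
    model = VAE(X_train.shape[1], latent_dim)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    xt = torch.as_tensor(X_train, dtype=torch.float32)
    for _ in range(epochs):  # full-batch training keeps the sketch short
        opt.zero_grad()
        mse, kl = elbo_terms(xt, *model(xt))
        (mse + kl).backward()  # negative ELBO
        opt.step()
    with torch.no_grad():
        xv = torch.as_tensor(X_val, dtype=torch.float32)
        mse, kl = elbo_terms(xv, *model(xv))
    return mse.item(), kl.item()

# Placeholder data; substitute your own (n_samples, n_features) arrays.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50)).astype(np.float32)
X_train, X_val = X[:800], X[800:]

d0 = pca_heuristic(X_train)  # heuristic intrinsic dimension as a starting point
for d in sorted({max(1, d0 // 2), d0, min(X.shape[1], 2 * d0)}):
    mse, kl = evaluate_dim(X_train, X_val, d)
    print(f"latent_dim={d}: val MSE={mse:.4f}, KL={kl:.4f}")
```

You would then pick the smallest dimension at which the validation MSE has stabilized while the KL term looks neither inflated nor collapsed.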

If you plan to use the trained VAE for a specific downstream task such as clustering or classification, also weigh the latent dimension that maximizes performance on that task in your final decision, alongside the empirical hyperparameter tuning above.
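For instance, continuing the hypothetical sketch above, you could score each candidate latent size by the validation accuracy of a simple classifier fit on the encoded means; `y_train` and `y_val` are assumed label arrays:

```python
from sklearn.linear_model import LogisticRegression

def downstream_score(model, X_train, y_train, X_val, y_val):
    """Validation accuracy of a linear classifier on the VAE's latent means."""
    with torch.no_grad():
        z_tr = model.mu(model.enc(torch.as_tensor(X_train, dtype=torch.float32))).numpy()
        z_va = model.mu(model.enc(torch.as_tensor(X_val, dtype=torch.float32))).numpy()
    clf = LogisticRegression(max_iter=1000).fit(z_tr, y_train)
    return clf.score(z_va, y_val)  # higher is better for this latent size
```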

In fact, you can progressively increase the information capacity of the latent code during training, as discussed for β-VAE, a variant of the VAE that aims for disentangled representations:

Taking a rate-distortion theory perspective, we show the circumstances under which representations aligned with the underlying generative factors of variation of data emerge when optimising the modified ELBO bound in β-VAE, as training progresses. From these insights, we propose a modification to the training regime of β-VAE, that progressively increases the information capacity of the latent code during training. This modification facilitates the robust learning of disentangled representations in β-VAE, without the previous trade-off in reconstruction accuracy.
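That capacity-increase idea can be sketched as a small change to the training loss: the KL term is pulled toward a target capacity C that grows linearly over training. The values of gamma, C_max, and anneal_steps below are illustrative assumptions, not taken from the paper's experiments:

```python
def capacity_annealed_loss(mse, kl, step, gamma=1000.0, C_max=25.0, anneal_steps=100_000):
    """Capacity-annealed objective in the spirit of the paper quoted above:
    mse + gamma * |KL - C|, with the capacity target C ramping up to C_max."""
    C = C_max * min(step / anneal_steps, 1.0)  # linearly increasing capacity target
    return mse + gamma * (kl - C).abs()
```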

cinch