A VAE's latent space is learned as part of the model, and its dimension is a hyperparameter you must tune based on several factors.
Empirically, a good starting point is to run PCA on the data and take the number of components needed to explain, say, 95% of the variance as a heuristic estimate of the intrinsic dimension. Then grid-search around that value: train multiple VAEs with different latent dimensions, evaluate them on a held-out validation set (or with cross-validation when data is limited), and choose the dimension that gives a good trade-off between low reconstruction error and a reasonable KL divergence. For example, if reconstruction MSE (for continuous data such as images) keeps dropping significantly as dimensions are added, the model is still underfitting; once it stabilizes, the capacity is sufficient. Similarly, a very high KL term may signal that the latent capacity is inadequate, while a very low one suggests redundant dimensions or posterior collapse. In short, a plateau in ELBO improvement indicates diminishing returns from additional latent dimensions.
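A minimal sketch of this procedure, assuming `scikit-learn` is available; the data matrix, the 95% threshold, the candidate grid, and the validation MSE numbers are all illustrative assumptions (the actual VAE training loop is omitted):

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data: 500 samples whose intrinsic dimension is (roughly) 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 50))

# Step 1: PCA heuristic -- smallest number of components explaining 95% variance.
heuristic_dim = PCA(n_components=0.95).fit(X).n_components_
candidates = sorted({max(1, heuristic_dim // 2), heuristic_dim, 2 * heuristic_dim})

# Step 2: train one VAE per latent dim on a grid around the heuristic and
# record validation reconstruction MSE (made-up numbers for illustration).
val_mse = {2: 0.120, 4: 0.061, 8: 0.034, 16: 0.031, 32: 0.030}

def smallest_dim_on_plateau(mse, rel_tol=0.10):
    """Smallest latent dim whose MSE is within rel_tol of the best observed."""
    best = min(mse.values())
    return min(d for d, m in mse.items() if m <= best * (1 + rel_tol))

chosen = smallest_dim_on_plateau(val_mse)  # smallest dim where MSE has plateaued
```

The `rel_tol` plateau rule is just one way to operationalize "stabilization indicates sufficiency"; you would apply the same idea to the KL term and the overall ELBO.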
If you plan to use the trained VAE for a specific downstream task such as clustering or classification, also weigh the latent dimension that maximizes performance on that task in your final decision, alongside the empirical tuning above.
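For a clustering task, that comparison might look like the sketch below, assuming `scikit-learn`; the latent codes here are random placeholders standing in for encoder outputs of VAEs trained with different latent dimensions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Placeholder latent codes: in practice, encode the validation set with each
# trained VAE (one per candidate latent dimension).
latents = {d: rng.normal(size=(200, d)) for d in (4, 8, 16)}

scores = {}
for d, z in latents.items():
    labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(z)
    scores[d] = silhouette_score(z, labels)  # higher = better-separated clusters

best_dim = max(scores, key=scores.get)
```

For classification, you would swap the silhouette score for validation accuracy of a probe classifier trained on the latent codes.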
In fact, you can increase the effective latent capacity dynamically during training, as proposed for $β$-VAE, a VAE variant that aims for disentangled representations:
> Taking a rate-distortion theory perspective, we show the circumstances under which representations aligned with the underlying generative factors of variation of data emerge when optimising the modified ELBO bound in β-VAE, as training progresses. From these insights, we propose a modification to the training regime of β-VAE, that progressively increases the information capacity of the latent code during training. This modification facilitates the robust learning of disentangled representations in β-VAE, without the previous trade-off in reconstruction accuracy.
>
> — Burgess et al., *Understanding disentangling in β-VAE* (2018)
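The modification they describe amounts to replacing the weighted KL term with a capacity-annealed penalty $\gamma\,|D_{KL} - C|$, where the target capacity $C$ grows linearly over training. A minimal sketch (the defaults for $\gamma$ and the maximum capacity below are illustrative, not prescriptive):

```python
def capacity_annealed_kl_term(kl, step, total_steps, c_max=25.0, gamma=1000.0):
    """Capacity-annealed KL penalty: gamma * |KL - C|, where the target
    capacity C (in nats) grows linearly from 0 to c_max over training."""
    c = c_max * min(step / total_steps, 1.0)
    return gamma * abs(kl - c)
```

During training you would add this term to the reconstruction loss in place of the fixed $\beta \cdot D_{KL}$ term; early on it pushes the KL toward a small capacity, then gradually allows the latent code to carry more information.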