In the context of variational autoencoders, even after the model is trained, people seem to shy away from trying to compute $$ p_{\theta}(x) = \int p_{\theta}(x|z)\,p(z)\,dz. $$
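That is, in principle one could just estimate this with a plain Monte Carlo average over prior samples (writing out the textbook estimator here for reference): $$ p_{\theta}(x) \approx \frac{1}{N}\sum_{i=1}^{N} p_{\theta}(x|z_i), \qquad z_i \sim p(z). $$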
I understand that this can pose significant problems, especially when $z$ is high dimensional (though I have even seen people avoid computing this quantity when $z$ is two dimensional). So when people need to do things like model comparison via log-likelihood ratios, they instead use upper and lower bounds (ELBO, EUBO, CUBO, etc.). What confuses me is why these are easier to calculate. Take the ELBO for instance; even the simplified form from the original paper looks like
$$ \mathrm{ELBO}(\theta, \phi) = -\,E_{q_{\phi}(z|x)}\!\left[\lVert x - D_{\theta}(z)\rVert^2\right] + \dots $$
This means it involves an integral over the conditional distribution $q_{\phi}(z|x)$. In fact, people generally write down a numerical form of the ELBO that performs Monte Carlo integration with respect to this conditional distribution by sampling $z = \mu_{\phi}(x) + \epsilon\, \sigma_{\phi}(x)$. But this is also how one would numerically compute $p_{\theta}(x)$: one can write it as the expectation $E_{p(z)}[p_{\theta}(x|z)]$, and I cannot think of any reason why the density or the integrand here should be harder to handle than the one in the ELBO. In fact, the ELBO's integrand requires evaluating the encoder part of the network to generate each sample, whereas for $p_{\theta}(x)$ you just sample from a normal distribution $p(z)$.
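To make the comparison concrete, here is a rough sketch of the two estimators I have in mind; `mu_phi`, `sigma_phi` and `D_theta` are just dummy stand-ins for the trained encoder/decoder networks, and the Gaussian likelihoods are my own assumption rather than any particular implementation:

```python
import numpy as np

latent_dim, data_dim = 2, 10
rng = np.random.default_rng(0)

# Dummy stand-ins for the trained networks (hypothetical, for illustration only).
def mu_phi(x):    return np.zeros(latent_dim)   # encoder mean
def sigma_phi(x): return np.ones(latent_dim)    # encoder std
def D_theta(z):   return np.zeros(data_dim)     # decoder mean

def log_gauss(x, mean, std):
    # Log density of an isotropic Gaussian (used for p(x|z), q(z|x) and p(z)).
    return -0.5 * np.sum(((x - mean) / std) ** 2 + np.log(2 * np.pi * std ** 2))

def elbo_estimate(x, n_samples=100):
    # Monte Carlo ELBO: z ~ q_phi(z|x) via the reparameterisation z = mu + eps * sigma.
    mu, sigma = mu_phi(x), sigma_phi(x)
    vals = []
    for _ in range(n_samples):
        z = mu + sigma * rng.standard_normal(latent_dim)
        vals.append(log_gauss(x, D_theta(z), 1.0)    # log p_theta(x|z)
                    + log_gauss(z, 0.0, 1.0)         # log p(z)
                    - log_gauss(z, mu, sigma))       # log q_phi(z|x)
    return np.mean(vals)

def naive_log_marginal(x, n_samples=10_000):
    # Naive Monte Carlo for p_theta(x): z ~ p(z), average p_theta(x|z) (via log-mean-exp).
    zs = rng.standard_normal((n_samples, latent_dim))
    log_p = np.array([log_gauss(x, D_theta(z), 1.0) for z in zs])
    m = log_p.max()
    return m + np.log(np.mean(np.exp(log_p - m)))

x = rng.standard_normal(data_dim)
print("ELBO estimate:     ", elbo_estimate(x))
print("naive MC log p(x): ", naive_log_marginal(x))
```

With this toy setup both estimators look equally easy to code, which is exactly what puzzles me.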
1- How is the Monte Carlo integration required to compute the ELBO simpler than the one required for $p_{\theta}(x)$?
2- Given that $z$ is generally a lower-dimensional representation, is it still intractable to compute $p_{\theta}(x)$ via Monte Carlo sampling when, say, $z$ is two dimensional?
I have seen some other topics on this, but none seems to have received an answer that explains why the expectation is easier to calculate:
Why is the variational lower bound easier to compute than the original marginal distribution?