I was reading an article on Medium where the author cites this equation for variational inference: \begin{align*} \text{KL}(q(z|x^{(i)})\,||\,p(z|x^{(i)})) &= \int_z q(z|x^{(i)})\log\frac{q(z|x^{(i)})}{p(z|x^{(i)})}\, dz \\ &= \mathbb{E}_q[\log q(z|x^{(i)})] - \mathbb{E}_q[\log p(z|x^{(i)})]\\ &= \mathbb{E}_q[\log q(z|x^{(i)})] - \mathbb{E}_q[\log p(x^{(i)}, z)] + \mathbb{E}_q[\log p(x^{(i)})]\\ &= \mathbb{E}_q[\log q(z|x^{(i)})] - \mathbb{E}_q[\log p(x^{(i)}, z)] + \log p(x^{(i)})\\ &= -\text{ELBO} + \log p(x^{(i)})\\ \end{align*}
I understand all of the math behind this derivation, but what is the underlying intuition behind each of the terms (the KL divergence, the ELBO, and $\log p(x^{(i)})$)?
The author claims that $\log p(x^{(i)})$ is a constant in this equation, and I'm having a hard time understanding why. Is $p(x)$ considered to be the theoretical data-generating distribution that produced our $x$'s, rather than the model we are training?
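For reference, here is a small numerical sanity check I put together with a toy conjugate-Gaussian model (my own choice, not the setup from the article), where every quantity is available in closed form, just to confirm I have the identity right:

```python
import numpy as np
from scipy.stats import norm

# Toy conjugate-Gaussian model (my own choice, not from the article):
#   prior       p(z)   = N(0, 1)
#   likelihood  p(x|z) = N(z, 1)
# so the exact posterior is p(z|x) = N(x/2, 1/2) and the marginal is p(x) = N(0, 2).
x = 1.5

# Exact quantities
post_mu, post_var = x / 2, 0.5
log_px = norm(0, np.sqrt(2)).logpdf(x)          # log p(x), a single fixed number once x is fixed

# An arbitrary (deliberately imperfect) variational approximation q(z|x) = N(mu_q, var_q)
mu_q, var_q = 0.3, 0.8

def kl_gauss(mu1, var1, mu2, var2):
    """KL( N(mu1, var1) || N(mu2, var2) ) in closed form."""
    return 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

# ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z))
expected_loglik = norm(mu_q, 1).logpdf(x) - 0.5 * var_q   # E_q[log N(x; z, 1)]
elbo = expected_loglik - kl_gauss(mu_q, var_q, 0.0, 1.0)

kl_q_post = kl_gauss(mu_q, var_q, post_mu, post_var)

# The identity from the derivation: KL(q || p(z|x)) = -ELBO + log p(x)
print(kl_q_post, log_px - elbo)   # the two numbers agree
```

The two printed numbers agree, so I'm confident the algebra is fine; my question is purely about how to interpret the terms.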