
The E step of the EM algorithm asks us to set the variational distribution in the lower bound equal to the posterior distribution of the latent variables, given the data points and the current parameters. Clearly we are not taking any expectations here, so why is it called the Expectation step? Am I missing something?

1 Answer


In the expectation step, we first calculate the posterior of the latent variable $Z$; then $Q(\theta \mid \theta^{(t)})$ is defined as the expected value of the log-likelihood of $\theta$ with respect to the current conditional distribution of $Z$ given $X$ and the current parameter estimates $\theta^{(t)}$. In the maximization step, we update $\theta$ by taking the argmax of $Q$ with respect to $\theta$.

$$Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X,\, \theta^{(t)}}\!\left[\log L(\theta; X, Z)\right]$$
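
As a concrete illustration of both steps, here is a minimal NumPy sketch of EM for a two-component 1D Gaussian mixture; the data, starting values, and variable names are assumptions made for the example, not anything from the question. The E step computes the posterior responsibilities $p(z = k \mid x, \theta^{(t)})$, which are exactly the weights in the expectation that defines $Q$, and the M step maximizes $Q$ in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two 1D Gaussian clusters (assumed example data).
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

# Initial guesses theta^(t): mixing weights, means, variances.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def normal_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for _ in range(50):
    # E step: posterior of Z given X and theta^(t) (the "responsibilities").
    # These weights define the expectation in Q(theta | theta^(t)).
    joint = pi * normal_pdf(x[:, None], mu, var)       # shape (n, 2)
    resp = joint / joint.sum(axis=1, keepdims=True)    # p(z = k | x, theta^(t))

    # M step: maximize Q(theta | theta^(t)), i.e. the expected complete-data
    # log-likelihood, which has closed-form updates for pi, mu, var.
    nk = resp.sum(axis=0)
    pi = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print("weights:", pi, "means:", mu, "variances:", var)
```
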

To make this more intuitive, think of k-means as a special case of EM: in the expectation step the latent variables $Z$, which indicate cluster membership, are computed via hard assignment, and in the maximization step the cluster means $\mu_k$ are updated. If you want to see the corresponding form of $Q$ for k-means, I suggest reading Section 9.3.2 of C. Bishop's book Pattern Recognition and Machine Learning.
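
To see the analogy concretely, here is a minimal sketch of k-means on the same kind of 1D toy data (my own example, not taken from Bishop's book), written so that the hard-assignment "E step" and the mean-update "M step" are explicit.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(3.0, 0.5, 200)])

mu = np.array([-1.0, 1.0])  # initial cluster means (assumed starting values)

for _ in range(20):
    # "E step": hard assignment -- each point's latent Z is the index of the
    # nearest mean, i.e. the responsibility is 1 for one cluster and 0 otherwise.
    z = np.argmin(np.abs(x[:, None] - mu), axis=1)

    # "M step": update each cluster mean as the average of its assigned points.
    mu = np.array([x[z == k].mean() for k in range(len(mu))])

print("cluster means:", mu)
```
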

ddaedalus