
I'm studying the information theory chapter of Haykin's deep learning book.

It says the mutual information between two continuous random variables $X, Y$ is defined in terms of the differential entropies $h(\cdot)$ as

$$I(X;Y) = h(X) - h(X \mid Y) = -\int_{-\infty}^{\infty} p_X(x) \log p_X(x)\,dx + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p_{X,Y}(x,y) \log p_{X|Y}(x \mid y)\,dx\,dy.$$
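To make the definition concrete, here is a minimal sketch (my own toy check, not from the book) using a bivariate Gaussian, where $h(X)$, $h(X \mid Y)$, and hence $I(X;Y)$ all have closed forms:

```python
import numpy as np

# Bivariate Gaussian with correlation rho: both differential entropies
# have closed forms, so I(X;Y) = h(X) - h(X|Y) can be evaluated exactly.
rho = 0.8
sigma_x = 1.0

h_x = 0.5 * np.log(2 * np.pi * np.e * sigma_x**2)        # h(X)
# Given Y, X is Gaussian with conditional variance sigma_x^2 * (1 - rho^2)
h_x_given_y = 0.5 * np.log(2 * np.pi * np.e * sigma_x**2 * (1 - rho**2))

mi = h_x - h_x_given_y
print(mi)                          # ~0.511 nats
print(-0.5 * np.log(1 - rho**2))   # same value from the closed form
```

Note that with the natural log the units are nats; with $\log_2$ they would be bits.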

Also, is it correct to interpret mutual information as a measure of the "reduction in uncertainty"? That is, if the mutual information $I(X;Y)$ is high, does observing $Y$ give a large amount of information about the random variable $X$?

piero

1 Answer


Yes. You can think of the mutual information as the amount of uncertainty about $X$ that observing $Y$ removes.

Say you roll a die $X$ and want to predict the outcome. Your belief is a discrete uniform distribution with $p = 1/6$ per face, so the entropy is $\ln 6 \approx 1.79$ nats.

Now suppose I tell you that the outcome is $< 3$ (this would be your $Y$). Your belief is now 50% on each of 1 and 2 and 0 on the other outcomes, and your entropy drops to $\ln 2 \approx 0.69$ nats.
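A quick numerical check of these two numbers (natural log, so units are nats; the array names are just illustrative):

```python
import numpy as np

# Prior over the six faces: uniform, p = 1/6 each.
p_prior = np.full(6, 1/6)
h_prior = -np.sum(p_prior * np.log(p_prior))   # ln 6 ≈ 1.79

# After learning "outcome < 3": 1/2 on faces 1 and 2; zero-probability
# faces contribute nothing to the entropy and can be dropped.
p_post = np.array([0.5, 0.5])
h_post = -np.sum(p_post * np.log(p_post))      # ln 2 ≈ 0.69

print(h_prior, h_post)
```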

As you can see, knowing $Y$ removes a lot of the uncertainty about $X$, which is exactly what a large mutual information between $X$ and $Y$ means.
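(Strictly speaking, the mutual information averages the remaining uncertainty over both possible answers: with probability $1/3$ you learn "outcome $< 3$" and keep entropy $\ln 2$, and with probability $2/3$ you learn "outcome $\ge 3$" and keep entropy $\ln 4$, so

$$I(X;Y) = H(X) - H(X \mid Y) = \ln 6 - \left[\tfrac{1}{3}\ln 2 + \tfrac{2}{3}\ln 4\right] \approx 1.79 - 1.16 \approx 0.64 \text{ nats},$$

still a substantial reduction.)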

Obviously, an observation cannot increase the average uncertainty: in the worst case $X$ is independent of $Y$, so $p_{X|Y}(x \mid y) = p_X(x)$, the two terms in the definition coincide, and the difference is 0. In general $I(X;Y) \ge 0$.

Alberto