I'm studying the information theory chapter of Haykin's deep learning book.
It says the mutual information between two continuous random variables $X, Y$ is defined in terms of the differential entropies $h(\cdot)$ as
$$I(X;Y) = h(X) - h(X \mid Y) = -\int_{-\infty}^{\infty} p_X(x)\log p_X(x)\,dx + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p_{X,Y}(x,y)\log p_{X\mid Y}(x\mid y)\,dx\,dy.$$
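To convince myself of this decomposition I wrote the small sanity check below (my own sketch, not from the book): for a bivariate Gaussian with correlation $\rho$ the closed form is $I(X;Y) = -\tfrac{1}{2}\log(1-\rho^2)$, and a Monte Carlo estimate of $\mathbb{E}[\log p_{X,Y} - \log p_X - \log p_Y]$ should match it. All names here (`rho`, `log_gauss`, etc.) are mine.

```python
import numpy as np

# Sanity check (my own, not from Haykin): for a bivariate Gaussian with
# correlation rho, I(X;Y) has the closed form -0.5 * log(1 - rho^2).
rng = np.random.default_rng(0)
rho = 0.8
n = 200_000

# Sample (X, Y) jointly Gaussian with unit variances and correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
x, y = xy[:, 0], xy[:, 1]

def log_gauss(z, var):
    # Log density of a zero-mean Gaussian with the given variance.
    return -0.5 * (np.log(2 * np.pi * var) + z**2 / var)

# Log density of the standard bivariate Gaussian with correlation rho.
log_joint = (-np.log(2 * np.pi) - 0.5 * np.log(1 - rho**2)
             - (x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2)))

# Monte Carlo estimate of I(X;Y) = E[log p(x,y) - log p(x) - log p(y)].
# This only verifies the identity for known densities; it is not a
# general-purpose MI estimator.
mi_mc = np.mean(log_joint - log_gauss(x, 1.0) - log_gauss(y, 1.0))
mi_exact = -0.5 * np.log(1 - rho**2)
print(f"Monte Carlo I(X;Y): {mi_mc:.4f} nats, closed form: {mi_exact:.4f} nats")
```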
Also, is it correct to interpret mutual information as a measure of the "reduction in uncertainty" about $X$ obtained by observing $Y$? That is, if $I(X;Y)$ is high, does observing $Y$ yield a large gain in information about (equivalently, a large reduction in uncertainty about) the random variable $X$?
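To make the question concrete, here is the worked example I have in mind (my own, not taken from Haykin). For jointly Gaussian $X, Y$ with unit variances and correlation $\rho$,
$$h(X) = \tfrac{1}{2}\log(2\pi e), \qquad h(X \mid Y) = \tfrac{1}{2}\log\!\big(2\pi e\,(1-\rho^2)\big),$$
so
$$I(X;Y) = h(X) - h(X \mid Y) = -\tfrac{1}{2}\log(1-\rho^2) \ge 0.$$
The larger $|\rho|$ is, the smaller the residual uncertainty $h(X \mid Y)$ is relative to $h(X)$, which is how I'm reading "reduction in uncertainty". Is that the right way to think about it?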