I am currently reading a paper about negative sampling in knowledge graphs and ran into the objective function of KG representation learning. There are two nodes $u, v$ that together form a (positive) triple, and there are negative samples of $u$ given $v$, denoted $\bar{u}$. We optimize the objective by maximizing the similarity of the positive samples (the term to the left of the $+$) and minimizing the similarity of the negative samples (the term to the right of the $+$).
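To make sure I am parsing the notation correctly, here is roughly how I read the objective for a single node $v$, sketched in Python with made-up embeddings and a hand-rolled log-sigmoid (this is only my own illustration, not code from the paper):

```python
import numpy as np

def log_sigmoid(x):
    # numerically stable log(sigma(x)) = -log(1 + exp(-x))
    return -np.logaddexp(0.0, -x)

rng = np.random.default_rng(0)
dim = 8
k = 5                              # number of negative samples per positive pair

# toy embeddings, made up purely for illustration
v = rng.normal(size=dim)           # embedding of node v
u = rng.normal(size=dim)           # a positive sample u drawn from P_d(u|v)
u_bar = rng.normal(size=(k, dim))  # k negative samples u_bar drawn from P_n(u_bar|v)

# one Monte-Carlo sample of J^(v):
# push sigma(u^T v) up for the positive, push sigma(u_bar^T v) down for the negatives
j_v = log_sigmoid(u @ v) + log_sigmoid(-(u_bar @ v)).sum()
print(j_v)
```

As I understand it, the factor $k$ in front of the second expectation simply reflects that $k$ negatives are drawn for every positive pair; please correct me if that reading is wrong.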
What I don't understand is how the second line of the derivation below becomes the third line.
$$
\begin{align}
J^{(v)} &= \mathbb{E}_{u \sim P_{d}(u|v)} \log \sigma(\mathbf{u}^{\top}\mathbf{v}) + k\, \mathbb{E}_{\bar{u} \sim P_{n}(\bar{u}|v)} \log \sigma(-\bar{\mathbf{u}}^{\top}\mathbf{v}) \\
&= \sum\limits_{u} P_{d}(u|v) \log \sigma(\mathbf{u}^{\top}\mathbf{v}) + k \sum\limits_{\bar{u}} P_{n}(\bar{u}|v) \log \sigma(-\bar{\mathbf{u}}^{\top}\mathbf{v}) \\
&= \sum\limits_{u} \left[ P_{d}(u|v) \log \sigma(\mathbf{u}^{\top}\mathbf{v}) + k P_{n}(u|v) \log \sigma(-\mathbf{u}^{\top}\mathbf{v}) \right] \\
&= \sum\limits_{u} \left[ P_{d}(u|v) \log \sigma(\mathbf{u}^{\top}\mathbf{v}) + k P_{n}(u|v) \log \left(1 - \sigma(\mathbf{u}^{\top}\mathbf{v})\right) \right]
\end{align}
$$
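If I try to fill in the step myself, going from the second line to the third seems to require treating $\bar{u}$ as a dummy index that ranges over the same set of nodes as $u$, so that the two sums can be merged; this is only my guess at the implicit step, since the paper does not spell it out:

$$ k\sum\limits_{\bar{u}} P_{n}(\bar{u}|v) \log \sigma(-\bar{\mathbf{u}}^{\top}\mathbf{v}) \overset{?}{=} k\sum\limits_{u} P_{n}(u|v) \log \sigma(-\mathbf{u}^{\top}\mathbf{v}), $$

which would be a harmless relabeling only if both sums really run over the same index set.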
From what I understand, the negative (noise) distribution $P_{n}(\bar{u}|v)$ is defined over the set of negative samples, which does not contain the positive sample $u$. And even if it did, $P_{n}(\bar{u}|v)$ and $P_{n}(u|v)$ would not take the same value, so I don't see why the two sums can be collapsed into a single sum over $u$.