From Bishop's Pattern Recognition and Machine Learning:

$t_n = y_n + \epsilon_n$, where $\epsilon_n$ is a random noise variable whose value is chosen independently for each observation $n$. Consider

$$p(t_n|y_n)= {\cal N}(t_n|y_n,\beta^{-1})$$

Because noise is independent for each data point, we have

$$p(\textbf t | \textbf{y}) = {\cal N} (\textbf t|\textbf y,\beta^{-1}I_N)$$

where $\textbf{t} = (t_1,\dots,t_N)^T$ and $\textbf{y} = (y_1,\dots,y_N)^T$.

Question: Why is the covariance matrix equal to $\beta^{-1}I_N$? I know that if two random variables are marginally independent then their covariance is zero (giving a diagonal covariance matrix in this case), but that only shows that the $\epsilon_n$ are independent of each other, not that the $t_n$ are independent of each other. I hope someone can clarify this, please.

piero

1 Answer

It is not saying that the $t_n$ are independent of one another, but that the $t_n$ are conditionally independent given the $y_n$; that is, $t_n \mid y_n$ are independent across $n$.

The only variation in the target values $t_n$ once you've supplied the $y_n$ is given by $\epsilon_n$. In other words, the target values are independent given the $y$ values. The $y$ values are not necessarily independent. Their covariance is determined by $\textbf{K}$, the Gram matrix derived from the kernel function. The $t$ values are also not independent in general. It is only after you've supplied the appropriate $y$ that a $t$ becomes independent of its fellows.
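To make that key step explicit, here is the one-line computation the argument rests on. Conditional on $\textbf{y}$, the $y_n$ are constants, so all remaining randomness in $t_n = y_n + \epsilon_n$ comes from the noise. Hence, for any pair $n, m$,

$$\text{cov}(t_n, t_m \mid \textbf{y}) = \text{cov}(y_n + \epsilon_n,\, y_m + \epsilon_m \mid \textbf{y}) = \text{cov}(\epsilon_n, \epsilon_m) = \beta^{-1}\delta_{nm},$$

since the $\epsilon_n$ are independent with common variance $\beta^{-1}$. Stacking these entries gives the diagonal matrix $\beta^{-1}I_N$. Marginally, by contrast, integrating out $\textbf{y}$ gives $\text{cov}(\textbf{t}) = \textbf{K} + \beta^{-1}I_N$, which is not diagonal in general.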

Notes:

  • $\beta$ is a scalar hyperparameter representing the shared precision of the noise. $\beta^{-1}$ is another way of writing $\frac{1}{\beta}$, the shared variance of the noise.
  • I'm using the 2006 edition of Bishop. This comes from section 6.4.2 (Gaussian processes for regression).
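As a quick numerical sanity check, here is a minimal sketch of the two covariances discussed above. The inputs, the RBF kernel, and all parameter values are illustrative assumptions of mine; none appear in the excerpt. Holding one draw of $\textbf{y}$ fixed and resampling only the noise gives an empirical covariance near $\beta^{-1}I_N$, while resampling $\textbf{y}$ as well gives one near $\textbf{K} + \beta^{-1}I_N$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 5 inputs and an RBF kernel (not specified in the excerpt).
x = np.linspace(0.0, 1.0, 5)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.3**2)  # Gram matrix K
beta = 4.0                                                   # noise precision
n_samples = 200_000

# Conditional case: fix one draw of y from the GP prior, then sample t = y + eps.
y = rng.multivariate_normal(np.zeros(5), K)
t_given_y = y + rng.normal(scale=beta**-0.5, size=(n_samples, 5))
# Empirical cov(t | y): should be close to (1/beta) * I.
print(np.round(np.cov(t_given_y, rowvar=False), 3))

# Marginal case: resample y each time, so cov(t) -> K + (1/beta) * I.
y_all = rng.multivariate_normal(np.zeros(5), K, size=n_samples)
t_marginal = y_all + rng.normal(scale=beta**-0.5, size=(n_samples, 5))
print(np.round(np.cov(t_marginal, rowvar=False), 3))
print(np.round(K + np.eye(5) / beta, 3))  # theoretical marginal covariance
```

The first printed matrix is approximately $0.25\,I_5$ (since $\beta = 4$), while the second matches $\textbf{K} + \beta^{-1}I_5$, which has large off-diagonal entries; that contrast is exactly the conditional-versus-marginal distinction made above.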
Eponymous