4

Intuitively, I understand that having an unbiased estimate of a policy's value is important, because a biased estimate is systematically distant from the true value.

However, I don't clearly understand why having lower variance is important. Is it because, in offline policy evaluation, we get only 'one' estimate from a stream of data, and when that estimate is far from the true value we can't tell whether variance or bias is to blame? Basically, variance acts like bias.

Also, if that is the case, why is having variance preferable to having bias?

nbro
Hunnam

2 Answers

1

Having low variance is important in general as it reduces the number of samples needed to obtain accurate estimates. This is the case for all statistical machine learning, not just reinforcement learning.

In general, if you are estimating a mean or expected quantity by taking many samples, the standard error of the estimate is proportional to $\frac{\sigma}{\sqrt{N}}$ for a direct arithmetic mean of all samples, where $\sigma$ is the standard deviation of a single sample and $N$ is the number of samples. It behaves similarly for other averaging approaches (such as recency-weighted means using a learning rate). The bounds on accuracy can be tightened either by increasing $N$, i.e. taking more samples, or by decreasing the variance $\sigma^2$.

So anything you can do to reduce variance in your measurements has a direct consequence of reducing the number of samples required to achieve the same degree of accuracy.
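As a rough illustration of the $\frac{\sigma}{\sqrt{N}}$ scaling, here is a minimal simulation sketch. It assumes independent Gaussian samples with zero mean, and the helper name `spread_of_sample_mean` is made up for this example. It repeats the "take $N$ samples and average them" experiment many times and measures how much the resulting averages scatter.

```python
import math
import random
import statistics


def spread_of_sample_mean(sigma, n, trials=2000, seed=0):
    """Empirical standard deviation of the mean of n Gaussian samples.

    sigma: standard deviation of a single sample (assumed Gaussian here).
    n: number of samples averaged per estimate.
    trials: how many independent estimates to compute.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        samples = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(statistics.fmean(samples))
    # Scatter of the estimates around their own average.
    return statistics.pstdev(means)


# Compare the measured spread with the sigma / sqrt(N) prediction.
for sigma, n in [(2.0, 100), (2.0, 400), (1.0, 100)]:
    measured = spread_of_sample_mean(sigma, n)
    predicted = sigma / math.sqrt(n)
    print(f"sigma={sigma}, N={n}: measured={measured:.3f}, predicted={predicted:.3f}")
```

In this setup, halving $\sigma$ at $N = 100$ should give roughly the same spread as keeping $\sigma = 2$ and collecting four times as many samples, which is the trade-off described above.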

In the case of off-policy reinforcement learning, there is added variance, compared to on-policy learning, due to the different probabilities of taking an action under the behaviour and target policies. This comes from the need to adjust reward signals using importance sampling: multiplying by the importance sampling ratio will make the reward signal vary more (in fact, it can become unbounded). This is not really any more of a challenge than any other source of variance, but as it interferes with the goal of speedy learning, a lot of research effort has been put into methods that reduce the variance.
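To make the extra variance concrete, here is a small sketch on a toy two-action bandit. Everything in it is an assumption chosen for illustration: the reward table, the two policies, and the function names are hypothetical, and it uses plain (ordinary) importance sampling on single-step rewards rather than any particular algorithm from the literature.

```python
import random
import statistics

# Hypothetical toy problem: two actions with fixed rewards.
REWARD = {0: 0.0, 1: 1.0}
# Target policy (the one we want to evaluate) mostly picks action 1.
TARGET = {0: 0.1, 1: 0.9}
# Behaviour policy (the one that generated the data) mostly picks action 0.
BEHAVIOUR = {0: 0.9, 1: 0.1}


def sample_action(policy, rng):
    """Draw action 1 with probability policy[1], otherwise action 0."""
    return 1 if rng.random() < policy[1] else 0


def on_policy_samples(n, rng):
    """Rewards observed when acting with the target policy itself."""
    return [REWARD[sample_action(TARGET, rng)] for _ in range(n)]


def off_policy_samples(n, rng):
    """Rewards observed under the behaviour policy, reweighted by the
    importance sampling ratio target(a) / behaviour(a)."""
    weighted = []
    for _ in range(n):
        action = sample_action(BEHAVIOUR, rng)
        ratio = TARGET[action] / BEHAVIOUR[action]
        weighted.append(ratio * REWARD[action])
    return weighted


rng = random.Random(0)
on = on_policy_samples(100_000, rng)
off = off_policy_samples(100_000, rng)
print("on-policy  mean, variance:", statistics.fmean(on), statistics.pvariance(on))
print("off-policy mean, variance:", statistics.fmean(off), statistics.pvariance(off))
```

Both sample sets estimate the same expected reward under the target policy (0.9 in this toy setup), but the reweighted samples are mostly zeros with an occasional value of 9, so their per-sample variance is far larger. The more the two policies disagree, the larger the ratio can get.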

Neil Slater
1

Bias is not necessarily bad, even though the term bias usually has a negative connotation. In fact, in machine learning, inductive bias is quite important and necessary. For example, if you want to learn a function $f(x) = y$, where $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, you often just have a finite dataset $\mathcal{D} = \{ (x_i, y_i)\}_{i=1}^N$, which may not contain all possible $(x, y)$ pairs associated with $f$. In that case, $\mathcal{D}$ may not contain enough information to learn $f$, so you need to assume that $f$ behaves in a certain way or that the input and output spaces have certain characteristics. A typical way of dealing with finite datasets is to introduce noise during the learning process (which is a regularization technique).

However, bias can lead to sub-optimal solutions. For example, you could assume that $f$ is a lot more complex than the function $\hat{f}$ that maps $x_i$ to $y_i$ (of $\mathcal{D}$), for $i=1, \dots, N$, and introduce a lot of noise to compensate. In reality, $\hat{f}$ may be extremely similar to $f$, even if not exactly the same, so you may not need all this noise.

Why is low variance desirable? Essentially, while you are learning something, it is easier to learn regular patterns as opposed to more irregular ones. For example, $1, 2, 1, 2, 1, 2$ is a relatively regular sequence compared to $8, 2, 5, 6, 1, 7, 99$, which is thus harder to learn (or memorise) than the former.

nbro