In UCB, is the actual upper bound an upper bound of an one-sided or two-sided confidence interval?

Question

I'm a bit confused about the visualization of the upper bound (following the notation of (c.f. Sutton & Barto (2018))

$$Q_t(a)+C\sqrt{\frac{\mathrm{ln}(t)}{N_t(a)}}$$

In many blog posts about the UCB(1)-algorithm, such as visualized in the following image (c.f. Link ):

Text

Isn't the upper (confidence) bound simply the upper bound of a one-sided confidence interval instead of a two-sided confidence interval as shown in the image above? A lower bound of the interval is completely useless in this case, or am I wrong?

score 1 · Answer 1 · answered Jan 31 '21 at 18:52

The upper bound used here is derived from Hoeffding's inequality, which provides a symmetric, two-sided confidence interval. A good pair of blog posts on how this bound used in UCB for bandits is derived can be found here:

Indeed, in practice when using this UCB for bandits, we do not actually care about the lower bound. We only need the upper found for the exploration mechanism. But the lower bound does still exist, even if we don't use it.

In UCB, is the actual upper bound an upper bound of an one-sided or two-sided confidence interval?

1 Answers1