2

I'm a bit confused about the visualization of the upper bound (following the notation of (c.f. Sutton & Barto (2018))

$$Q_t(a)+C\sqrt{\frac{\mathrm{ln}(t)}{N_t(a)}}$$

In many blog posts about the UCB(1)-algorithm, such as visualized in the following image (c.f. Link ):

Text

Isn't the upper (confidence) bound simply the upper bound of a one-sided confidence interval instead of a two-sided confidence interval as shown in the image above? A lower bound of the interval is completely useless in this case, or am I wrong?

nbro
  • 42,615
  • 12
  • 119
  • 217
D. B.
  • 101
  • 1
  • 7

1 Answers1

1

The upper bound used here is derived from Hoeffding's inequality, which provides a symmetric, two-sided confidence interval. A good pair of blog posts on how this bound used in UCB for bandits is derived can be found here:

  1. First steps: Explore-then-Commit
  2. The Upper Confidence Bound Algorithm

Indeed, in practice when using this UCB for bandits, we do not actually care about the lower bound. We only need the upper found for the exploration mechanism. But the lower bound does still exist, even if we don't use it.

Dennis Soemers
  • 10,519
  • 2
  • 29
  • 70