Questions tagged [regret]
3 questions
4 votes, 2 answers
Why is regret so defined in MABs?
Consider a multi-armed bandit (MAB). There are $k$ arms with reward distributions $R_i$, where $1 \leq i \leq k$. Let $\mu_i$ denote the mean of the $i^{th}$ distribution.
If we run the multi-armed bandit experiment for $T$ rounds, the "pseudo…
stoic-santiago
2 votes, 0 answers
Is there any reasonable notion of regret for infinite horizon discounted MDPs?
I am thinking about episodic MDPs. Usually, in episodic MDPs, it seems that we have a finite fixed horizon per episode and no discount factor. Then, a very intuitive notion of regret after $T$ episodes is to sum over the difference of optimal…
Felix P.
0 votes, 1 answer
The complexity order of regret (especially in online reinforcement learning)?
In online reinforcement learning theory, how does one judge the order of a regret bound that contains two or more terms?
For example, the state space is $X$, the action space is $A$, the number of episodes is $K$, and the horizon is $H$. If…
white