The complexity order of regret (especially in online reinforcement learning)?

Question

In online reinforcement learning theory, how to judge the complexity order of regret, if there are two or more terms in there?

For example, the state space is $X$, the action space is $A$, the episode number is $K$, and the horizon number is $H$. If we have an algorithm with regret $R_K=2X^2A H \ln K \ln H + 3XAH^2\ln K$.

How can I decide the order of this algorithm? Should we look at which term is dominant with respect to the episode number $K$ or total steps $T=HK$? Then it might be $O(XAH^2\ln K)$

Or also consider the relationship with the state and action space? Then it might be $O(X^2AH^2\ln K)$, here I choose the highest order of each variable in both terms.

score 1 · Answer 1 · answered Jun 09 '23 at 15:13

I assume your algorithm to loop over $K$ policies (or episodes), for $H$ steps, on each state and action pairs (where $X=|\mathcal S|$ and $A=|\mathcal A|$ denote the size of the state and action spaces; assuming them to be discrete for simplicity.)

If so, and assuming an algorithm $R_K=2X^2AH \ln K\ln H + 3XA H^2 \ln K$ which makes such amount of steps both in the worst and best case (I'll clarify that later), it's computational complexity is about: $$\begin{align} O(R_K) &= O(2X^2AH \ln K\ln H + 3XA H^2) \\ &= O(X^2AH \ln K\ln H + XA H^2) & (1) \\ &= O\big(XAH \ln K (X\ln H + H)\big) & (2) \\ &= O\big(n^3\ln n\ (n\ln n + n)\big) & (3) \\ &= O\big(n^4\ln^2 n + n^4\ln n\big) \\ &= O(n^4\ln^2 n) & (4)\\ &< O(n^5) & (5) \end{align}$$

Explaination. In line (1) I discard the constants $2$ and $3$ because they don't affect the complexity. Then, in line (2) I group by the common factor $XAH\ln K$. Next in (3), I rewrite $n=XAH\ln K$ note that this is not mathematically exact but for the purpose of computational complexity you can assume $n$ to be your overall input size, i.e. the complexity of the algo $R_K$ is a function of $n$. In line (4) the term $n^4\ln^2 n$ dominates $n^4\ln n$, so you can remove the latter. Lastly, your final complexity is $O(n^4\ln^2 n)$ but if you're not sure of its scale you can further bound with a degree-five function, i.e. $n^4\ln^2 n$ doesn't grow as fast as $n^5$.

(Last point: if your algo behaves the same both in the worst $O(\cdot)$ and best $\Omega(\cdot)$ case, your average complexity is $\Theta(n^4\ln^2 n)$.)

The complexity order of regret (especially in online reinforcement learning)?

1 Answers1