I assume your algorithm to loop over $K$ policies (or episodes), for $H$ steps, on each state and action pairs (where $X=|\mathcal S|$ and $A=|\mathcal A|$ denote the size of the state and action spaces; assuming them to be discrete for simplicity.)
If so, and assuming an algorithm $R_K=2X^2AH \ln K\ln H + 3XA H^2 \ln K$ which makes such amount of steps both in the worst and best case (I'll clarify that later), it's computational complexity is about:
$$\begin{align}
O(R_K) &= O(2X^2AH \ln K\ln H + 3XA H^2) \\
&= O(X^2AH \ln K\ln H + XA H^2) & (1) \\
&= O\big(XAH \ln K (X\ln H + H)\big) & (2) \\
&= O\big(n^3\ln n\ (n\ln n + n)\big) & (3) \\
&= O\big(n^4\ln^2 n + n^4\ln n\big) \\
&= O(n^4\ln^2 n) & (4)\\
&< O(n^5) & (5)
\end{align}$$
Explaination. In line (1) I discard the constants $2$ and $3$ because they don't affect the complexity. Then, in line (2) I group by the common factor $XAH\ln K$. Next in (3), I rewrite $n=XAH\ln K$ note that this is not mathematically exact but for the purpose of computational complexity you can assume $n$ to be your overall input size, i.e. the complexity of the algo $R_K$ is a function of $n$. In line (4) the term $n^4\ln^2 n$ dominates $n^4\ln n$, so you can remove the latter. Lastly, your final complexity is $O(n^4\ln^2 n)$ but if you're not sure of its scale you can further bound with a degree-five function, i.e. $n^4\ln^2 n$ doesn't grow as fast as $n^5$.
(Last point: if your algo behaves the same both in the worst $O(\cdot)$ and best $\Omega(\cdot)$ case, your average complexity is $\Theta(n^4\ln^2 n)$.)