Please see slide 32 in the following lecture slides on DP:
Let $m$ the size size of action space in MPD. Why are there up to $m^2$ action values when we consider the complexity of DP based on $q(x,u)$ ?
Please see slide 32 in the following lecture slides on DP:
Let $m$ the size size of action space in MPD. Why are there up to $m^2$ action values when we consider the complexity of DP based on $q(x,u)$ ?
To fully understand generically the transition dynamics from state $x$ to state $x′$ in terms of action values without state values involvement, we need to evaluate all possible combinations of actions for both states via their respective action values $q(x,u),q(x',u')$. Since for each state there are at most $m$ possible actions, this means there are at most $m^2$ possible combinations of actions for every pair of states $x$ and $x′$s. Furthermore there're at most $n^2$ state transition pairs obviously, therefore you arrive at complexity is inferior with $O(m^2·n^2)$.