
Please see slide 32 in the following lecture slides on DP:

https://groups.uni-paderborn.de/lea/share/lehre/reinforcementlearning/lecture_slides/built/Lecture03.pdf

Let $m$ be the size of the action space in an MDP. Why are there up to $m^2$ action values when we consider the complexity of DP based on $q(x,u)$?

DSPinfinity

1 Answer


To characterize the transition dynamics from state $x$ to state $x'$ purely in terms of action values, without involving state values, we must evaluate all possible combinations of actions in both states via their respective action values $q(x,u)$ and $q(x',u')$. Since each state has at most $m$ possible actions, there are at most $m^2$ action combinations for every pair of states $x$ and $x'$. Furthermore, there are at most $n^2$ state-transition pairs, so the overall complexity is upper-bounded by $O(m^2 \cdot n^2)$.
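To make the counting argument concrete, here is a minimal sketch of one DP backup on $q(x,u)$ for a small randomly generated MDP (the MDP itself and all sizes are assumptions for illustration, not taken from the lecture slides). For every $(x,u)$ pair we sum over successor states $x'$ and maximize over successor actions $u'$, which is where the four nested loops, and hence the $O(n^2 \cdot m^2)$ bound, come from:

```python
import numpy as np

# Hypothetical small MDP: n states, m actions (sizes chosen for illustration).
n, m = 4, 3
rng = np.random.default_rng(0)

# P[x, u, x'] = transition probability, R[x, u] = expected reward.
P = rng.random((n, m, n))
P /= P.sum(axis=2, keepdims=True)  # normalize to valid distributions
R = rng.random((n, m))
gamma = 0.9

q = np.zeros((n, m))
q_new = np.zeros_like(q)
ops = 0  # count the innermost action-value lookups

# One DP backup on q(x, u): for each (x, u) pair, sum over successor
# states x' and maximize over their actions u'.
for x in range(n):
    for u in range(m):
        total = R[x, u]
        for x_next in range(n):
            best = max(q[x_next, u_next] for u_next in range(m))
            ops += m  # m action-value lookups per successor state
            total += gamma * P[x, u, x_next] * best
        q_new[x, u] = total

# Per sweep: n * m outer pairs, each touching n * m successor action values.
assert ops == n * n * m * m
print(ops)
```

The inner `max` over $u'$ for every successor $x'$ of every $(x,u)$ pair is exactly the $m^2$-per-state-pair factor the answer describes; with $n^2$ state pairs this gives $n^2 \cdot m^2$ operations per sweep.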

cinch