I am looking at this formula, which breaks down the gradient of $P(\tau \mid \theta)$. The first part is clear, as is the derivative of $\log(x)$, but I do not see how the first formula is rearranged into the second.
1 Answer
The identity $$\nabla_{\theta} P(\tau \mid \theta) = P(\tau \mid \theta) \nabla_{\theta} \log P(\tau \mid \theta)\tag{1}\label{1},$$
which can also be written as
\begin{align} \nabla_{\theta} \log P(\tau \mid \theta) &= \frac{\nabla_{\theta} P(\tau \mid \theta)}{P(\tau \mid \theta)}\\ &=\frac{1}{P(\tau \mid \theta)} \nabla_{\theta} P(\tau \mid \theta) \end{align}
follows directly from the chain rule applied to the logarithm of a function: \begin{align} \frac{d \log f(x)}{d x} &= \frac{1}{f(x)} \frac{d f}{dx}. \end{align} Note that $\log f(x)$ is a composite function, which is why we apply the chain rule, and that the derivative of $\log x$ is $\frac{1}{x}$, as your text says.
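If you want to convince yourself numerically, here is a minimal sketch that checks identity (1) with a finite difference. The function `prob` is a made-up stand-in for $P(\tau \mid \theta)$ (a product of sigmoid factors with hypothetical coefficients `COEFFS`, a scalar $\theta$), not a real trajectory distribution:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fixed coefficients, one factor per "time step" (an assumption for illustration).
COEFFS = [0.5, -1.2, 2.0]

def prob(theta):
    """Toy stand-in for P(tau | theta): a product of sigmoid factors."""
    p = 1.0
    for a in COEFFS:
        p *= sigmoid(a * theta)
    return p

def grad_log_prob(theta):
    """Analytic d/dtheta of log prob(theta).

    log prob is a sum of log sigmoid(a * theta), and
    d/dtheta log sigmoid(a * theta) = a * (1 - sigmoid(a * theta)).
    """
    return sum(a * (1.0 - sigmoid(a * theta)) for a in COEFFS)

def finite_diff(f, theta, eps=1e-6):
    """Central finite-difference approximation of df/dtheta."""
    return (f(theta + eps) - f(theta - eps)) / (2.0 * eps)

theta = 0.7
lhs = finite_diff(prob, theta)             # gradient of P, computed numerically
rhs = prob(theta) * grad_log_prob(theta)   # P times the gradient of log P
print(lhs, rhs)
```

The two printed numbers should agree up to finite-difference error, which is all the identity claims.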
People shouldn't call this a trick. There's no trick here. It's just basic calculus.
Why do you need identity \ref{1}? It says that the gradient of the probability of the trajectory $\tau$ with respect to the parameter $\theta$ equals $P(\tau \mid \theta)$ times the gradient of the logarithm of that same probability. How is this useful? $P(\tau \mid \theta)$ is a product of per-step probabilities, and the logarithm turns that product into a sum (and the derivative of a sum is the sum of the derivatives of its terms). Essentially, identity \ref{1} helps you compute the gradient in an easier way (at least conceptually).
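To make the product-to-sum step concrete, here is a sketch that assumes the usual factorization of a trajectory's probability in a Markov decision process. The symbols $\rho_0$ (initial-state distribution), $P(s_{t+1} \mid s_t, a_t)$ (transition dynamics), $\pi_{\theta}$ (the policy) and the horizon $T$ do not appear in your question, so treat them as assumptions:
\begin{align} P(\tau \mid \theta) &= \rho_0(s_0) \prod_{t=0}^{T} P(s_{t+1} \mid s_t, a_t)\, \pi_{\theta}(a_t \mid s_t)\\ \log P(\tau \mid \theta) &= \log \rho_0(s_0) + \sum_{t=0}^{T} \Big( \log P(s_{t+1} \mid s_t, a_t) + \log \pi_{\theta}(a_t \mid s_t) \Big)\\ \nabla_{\theta} \log P(\tau \mid \theta) &= \sum_{t=0}^{T} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t) \end{align}
Under this assumption only the policy terms depend on $\theta$, so the initial-state and transition terms vanish when you take the gradient, which is why the log form is easier to work with.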
