I am looking at this formula, which breaks down the gradient of $P(\tau \mid \theta)$. The first part is clear, as is the derivative of $\log(x)$, but I do not see how the first formula is rearranged into the second.

nbro
Jacob B

1 Answer

The identity $$\nabla_{\theta} P(\tau \mid \theta) = P(\tau \mid \theta) \nabla_{\theta} \log P(\tau \mid \theta),\tag{1}\label{1}$$

which can also be written as

\begin{align} \nabla_{\theta} \log P(\tau \mid \theta) &= \frac{\nabla_{\theta} P(\tau \mid \theta)}{P(\tau \mid \theta)}\\ &=\frac{1}{P(\tau \mid \theta)} \nabla_{\theta} P(\tau \mid \theta) \end{align}

follows directly from the general rule for differentiating the logarithm of a function, which is the chain rule applied to $\log$: \begin{align} \frac{d \log f(x)}{d x} &= \frac{1}{f(x)} \frac{d f}{dx}. \end{align} Note that $\log f(x)$ is a composite function, which is why we apply the chain rule, and that the derivative of $\log x$ is $\frac{1}{x}$, as your text says.

People shouldn't call this a trick. There's no trick here. It's just basic calculus.
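If you want to convince yourself numerically, here is a minimal sketch that checks identity \ref{1} with finite differences. The setup is my own toy assumption, not something from your text: a scalar parameter $\theta$, a policy that picks action 1 with probability $\sigma(\theta)$ regardless of the state, and no environment dynamics, so $P(\tau \mid \theta)$ is just the product of the action probabilities. The names `traj_prob` and `central_diff` are hypothetical helpers.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def traj_prob(theta, actions):
    """Toy P(tau | theta): each action is 1 with probability sigmoid(theta).

    Assumption for illustration only: the policy ignores the state and
    there are no environment dynamics, so P(tau | theta) is a plain product.
    """
    p = sigmoid(theta)
    prob = 1.0
    for a in actions:
        prob *= p if a == 1 else (1.0 - p)
    return prob

def central_diff(f, x, h=1e-5):
    """Central finite-difference approximation of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

theta = 0.3
actions = [1, 0, 1, 1]  # an arbitrary toy trajectory

# Left-hand side of the identity: gradient of P(tau | theta)
lhs = central_diff(lambda t: traj_prob(t, actions), theta)

# Right-hand side: P(tau | theta) times the gradient of log P(tau | theta)
grad_log_p = central_diff(lambda t: math.log(traj_prob(t, actions)), theta)
rhs = traj_prob(theta, actions) * grad_log_p

# The two numbers should agree up to finite-difference error
print(lhs, rhs)
```

Both sides are approximated independently, so their agreement is a check of the identity itself rather than of the code.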

Why do you need identity \ref{1}? Because it tells you that the gradient of the probability of the trajectory $\tau$ with respect to the parameter $\theta$ is $P(\tau \mid \theta)$ times the gradient of the logarithm of that same probability. How is this useful? $P(\tau \mid \theta)$ is a product of per-step probabilities, and the logarithm turns that product into a sum (and the derivative of a sum is the sum of the derivatives of its terms). Essentially, identity \ref{1} helps you compute the gradient in an easier way (at least, conceptually).
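To make the product-to-sum step concrete, here is a sketch under the usual Markov decision process assumptions. The notation is mine, since I cannot see the formula in your image: $\rho_0$ is the initial state distribution, $P(s_{t+1} \mid s_t, a_t)$ are the environment dynamics, and $\pi_{\theta}$ is the policy. \begin{align} P(\tau \mid \theta) &= \rho_0(s_0) \prod_{t=0}^{T} P(s_{t+1} \mid s_t, a_t) \, \pi_{\theta}(a_t \mid s_t)\\ \log P(\tau \mid \theta) &= \log \rho_0(s_0) + \sum_{t=0}^{T} \Big( \log P(s_{t+1} \mid s_t, a_t) + \log \pi_{\theta}(a_t \mid s_t) \Big)\\ \nabla_{\theta} \log P(\tau \mid \theta) &= \sum_{t=0}^{T} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t) \end{align}

In the last line, the terms $\log \rho_0(s_0)$ and $\log P(s_{t+1} \mid s_t, a_t)$ vanish because they do not depend on $\theta$, so only the policy terms survive.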

nbro