I have been studying policy gradients recently, but different sources present them differently, which has confused me. From "Reinforcement Learning: An Introduction" (Sutton & Barto, Chapter 13), we get the following policy gradient:

$$ \nabla J(\theta) = \mathbb E_\pi\left[G_t\nabla\log\pi(A_t | S_t, \theta)\right]. $$

As we can see, this equation does not involve a distribution over trajectories. However, a more intuitive and widely used introduction to the policy gradient starts by defining a distribution over trajectories, $p(\tau)$. For example, in OpenAI Spinning Up, the policy gradient takes a form similar to

$$ \nabla J(\theta) = \mathbb E_{\tau \sim \pi}\left[\sum_{t=0}^{T}G_t\nabla_\theta\log\pi_\theta(a_t | s_t)\right]. $$

The confusion comes from the fact that the first form has no summation over timesteps and does not sample trajectories, whereas the second samples trajectories and sums over timesteps.
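To make the difference concrete for myself, here is a minimal sketch (my own, not taken from either source) of how I would estimate each form from sampled data, assuming a small categorical policy network in PyTorch; the network, the state/action shapes, and the placeholder returns are all made up for illustration:

```python
import torch
import torch.nn as nn

# Toy categorical policy: 4-dim states, 2 discrete actions (made-up shapes).
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))

def log_prob(states, actions):
    """log pi_theta(a | s) for a batch of state-action pairs."""
    logits = policy(states)
    return torch.distributions.Categorical(logits=logits).log_prob(actions)

# Pretend we sampled one trajectory of length T and computed the returns G_t.
T = 10
states = torch.randn(T, 4)           # s_0, ..., s_{T-1}
actions = torch.randint(0, 2, (T,))  # a_0, ..., a_{T-1}
returns = torch.randn(T)             # placeholder G_t values

# Second form (Spinning Up): one Monte Carlo sample of
#   E_{tau ~ pi}[ sum_t G_t * grad log pi(a_t | s_t) ],
# so we sum over all timesteps of the sampled trajectory.
surrogate_traj = (returns * log_prob(states, actions)).sum()

# First form (Sutton & Barto): one sample of
#   E_pi[ G_t * grad log pi(A_t | S_t, theta) ],
# i.e. a single (S_t, A_t, G_t) drawn from the on-policy distribution --
# here I just pick one random timestep of the same trajectory.
t = int(torch.randint(0, T, (1,)))
surrogate_single = returns[t] * log_prob(states[t:t+1], actions[t:t+1]).squeeze()

# Differentiating either surrogate gives the corresponding gradient estimate.
grads_traj = torch.autograd.grad(surrogate_traj, policy.parameters())
grads_single = torch.autograd.grad(surrogate_single, policy.parameters())
```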
I found some related questions about this confusion, but none of them seemed to have a good answer, and I could not find any source that explains the difference/connection between the two forms.
My question is: why are there two different ways to write the policy gradient, and are the two forms mathematically equivalent?
Update
I found a great RL theory book (draft) written by experts in the field that presents both formulations: https://rltheorybook.github.io. Also, one of Nan Jiang's lecture notes shows the connection between the two forms: https://nanjiang.cs.illinois.edu/files/cs598/note6.pdf.
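If I am reading the note correctly, the rough idea seems to be that the sum over timesteps in the second form gets absorbed into the state distribution of the first form, something like

$$ \mathbb E_{\tau \sim \pi}\left[\sum_{t=0}^{T}G_t\nabla_\theta\log\pi_\theta(a_t | s_t)\right] = \sum_{t=0}^{T}\mathbb E_{(s_t, a_t)}\left[G_t\nabla_\theta\log\pi_\theta(a_t | s_t)\right] \propto \mathbb E_{S\sim\mu_\pi,\, A\sim\pi}\left[G\,\nabla_\theta\log\pi_\theta(A | S)\right], $$

where $\mu_\pi$ is the on-policy state (visitation) distribution that the $\mathbb E_\pi$ in Sutton & Barto implicitly refers to, though I am not certain whether the equality is exact or only holds up to a normalization constant.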