Derivative with respect to the spacetime derivative of a field $\phi$

Question

I've encountered the following notation several times (for example, when discussing Noether's Theorem): $$\frac{\partial L}{\partial(\partial_\mu \phi)}$$ And it's not immediately clear to me what this operator $\frac{\partial}{\partial(\partial_\mu \phi)}$ refers to. Based on the answer to this question and the structure of the covariant derivative, I'm guessing it's just a short-hand for the following operator: $$\frac{\partial}{\partial(\partial_\mu \phi)}\equiv\pm(\frac{\partial}{\partial\dot{\phi}},-\frac{\partial}{\partial(\nabla\phi)})$$ I.e. $$\frac{\partial L}{\partial(\partial_\mu \phi)}\equiv\pm(\frac{\partial L}{\partial\dot{\phi}},-\frac{\partial L}{\partial(\nabla\phi)})$$ where the $\pm$ comes from your convention for the Minkowski metric.

This is just a guess. Can someone verify, perhaps with a source?

yngabl · Answer 1 · 2016-09-12T08:18:40.047

In field theory one usually has a Lagrangian density that is a function of the fields and their first derivatives, $$\mathscr{L}(\phi,\partial_\mu\phi).$$ It is well known that derivatives of the field higher than first order are in some way problematic (the Hamiltonian is not bounded from below). The field equations follow from the Euler-Lagrange equations, in which you treat $\phi$ and $\partial_\mu\phi$ as independent variables. The same happens in classical mechanics, where the Lagrangian is a function of the two variables $q$ and $\dot{q}$. So if you are familiar with $$\frac{\partial L}{\partial\dot{q}}$$ you should become familiar with the field-theoretic version.
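As a concrete illustration, here is a sketch for the free real scalar field. The specific Lagrangian, the mass parameter $m$, and the metric signature $(+,-,-,-)$ are assumptions chosen for the example, not something fixed by the discussion above. The Euler-Lagrange equation for a field reads $$\partial_\mu\left(\frac{\partial\mathscr{L}}{\partial(\partial_\mu\phi)}\right) - \frac{\partial\mathscr{L}}{\partial\phi} = 0.$$ Taking $$\mathscr{L} = \frac{1}{2}\partial_\mu\phi\,\partial^\mu\phi - \frac{1}{2}m^2\phi^2$$ and treating $\phi$ and $\partial_\mu\phi$ as independent variables gives $$\frac{\partial\mathscr{L}}{\partial(\partial_\mu\phi)} = \partial^\mu\phi, \qquad \frac{\partial\mathscr{L}}{\partial\phi} = -m^2\phi,$$ so the field equation is $$\partial_\mu\partial^\mu\phi + m^2\phi = 0.$$ The mechanical analogue is $L = \frac{1}{2}m\dot{q}^2 - V(q)$, for which $\frac{\partial L}{\partial\dot{q}} = m\dot{q}$ and $\frac{\partial L}{\partial q} = -V'(q)$.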

aquirdturtle · Answer 2 · 2016-10-10T05:05:01.483

This article is a useful start. https://en.wikipedia.org/wiki/Matrix_calculus

So throughout the discussions with the kind folks here I think that I've sorted out my confusion, although there wasn't any particular answer that did it. Here I'll analyze my confusion over this notation in case anyone comes to this thread with similar confusions.

The most straightforward answer (IMO) to the question "how do I interpret $\frac{\partial}{\partial(\partial_\mu\phi)}$?" is to interpret it as four distinct operators, one for each value of $\mu$. I.e. $$ \frac{\partial L}{\partial(\partial_\mu\phi)} = \left[\frac{\partial L}{\partial(\partial_0\phi)}, \frac{\partial L}{\partial(\partial_1\phi)}, \frac{\partial L}{\partial(\partial_2\phi)}, \frac{\partial L}{\partial(\partial_3\phi)}\right] $$

This is essentially what I suggested in my question, but it wasn't obvious to me that it was necessarily correct. The reason I didn't immediately jump to this conclusion when I saw this is that we often (usually?) talk about four-vectors (using the $\mu$ notation $\partial_\mu$) as composite objects, not as elements of composite objects. For example, whether it's strictly speaking correct or not, I have often seen people refer to $\partial_\mu$ as the covariant derivative, not the $\mu$th element of the covariant derivative. Similarly, I see the notation $x_\mu = [t,x,y,z]$ and not $x_\mu = [t,x,y,z]_\mu$, which would more strongly reinforce the idea that $x_\mu$ is referring to a single but arbitrary element of the collection $[t,x,y,z]$. All of these things encouraged me to think of the term $x_\mu$ as the entire collection, not as an arbitrary element of said collection. In fact these seem like incompatible interpretations to me. For example, if I interpret these objects as arbitrary elements of said collections, then the term $x_\nu g_{\nu\mu}$ (which I have directly seen in work before) is perfectly well-defined, while if I interpret $x_\nu$ as a vector and $g_{\nu\mu}$ as a matrix, then it clearly isn't.

However, there does seem to be another consistent way to work with these derivatives, one that treats $\partial_\mu\phi$ as an entire object, like $x$, for the derivative to be taken with respect to. Sean Carroll briefly describes how to work with this in his book "Spacetime and Geometry". It seems a bit tricky though. His guidelines are to make sure that the "derivitee" is written using tensors whose indices sit in the same (upper or lower) positions as those of the quantity the derivative is taken with respect to, and to use different index labels for the two. For example,

$$ \begin{aligned}
\frac{\partial}{\partial(\partial_\xi \phi)} \left(\frac{1}{2}\partial^\mu\phi\,\partial_\mu\phi\right)
&= \frac{\partial}{\partial(\partial_\xi\phi)} \left(\frac{1}{2}g^{\nu\mu}\,\partial_\nu\phi\,\partial_\mu\phi\right) \\
&= \frac{1}{2}g^{\nu\mu}\left(\frac{\partial(\partial_\nu\phi)}{\partial(\partial_\xi\phi)}\,\partial_\mu\phi + \partial_\nu\phi\,\frac{\partial(\partial_\mu\phi)}{\partial(\partial_\xi\phi)}\right) \\
&= \frac{1}{2}g^{\nu\mu}\left(\delta^\xi_\nu\,\partial_\mu\phi + \partial_\nu\phi\,\delta^\xi_\mu\right) \\
&= \frac{1}{2}\left(g^{\xi\mu}\,\partial_\mu\phi + g^{\nu\xi}\,\partial_\nu\phi\right) \\
&= g^{\xi\nu}\,\partial_\nu\phi = \partial^\xi\phi
\end{aligned} $$
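For what it's worth, the two readings can be compared symbolically. The following is only a sketch using SymPy: the symbols `d0` to `d3` are stand-ins I made up for the components $\partial_0\phi,\dots,\partial_3\phi$, the mass term is added just to have a $\phi$ dependence, and the metric signature $(+,-,-,-)$ is an assumed convention. It checks that the four component-wise derivatives coincide with $g^{\xi\nu}\partial_\nu\phi$.

```python
import sympy as sp

# Treat the field and its four derivative components as independent symbols,
# as one does when forming the Euler-Lagrange equations.
phi, m = sp.symbols("phi m", real=True)
d = sp.symbols("d0:4", real=True)  # d[mu] stands in for partial_mu phi (lower index)

# Assumed convention: signature (+, -, -, -). For the Minkowski metric the
# inverse metric g^{mu nu} has the same components as g_{mu nu}.
g_inv = sp.diag(1, -1, -1, -1)

# Free scalar Lagrangian density:
# (1/2) g^{mu nu} partial_mu phi partial_nu phi - (1/2) m^2 phi^2
kinetic = sp.Rational(1, 2) * sum(
    g_inv[mu, nu] * d[mu] * d[nu] for mu in range(4) for nu in range(4)
)
lagrangian = kinetic - sp.Rational(1, 2) * m**2 * phi**2

# Reading 1: four separate partial derivatives, one per component.
componentwise = [sp.diff(lagrangian, d[xi]) for xi in range(4)]

# Reading 2: the index-notation result g^{xi nu} partial_nu phi.
index_result = [
    sum(g_inv[xi, nu] * d[nu] for nu in range(4)) for xi in range(4)
]

print(componentwise)
print(index_result)
```

With this signature both lists come out as the time component unchanged and the three spatial components with a minus sign, so the relative sign between $\partial L/\partial\dot\phi$ and $\partial L/\partial(\nabla\phi)$ arises from the metric inside $L$ rather than from the derivative operator itself.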

It seems to me like there might be cases in which this sort of approach breaks down, such as if I ever had a Lagrangian (or other thing that I wanted to take the derivative of) like $L = \nabla^2\phi$. Maybe dealing with that's trivial and I just don't see it, but at this point I'd return to my previous interpretation, as while less compact, it seems a little more trustworthy.

Sorry if I seem a bit thick as I try to grapple with this notation but I've had an unusually hard time working with it and this result was not obvious to me. Let me know if I've still got something wrong in here. Thanks all.
