Interval preserving transformations are linear in special relativity

Question

In almost all proofs I've seen of the Lorentz transformations, one starts from the assumption that the required transformations are linear. I'm wondering if there is a way to prove the linearity:

Prove that any spacetime transformation $\left(y^0,y^1,y^2,y^3\right)\leftrightarrow \left(x^0,x^1,x^2,x^3\right)$ that preserves intervals, that is, such that

$$\left(dy^0\right)^2-\left(dy^1\right)^2-\left(dy^2\right)^2-\left(dy^3\right)^2=\left(dx^0\right)^2-\left(dx^1\right)^2-\left(dx^2\right)^2-\left(dx^3\right)^2$$

is linear (assuming that the origins of both coordinates coincide). That is, show that $\frac{\partial y^i}{\partial x^j}=L_j^i$ is constant throughout spacetime (that is, show that $\frac{\partial L_j^i}{\partial x^k}=0$).

Thus far all I've been able to prove is that $g_{ij}L_p^iL_q^j=g_{pq}$ (where $g_{ij}$ is the metric tensor of special relativity) and that $\frac{\partial L_j^i}{\partial x^k}=\frac{\partial L_k^i}{\partial x^j}$. Any further ideas?

Qmechanic · Accepted Answer · 2011-07-28T13:08:09.500

In hindsight, here is a short proof.

The metric $g_{\mu\nu}$ is the flat constant metric $\eta_{\mu\nu}$ in both coordinate systems. Therefore, the corresponding (uniquely defined) Levi-Civita Christoffel symbols

$$ \Gamma^{\lambda}_{\mu\nu}~=~0$$

are zero in both coordinate systems. It is well-known that the Christoffel symbol does not transform as a tensor under a local coordinate transformation $x^{\mu} \to y^{\rho}=y^{\rho}(x)$, but rather with an inhomogeneous term, which is built from the second derivative of the coordinate transformation,

$$\frac{\partial y^{\tau}}{\partial x^{\lambda}} \Gamma^{(x)\lambda}_{\mu\nu} ~=~\frac{\partial y^{\rho}}{\partial x^{\mu}}\, \frac{\partial y^{\sigma}}{\partial x^{\nu}}\, \Gamma^{(y)\tau}_{\rho\sigma}+ \frac{\partial^2 y^{\tau}}{\partial x^{\mu} \partial x^{\nu}}. $$

Hence all the second derivatives are zero,

$$ \frac{\partial^2 y^{\tau}}{\partial x^{\mu} \partial x^{\nu}}~=~0, $$

i.e. the transformation $x^{\mu} \to y^{\rho}=y^{\rho}(x)$ is affine.

a06e · Answer 2 · 2016-05-04T21:06:36.887

I had the feeling that a direct proof would be possible using only the relation $\eta _{ij}\frac{\partial y^i}{\partial x^p}\frac{\partial y^j}{\partial x^q}=\eta _{pq}$, assuming simple smoothness properties of the transformation and then using some algebra maneuvers. I found the following lovely argument in the book Gravitation and Cosmology by Steven Weinberg.

We start from the relation

$$\eta _{ij}\frac{\partial y^i}{\partial x^p}\frac{\partial y^j}{\partial x^q}=\eta _{pq}$$

Differentiating with respect to $x^k$ we obtain

$$\eta _{ij}\frac{\partial ^2y^i}{\partial x^p\partial x^k}\frac{\partial y^j}{\partial x^q}+\eta _{ij}\frac{\partial y^i}{\partial x^p}\frac{\partial ^2y^j}{\partial x^q\partial x^k}=0$$

We add to this the same equation with $p$ and $k$ interchanged, and subtract the same with $q$ and $k$ interchanged; that is,

$$\eta _{ij}\left(\frac{\partial ^2y^i}{\partial x^p\partial x^k}\frac{\partial y^j}{\partial x^q}+\frac{\partial y^i}{\partial x^p}\frac{\partial ^2y^j}{\partial x^q\partial x^k}+\frac{\partial ^2y^i}{\partial x^k\partial x^p}\frac{\partial y^j}{\partial x^q}+\frac{\partial y^i}{\partial x^k}\frac{\partial ^2y^j}{\partial x^q\partial x^p}-\frac{\partial ^2y^i}{\partial x^p\partial x^q}\frac{\partial y^j}{\partial x^k}-\frac{\partial y^i}{\partial x^p}\frac{\partial ^2y^j}{\partial x^k\partial x^q}\right)=0$$

This simplifies to

$$2\eta _{ij}\frac{\partial ^2y^i}{\partial x^p\partial x^k}\frac{\partial y^j}{\partial x^q}=0$$

Since the matrices $\frac{\partial y^i}{\partial x^j}$ and $\eta _{ij}$ are invertible, this implies that

$$\frac{\partial ^2y^i}{\partial x^p\partial x^k}=0$$
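
The invertibility step can be spelled out as follows. This is only a sketch; the inverse Jacobian $\frac{\partial x^q}{\partial y^l}$ and the inverse metric $\eta^{ml}$ are notation introduced here, not taken from the book. Contracting first with the inverse Jacobian and then with the inverse metric gives

$$\eta _{ij}\frac{\partial ^2y^i}{\partial x^p\partial x^k}\frac{\partial y^j}{\partial x^q}\frac{\partial x^q}{\partial y^l}=\eta _{il}\frac{\partial ^2y^i}{\partial x^p\partial x^k}=0\qquad\Rightarrow\qquad \eta^{ml}\eta _{li}\frac{\partial ^2y^i}{\partial x^p\partial x^k}=\frac{\partial ^2y^m}{\partial x^p\partial x^k}=0.$$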

score 7 · Answer 3 · answered Jul 23 '11 at 23:51

Here I just want to mention that there exists a direct proof in $1+1$ dimensions using elementary arguments. Let the two coordinate patches $U_x$ and $U_y$ (which are, say, both convex sets in $\mathbb{R}^2$, containing the origin) have light-cone coordinates $x^{\pm}$ and $y^{\pm}$, respectively. The metric reads

$$ dy^{+}dy^{-} ~=~ dx^{+}dx^{-}. $$

This leads to three PDEs

$$ \frac{\partial y^{+}}{\partial x^{+}} \frac{\partial y^{-}}{\partial x^{+}} ~=~0 \qquad \qquad\Leftrightarrow\qquad \qquad\frac{\partial y^{+}}{\partial x^{+}}~=~0 \qquad\mathrm{or}\qquad \frac{\partial y^{-}}{\partial x^{+}} ~=~0 ,$$

$$ \frac{\partial y^{+}}{\partial x^{-}} \frac{\partial y^{-}}{\partial x^{-}} ~=~0\qquad \qquad\Leftrightarrow \qquad\qquad\frac{\partial y^{+}}{\partial x^{-}}~=~0 \qquad\mathrm{or}\qquad \frac{\partial y^{-}}{\partial x^{-}} ~=~0,$$

$$ \frac{\partial y^{+}}{\partial x^{+}} \frac{\partial y^{-}}{\partial x^{-}} +\frac{\partial y^{+}}{\partial x^{-}} \frac{\partial y^{-}}{\partial x^{+}} ~=~1.$$

Since $\det \frac{\partial y}{\partial x}\neq 0$, there are really only two possibilities. Either

$$\frac{\partial y^{-}}{\partial x^{+}}~=~0 ~=~\frac{\partial y^{+}}{\partial x^{-}},$$

or

$$\frac{\partial y^{+}}{\partial x^{+}}~=~0 ~=~\frac{\partial y^{-}}{\partial x^{-}}.$$

By possibly relabeling $x^{+} \leftrightarrow x^{-}$, we may assume the former. So

$$y^{+}~=~f^{+}(x^{+})\qquad \mathrm{and} \qquad y^{-}~=~f^{-}(x^{-}).$$

From the third PDE, we conclude that

$$ \frac{\partial f^{+}}{\partial x^{+}}\frac{\partial f^{-}}{\partial x^{-}} ~=~1. $$

By separation of variables, this is only possible if $\frac{\partial f^{\pm}}{\partial x^{\pm}}$ is independent of $x^{\pm}$. It follows that $y^{\pm}~=~f^{\pm}(x^{\pm})$ are affine functions. Q.E.D.
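
As a concrete illustration (not part of the original argument): since $\frac{\partial f^{+}}{\partial x^{+}}\frac{\partial f^{-}}{\partial x^{-}}=1$ with both factors constant, the solutions are rescalings $y^{+}=\lambda x^{+}+c^{+}$, $y^{-}=\lambda^{-1}x^{-}+c^{-}$, and an ordinary boost should be one of them. The sketch below checks this numerically, assuming the convention $x^{\pm}=t\pm x$ with $c=1$ and a boost of rapidity `phi`; the function names are made up for the example.

```python
import math

def boost(t, x, phi):
    """Standard 1+1 Lorentz boost with rapidity phi (units with c = 1)."""
    t_new = math.cosh(phi) * t - math.sinh(phi) * x
    x_new = math.cosh(phi) * x - math.sinh(phi) * t
    return t_new, x_new

def light_cone(t, x):
    """Light-cone coordinates x^+ = t + x, x^- = t - x (assumed convention)."""
    return t + x, t - x

phi = 0.7            # arbitrary rapidity chosen for the illustration
t, x = 1.3, -0.4     # arbitrary event

xp, xm = light_cone(t, x)
yp, ym = light_cone(*boost(t, x, phi))

scale_plus = yp / xp     # should equal exp(-phi)
scale_minus = ym / xm    # should equal exp(+phi)
print(scale_plus, scale_minus, scale_plus * scale_minus)
```

In this convention the two scale factors are $e^{-\phi}$ and $e^{+\phi}$, so their product is 1, which is the third PDE.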

Marek · Answer 4 · 2011-07-24T16:12:55.670

To show the main idea, let's first assume that the preserved scalar product has positive signature. Also, you say you don't want to assume homogeneity, but this is already implicit in your equation: intervals are formed from differences of spacetime points, so we might as well take one of those points to be the zero of a vector space (equivalently, you might be talking about the preservation of a scalar product on the tangent space at a point, but this is also linear, not affine).

Let $$f : \mathbb R^2 \to \mathbb R^2, \quad (x,y) \mapsto (A(x,y), B(x,y))$$ be length-preserving and suppose $f$ is analytic with $$A(x,y) = \sum_{n,m=0}^{\infty} {a_{n,m} \over n! m!} x^n y^m, \quad B(x,y) = \sum_{n,m=0}^{\infty} {b_{n,m} \over n! m!} x^n y^m.$$

Then we have $x^2 + y^2 = A(x,y)^2 + B(x,y)^2$ for all $(x,y) \in \mathbb R^2$, or explicitly $$ x^2 + y^2 = \left(\sum_{n,m=0}^{\infty} {a_{n,m} \over n! m!} x^n y^m \right)^2 + \left( \sum_{n,m=0}^{\infty} {b_{n,m} \over n! m!} x^n y^m \right)^2.$$ This immediately shows that the only non-vanishing coefficients occur when $n+m \leq 1$. We just need to investigate the $n=m=0$ case, but this is trivial since $f(0,0) = (a_{0,0}, b_{0,0})$.

For the $n$D case the discussion is completely analogous. For arbitrary signature some care needs to be taken since we can't use $x^2 + y^2 = 0 \rightarrow x = y =0$ anymore (perhaps one can work in $\mathbb C$ instead of $\mathbb R$ and use analytic continuation).

The last remaining ingredient of this argument is the analyticity of $f$. But this is trivial since $||f(x,y)||^2 = x^2 + y^2$ and $||\cdot||^2$ are analytic around any $(x,y) \in \mathbb R^2$.

score 5 · Answer 5 · edited May 21 '23 at 11:39

The proof actually turns out to be a very simple exercise in linear algebra. I find this algebraic proof very satisfactory since it uses very little machinery. It also proves that rotations and (with a slight rewording) unitary maps are linear.

Theorem: Let $U$ and $V$ be vector spaces over a field $F$ equipped with bilinear forms $g$ and $h$, respectively. Further assume that $h$ is non-degenerate and that we have a surjective map $f:U\rightarrow V$ such that $h(f(u),f(v))=g(u,v)$ for all $u,v\in U$. Then $f$ is linear.

Proof: Let $u,v,w\in U$ and $k\in F$. Then

\begin{align*} h(f(ku+v)-kf(u)-f(v),f(w))&=h(f(ku+v),f(w))-kh(f(u),f(w))-h(f(v),f(w)) \\ &=g(ku+v,w)-kg(u,w)-g(v,w) \\& =kg(u,w)+g(v,w)-kg(u,w)-g(v,w)=0. \end{align*}

So that

\begin{align*} h(f(ku+v),f(w)) &= kh(f(u),f(w))+h(f(v),f(w)) \\ &=h(kf(u)+f(v),f(w)) \end{align*}

Since $w$ was arbitrary and $f$ is surjective, non-degeneracy of $h$ guarantees that $f(ku+v)=kf(u)+f(v)$. Therefore, $f$ is linear.

By taking $U=V=M$ to be the vector space underlying Minkowski space and $g=h=\eta$ as its metric, we obtain that metric-preserving transformations are linear. Lorentz transformations (distance-preserving transformations) are metric-preserving transformations due to the polarization formula in Brian Moths's answer. I think, however, that one has to include surjectivity in the definition of a Lorentz transformation. Compare to the Mazur-Ulam theorem.

Qmechanic · Answer 6 · 2011-07-27T12:54:46.987

Let us reformulate OP's question as follows:

Give a proof that a local coordinate transformation $x^{\mu} \to y^{\rho}=y^{\rho}(x)$ between two local coordinate systems (on a 3+1 dimensional Lorentzian manifold) must be affine if the metric $g_{\mu\nu}$ in both coordinate systems happens to be of the constant flat Minkowski form $\eta_{\mu\nu}$.

Here we will present a proof that works both with Minkowski and Euclidean signature; in fact for any signature and for any finite non-zero number of dimensions, as long as the metric $g_{\mu\nu}$ is invertible.

1) Let us first recall the transformation property of the inverse metric $g^{\mu\nu}$, which is a contravariant $(2,0)$ symmetric tensor,

$$ \frac{\partial y^{\rho}}{\partial x^{\mu}} g^{\mu\nu}_{(x)}\frac{\partial y^{\sigma}}{\partial x^{\nu}}~=~g^{\rho\sigma}_{(y)}, $$

where $x^{\mu} \to y^{\rho}=y^{\rho}(x)$ is a local coordinate transformation. Recall that the metric $g_{\mu\nu}=\eta_{\mu\nu}$ is the flat constant metric in both coordinate systems. So we can write

$$ \frac{\partial y^{\rho}}{\partial x^{\mu}} \eta^{\mu\nu}\frac{\partial y^{\sigma}}{\partial x^{\nu}}~=~\eta^{\rho\sigma}. \qquad (1) $$

2) Let us assume that the local coordinate transformation is real analytic

$$y^{\rho} ~=~ a^{(0)\rho} + a^{(1)\rho}_{\mu} x^{\mu} + \frac{1}{2} a^{(2)\rho}_{\mu\nu}x^{\mu}x^{\nu} + \frac{1}{3!} a^{(3)\rho}_{\mu\nu\lambda}x^{\mu} x^{\nu} x^{\lambda} + \ldots. $$

By possibly performing an appropriate translation we will from now on assume without loss of generality that the constant shift $ a^{(0)\rho} =0 $ is zero.

3) To the zeroth order in $x$, the equation $(1)$ reads

$$ a^{(1)\rho}_{\mu} \eta^{\mu\nu}a^{(1)\sigma}_{\nu}~=~\eta^{\rho\sigma}, $$

which not surprisingly says that the matrix $a^{(1)\rho}_{\mu}$ is a Lorentz (or an orthogonal) matrix, respectively. By possibly performing an appropriate "rotation", we will from now on assume without loss of generality that the constant matrix

$$ a^{(1)\rho}_{\mu}~=~\delta^{\rho}_{\mu} $$

is the unit matrix.

4) In the following, it will be convenient to lower the index of the $y^{\sigma}$ coordinate as

$$y_{\rho}~:=~\eta_{\rho\sigma}y^{\sigma}.$$

Then the local coordinate transformation becomes

$$y_{\rho} ~=~ \eta_{\rho\mu} x^{\mu} + \frac{1}{2} a^{(2)}_{\rho,\mu\nu}x^{\mu}x^{\nu} + \frac{1}{3!} a^{(3)}_{\rho,\mu\nu\lambda}x^{\mu} x^{\nu} x^{\lambda}+ \ldots$$ $$+\frac{1}{n!} a^{(n)}_{\rho,\mu_1\ldots\mu_n}x^{\mu_1} \cdots x^{\mu_n}+ \ldots. $$

5) To the first order in $x$, the equation $(1)$ reads

$$ a^{(2)}_{\rho,\sigma\mu}+a^{(2)}_{\sigma,\rho\mu}~=~0.$$

That is, $a^{(2)}_{\rho,\mu\nu}$ is symmetric in $\mu\leftrightarrow \nu$, but antisymmetric in $\rho\leftrightarrow \mu$. It is not hard to see (by applying the symmetry and the antisymmetry property in alternating order three times each), that the second order coefficients $a^{(2)}_{\rho,\mu\nu}=0$ must vanish.

6) To the second order in $x$, the equation $(1)$ reads

$$ a^{(3)}_{\rho,\sigma\mu\nu}+a^{(3)}_{\sigma,\rho\mu\nu}~=~0.$$

That is, $a^{(3)}_{\rho,\mu\nu\lambda}$ is symmetric in $\mu\leftrightarrow \nu\leftrightarrow \lambda $, but antisymmetric in $\rho\leftrightarrow \mu$. For fixed $\lambda$, we can again reach the conclusion $a^{(3)}_{\rho,\mu\nu\lambda}=0$.

7) Similarly, we conclude inductively that the higher order coefficients $a^{(n)}_{\rho,\mu_1\ldots\mu_n}=0$ must vanish as well. So $y^{\mu}= x^{\mu}$. Q.E.D.
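
The index shuffle in step 5 can be written out explicitly. As a sketch, write $a_{\rho,\mu\nu}$ for $a^{(2)}_{\rho,\mu\nu}$ and alternate antisymmetry in the first two indices (labelled $A$) with symmetry in the last two (labelled $S$); the labels are introduced here for illustration only:

$$ a_{\rho,\mu\nu} ~\stackrel{A}{=}~ -a_{\mu,\rho\nu} ~\stackrel{S}{=}~ -a_{\mu,\nu\rho} ~\stackrel{A}{=}~ a_{\nu,\mu\rho} ~\stackrel{S}{=}~ a_{\nu,\rho\mu} ~\stackrel{A}{=}~ -a_{\rho,\nu\mu} ~\stackrel{S}{=}~ -a_{\rho,\mu\nu}, $$

so $a_{\rho,\mu\nu}=0$. The same chain applies to $a^{(3)}_{\rho,\mu\nu\lambda}$ in step 6, with the extra index $\lambda$ carried along as a spectator.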

David Bar Moshe · Answer 7 · 2011-07-25T09:22:00.030

The first condition implies that the Jacobian matrix $L^i_j=\frac{\partial y^i}{\partial x^j}$ is a Lorentz transformation. By substitution of the definition of the Jacobian in this condition, we obtain:

$g^{ij}\frac{\partial y^k}{\partial x^i}\frac{\partial y^l}{\partial x^j} = g^{kl}$

In particular, taking the diagonal equations, i.e. setting $l=k$ (with no sum over $k$), we have

$g^{ij}\frac{\partial y^k}{\partial x^i}\frac{\partial y^k}{\partial x^j} = g^{kk}= \pm 1 $

(The plus sign for the time coordinate and the minus sign for the space coordinates).

But this is just the Hamilton-Jacobi equation for a free relativistic particle, whose unique solution can be obtained by separation of variables:

$y^k = \sum_i f^{(k)}_i(x^i)$

By substitution, we obtain:

$\frac {df^{(k)}_i(x^i)}{dx^i} = const$

Thus, the new coordinates are linear functions of the old coordinates. The constant coefficients are not independent, since the Jacobian matrix must be a Lorentz transformation.
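
The constraint on the constant coefficients can be checked numerically for a specific case. The following is a minimal sketch, assuming a boost along the $x$ axis with speed `v` in units where $c=1$ and signature $(+,-,-,-)$; the helper names are hypothetical and chosen only for this example.

```python
import math

# Minkowski metric with signature (+, -, -, -); it equals its own inverse.
ETA = [[1, 0, 0, 0],
       [0, -1, 0, 0],
       [0, 0, -1, 0],
       [0, 0, 0, -1]]

def boost_x(v):
    """Jacobian L[k][i] = dy^k/dx^i of a boost along x with speed v (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [[gamma, -gamma * v, 0.0, 0.0],
            [-gamma * v, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transformed_metric(L):
    """Compute g^{ij} L^k_i L^l_j for all k, l."""
    n = 4
    return [[sum(ETA[i][j] * L[k][i] * L[l][j]
                 for i in range(n) for j in range(n))
             for l in range(n)]
            for k in range(n)]

L = boost_x(0.6)
result = transformed_metric(L)

# Largest deviation from g^{kl}; should vanish up to rounding error.
max_error = max(abs(result[k][l] - ETA[k][l])
                for k in range(4) for l in range(4))
print(max_error)
```

The diagonal entries of `result` reproduce the $\pm 1$ of the diagonal equations, and the off-diagonal entries vanish, which is the extra constraint relating the coefficients.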

Update:

Upon lurscher's suggestion, here are two references containing the Hamilton-Jacobi equation of a relativistic particle. (Both references refer to a particle in an external electromagnetic field. In order to obtain the Hamilton-Jacobi equation for the free particle one needs the particular case with a vanishing vector potential): reference-1 (by A. Granik), reference-2

(The needed version appears in equation (33) of the first reference, the second reference contains the (proper) time dependent version).

In addition, I'll give here another derivation based on the WKB approximation of the Klein-Gordon equation:

$\frac {1}{c^2}\frac {\partial^2\psi}{\partial t^2}-\nabla^2 \psi + \frac{m^2 c^2}{\hbar^2}\psi = 0$

The plane wave solutions are given by:

$\psi = C \exp(i\frac{\mathbf{p}.\mathbf{x}-\sqrt{m^2c^4+p^2c^2}t}{\hbar})$

To perform a WKB approximation, we seek a solution of the form:

$\psi = A(x,t)\exp(\frac{iS(x,t)}{\hbar})$

and take the leading terms in the limit $\hbar \rightarrow 0$. ($S$ is sometimes called the Hamilton-Jacobi phase function)

By substitution, we obtain:

$\left(\frac {1}{c^2}\frac {\partial^2A}{\partial t^2}-\nabla^2A\right)+\frac{2i}{\hbar}\left(\frac {1}{c^2}\frac {\partial A}{\partial t} \frac {\partial S}{\partial t} -\mathbf{\nabla}A.\mathbf{\nabla}S\right)+\frac{iA}{\hbar}\left(\frac {1}{c^2}\frac {\partial^2S}{\partial t^2}-\nabla^2S\right) -\frac{A}{\hbar^2}\left(\frac {1}{c^2}\left(\frac {\partial S}{\partial t}\right)^2-(\mathbf{\nabla}S)^2 - m^2 c^2 \right) = 0$

The leading term (of order $\hbar^{-2}$) is the Hamilton-Jacobi equation:

$\frac {1}{c^2}\left(\frac {\partial S}{\partial t}\right)^2-(\mathbf{\nabla}S)^2 - m^2 c^2 = 0$

which can be seen to be equivalent to each equation on the main diagonal of the matrix equation written in the original answer.

Now, it is also easy to see the uniqueness of the solution. For the free particle, one can see that the non-leading terms actually vanish. i.e., the WKB approximation is exact.

The Hamilton-Jacobi phase function $S$ is just the phase of the plane wave solutions of the Klein-Gordon equation:

$ S = \mathbf{p}.\mathbf{x}-\sqrt{m^2c^4+p^2c^2}t$

On $\mathbb{R}^4$, all solutions of the free Klein-Gordon equation in Cartesian coordinates are of the form of the plane waves, which implies that the Hamilton-Jacobi phase function is linear in the Cartesian coordinates.
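
As a quick consistency check (a sketch, assuming the free-particle Hamilton-Jacobi equation in its standard first-order form $\frac{1}{c^2}\left(\frac{\partial S}{\partial t}\right)^2-(\mathbf{\nabla}S)^2-m^2c^2=0$), the plane-wave phase has $\frac{\partial S}{\partial t}=-\sqrt{m^2c^4+p^2c^2}$ and $\mathbf{\nabla}S=\mathbf{p}$, so

$$\frac{1}{c^2}\left(m^2c^4+p^2c^2\right)-p^2-m^2c^2~=~0.$$

Since $S$ is linear in $\mathbf{x}$ and $t$, its second derivatives vanish, which is consistent with the statement that the non-leading terms drop out for constant amplitude $A$.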

score 0 · Answer 8 · answered Dec 21 '16 at 02:43

First, notice that if $\Lambda$ is an isometry then it preserves dot products, since if $p'=\Lambda(p)$, $q'=\Lambda(q)$, and $r'=\Lambda(r)$ then $$\left( \langle r'-p' ,r'-p' \rangle -\langle r'-q' ,r'-q' \rangle - \langle q'-p' ,q'-p' \rangle \right)/2 = \langle r'-q' ,q'-p' \rangle, $$ and since the LHS is preserved so must the right hand side be.
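
The identity above holds for any symmetric bilinear product, which can be sanity-checked numerically. The sketch below assumes signature $(+,-,-,-)$ and three randomly chosen points; the helper names `minkowski`, `diff`, and `interval` are made up for the illustration.

```python
import random

def minkowski(u, v):
    """Minkowski product with signature (+, -, -, -)."""
    return u[0] * v[0] - sum(a * b for a, b in zip(u[1:], v[1:]))

def diff(a, b):
    """Componentwise difference a - b of two points."""
    return [ai - bi for ai, bi in zip(a, b)]

def interval(a, b):
    """Squared interval between the points a and b."""
    d = diff(a, b)
    return minkowski(d, d)

random.seed(0)  # fixed seed so the run is reproducible
p, q, r = ([random.uniform(-1, 1) for _ in range(4)] for _ in range(3))

# Left-hand side uses only intervals; right-hand side is a product of differences.
lhs = (interval(r, p) - interval(r, q) - interval(q, p)) / 2
rhs = minkowski(diff(r, q), diff(q, p))
print(abs(lhs - rhs))
```

Because the left-hand side is built only from intervals, any map that preserves intervals also preserves the product on the right.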

Let's start with Minkowski space and pick an origin $p$ and an orthonormal basis $\mathbf{e}_\mu$ satisfying $\langle \mathbf{e}_\mu , \mathbf{e}_\nu \rangle = \eta _{\mu \nu}$. Now any point $x$ in Minkowski space can be written $p + x^\mu \mathbf{e}_\mu$, where $x_\mu = \langle x-p,\mathbf{e}_\mu\rangle$.

Now what about $x'=\Lambda(x)$? Well, of course we want to say it has the same coordinates. So let's define the new basis $\mathbf{e}'_\mu = \Lambda(p+\mathbf{e}_\mu)-\Lambda(p)$. Since $\Lambda$ preserves products of differences, we know that the $\mathbf{e}'_\mu$ are orthonormal, and so $x'$ can be written $p' + x'^\mu \mathbf{e}'_\mu$, where $x'_\mu = \langle x'-p',\mathbf{e}'_\mu\rangle$.

But since $\Lambda$ preserves products, we have that $x'^\mu=x^\mu$. Therefore, since $$\Lambda(p + x^\mu \mathbf{e}_\mu)=\Lambda(p) +x^\mu\left(\Lambda(p + \mathbf{e}_\mu)-\Lambda(p)\right),$$ $\Lambda$ is affine.
