
In most places I've looked, I see that 4-vectors are defined as 4-element vectors that transform like the 4-position under a Lorentz transformation. This definition is typically accompanied by the general rule $$\widetilde{A}^{\mu} = \Lambda^{\mu}{}_{\nu} A^{\nu}$$

This is strange to me and seems circular. How do you "transform like the 4-position?" I can transform $x^\mu$, but how would that compare to other 4-element vectors? Couldn't I just slap $\Lambda$ on anything and say "oh look it transformed?" Clearly not, but I still don't understand.

Where am I going with this? I'd like to show that if $V^\mu U_\mu$ is Lorentz invariant and $V$ is a 4-vector, then $U$ must be one too. At first glance it seems almost like I'm proving a definition. If the scalar is Lorentz invariant, it's unchanged if I transform both vectors. So am I done just by showing that $V^\mu U_\mu = \widetilde{V}^{\mu} \widetilde{U}_{\mu}$? This seems too trivial...

Some elaboration and more: from what I know, 4-vectors are norm invariant in all inertial frames. That is, for the 4-vector $V$ as an example, $V^\mu V_\mu = \widetilde{V}^{\mu} \widetilde{V}_{\mu}$. Consider $\sigma \equiv V^\mu U_\mu$, where $U$ isn't necessarily a 4-vector. If I state that $\sigma$ is a Lorentz scalar, I should find that $U$ must be a 4-vector as well.
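
For concreteness, here is a minimal numeric sketch of that norm-invariance claim; the boost matrix, the sample components, and the choice $c = 1$ are my own illustrative assumptions, not part of the question:

```python
import numpy as np

# Minkowski metric with signature (+,-,-,-)
eta = np.diag([1.0, -1.0, -1.0, -1.0])

# A Lorentz boost along x with speed beta (units where c = 1)
beta = 0.6
gamma = 1.0 / np.sqrt(1.0 - beta**2)
L = np.array([
    [ gamma,       -gamma*beta, 0.0, 0.0],
    [-gamma*beta,   gamma,      0.0, 0.0],
    [ 0.0,          0.0,        1.0, 0.0],
    [ 0.0,          0.0,        0.0, 1.0],
])

V = np.array([2.0, 1.0, -3.0, 0.5])   # sample components V^mu
V_t = L @ V                           # boosted components

print(V @ eta @ V, V_t @ eta @ V_t)   # V^mu V_mu is the same in both frames
```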

4 Answers


Couldn't I just slap $\Lambda$ on anything and say "oh look it transformed"?

You could, but that's not what that statement is doing. Using the example you chose, $A^\mu$ isn't just some tuple of real numbers; it is a very specific combination $(\phi,\vec A)$ of pre-existing concepts, namely the electric scalar potential $\phi$ and the magnetic vector potential $\vec A$.

When you say that $A^\mu$ transforms like a vector, you are making a nontrivial statement about the scalar and vector potentials that a moving observer will require to explain a given field configuration, and you're making the affirmative claim that the scalar potential will acquire terms of the form $\gamma\vec v \cdot \vec A$ and vice versa, the same way that a four-position does.

Of course, that needs to be proved separately, with the details of the proof depending on what definition you've chosen for $A^\mu$, but if the object itself carries nontrivial meaning then its transformation will do so too.
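
As a concrete illustration of that claim (my own sketch, with $c = 1$, a boost along $x$, and arbitrary field values), boosting $A^\mu = (\phi, \vec A)$ as a four-vector does make the new scalar potential pick up a $\gamma\vec v \cdot \vec A$-type term:

```python
import numpy as np

beta = 0.6                              # boost speed along x, with c = 1
gamma = 1.0 / np.sqrt(1.0 - beta**2)
L = np.array([
    [ gamma,       -gamma*beta, 0.0, 0.0],
    [-gamma*beta,   gamma,      0.0, 0.0],
    [ 0.0,          0.0,        1.0, 0.0],
    [ 0.0,          0.0,        0.0, 1.0],
])

phi = 1.5                               # arbitrary scalar potential
A = np.array([0.2, -0.7, 0.4])          # arbitrary vector potential
A4 = np.array([phi, *A])                # A^mu = (phi, A) in units c = 1

A4_t = L @ A4
print(A4_t[0])                          # boosted scalar potential
print(gamma * (phi - beta * A[0]))      # gamma*(phi - v.A): same number
```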

Emilio Pisanty

Suppose you have $4$ physical quantities $U_0$, $U_1$, $U_2$, and $U_3$. Given a well-defined physical system $S$, these quantities are well defined. If you then perform a Lorentz transform, the system is considered from the point of view of another observer. Interpreted this way, we call this a passive transform. But you can just as well attribute the change due to the Lorentz transform to a change in the system itself. We call this an active transform. The two interpretations are equivalent, because what the other observer in the passive case sees is going to be assessed by that observer in the same way as the original observer would have assessed it, had he/she seen the same thing.

Then, since the $U_j$ are well-defined functions of the system, any change in the system induced by the Lorentz transform, interpreted in the active way, defines how the functions $U_j$ will change. The way the $U_j$ change is thus well defined; there is no a priori assumption that these quantities will transform like a 4-vector. We could, e.g., have chosen $4$ quantities that each transform like a scalar.

The proof of the quotient theorem, which states that if $V^{\mu}U_{\mu}$ transforms like a scalar for an arbitrary four-vector $V^{\mu}$ then $U_{\mu}$ must itself transform like a four-vector, starts by writing down just that fact:

$$V'^{\mu}U'_{\mu} = V^{\mu}U_{\mu}$$

And then you insert the Lorentz transform rule for the transform of $V^{\mu}$:

$$\Lambda^{\mu}_{\hphantom{\nu}\nu}V^{\nu}U'_{\mu} = V^{\mu}U_{\mu}$$

Then, since this must hold for an arbitrary $4$-vector $V^{\mu}$, we can consider the particular case where $V^{\mu}$ is the unit vector pointing in some arbitrary $\rho$-direction, i.e. $V^{\mu} = \delta^{\mu}_{\hphantom{\rho}\rho}$:

$$\Lambda^{\mu}_{\hphantom{\nu}\nu}\delta^{\nu}_{\hphantom{\rho}\rho}U'_{\mu} = \delta^{\mu}_{\hphantom{\rho}\rho}U_{\mu}$$

Simplifying both sides yields:

$$\Lambda^{\mu}_{\hphantom{\nu}\rho}U'_{\mu} = U_{\rho}$$

This is the inverse transform; the transform from $U_{\mu}$ to $U'_{\mu}$ is then given by:

$$U'_{\mu} = \Lambda_{\mu}^{\hphantom{\nu}\rho}U_{\rho}$$
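
A numeric sanity check of this result may help. This is only a sketch of mine, in which I translate the index rule $\Lambda_{\mu}^{\hphantom{\nu}\rho} = \eta_{\mu\alpha}\Lambda^{\alpha}{}_{\beta}\eta^{\beta\rho}$ into the matrix product $\eta\Lambda\eta^{-1}$ acting on the column of lower components:

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])
beta, gamma = 0.6, 1.25                  # gamma = 1/sqrt(1 - 0.6^2) exactly
L = np.array([
    [ gamma,       -gamma*beta, 0.0, 0.0],
    [-gamma*beta,   gamma,      0.0, 0.0],
    [ 0.0,          0.0,        1.0, 0.0],
    [ 0.0,          0.0,        0.0, 1.0],
])

U = np.array([1.0, 2.0, 0.0, -1.0])          # lower-index components U_mu
U_good = eta @ L @ np.linalg.inv(eta) @ U    # the covector rule derived above
U_bad = U.copy()                             # a "U" that fails to transform

rng = np.random.default_rng(0)
for _ in range(3):
    V = rng.normal(size=4)                   # arbitrary 4-vector V^mu
    print(V @ U, (L @ V) @ U_good, (L @ V) @ U_bad)
# The first two columns agree for every V; the third does not.
```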

Count Iblis

Vectors are elements of linear spaces. And every linear space has a basis. A 4-vector simply means a vector in a 4-dimensional space. The expression

$$ A'^{\mu} = \Lambda^{\mu}{}_{\nu} A^{\nu} $$

is a change of basis.

For completeness, let me elaborate on this last point. Fix two bases in this 4-dimensional vector space, $e_{\mu}$ and $\tilde{e}_{\alpha}$. This means that every vector $A$ can be written as

$$ A = A^{\mu} e_{\mu} \quad \text{or} \quad A = \tilde{A}^{\alpha}\tilde{e}_{\alpha} $$

The real (or complex) numbers $A^{\mu}$ and $\tilde{A}^{\alpha}$ are called coordinate representations of $A$. They represent the same object $A$, each just written in a different basis.

You can now consider a change of basis, that is, a linear transformation on this vector space that expresses the basis $\tilde{e}$ in terms of the basis $e$:

$$ \tilde{e}_{\alpha} = \Lambda^{\phantom{\alpha}\mu}_{\alpha} e_{\mu} $$

so that the following equality holds

$$ A^{\mu} e_{\mu} = \tilde{A}^{\alpha} \tilde{e}_{\alpha} = \tilde{A}^{\alpha}\Lambda^{\phantom{\alpha}\mu}_{\alpha} e_{\mu} $$

implying

$$ A^{\mu} = \Lambda^{\phantom{\alpha}\mu}_{\alpha} \tilde{A}^{\alpha} $$
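
A small numeric sketch may make this concrete; the particular change-of-basis matrix below is an arbitrary invertible choice of mine, nothing special:

```python
import numpy as np

# Lam[alpha, mu] gives the new basis in terms of the old:
# e_tilde_alpha = sum_mu Lam[alpha, mu] e_mu
Lam = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
B = Lam.T                             # columns of B are the vectors e_tilde_alpha

A = np.array([3.0, -1.0, 4.0, 2.0])   # components A^mu in the old basis
A_tilde = np.linalg.solve(B, A)       # components A_tilde^alpha in the new basis

# Both component sets reassemble the same abstract vector A:
print(A, B @ A_tilde)                 # identical: A^mu = Lam[alpha, mu] A_tilde^alpha
```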

Comment. I think a good reference for this might be Schutz's book on general relativity, chapter 2.

OkThen

So I know you already accepted an answer, but in my opinion this is very important and is not discussed with our undergraduates enough:

The notion of "is a tensor" as we use it in physics is generally syntactic, not semantic.

That means that it is not a physical object which is a four-vector or not a four-vector; rather, it is a vector equation which is either covariant or non-covariant, and the easiest way to write it in a covariant way is if all of the constituent entities "are tensors."

Here's what I mean in more detail: technically you have a geometrical space, and the inhabitants of that space are the true, semantic, $[m, n]$-tensors. There is a set of "scalars"$^1$, atop that are defined your "vectors"$^2$, and atop that you can define coordinate systems$^3$ and covectors and $[m,n]$-tensors in general$^4$. That's where the "real" tensors live.

But when I say "this is a tensor" in physics what I mean is that this expression singles out one and exactly one tensor in the geometrical space. If it does, then that physical quantity "is a tensor," and if it does not, then it is not.

This is why we can say "a vector is anything that transforms like a vector." We mean: if you shift from coordinates $C$ to coordinates $C'$ in the geometrical space, we know how the vectors' components mix together. If an assortment of measurable numbers happens to mix together in the same way, then it can be associated with exactly one of these tensors, and in that sense the assortment "is a tensor."

So the easiest example, though it may reach into a course you have not yet had, is a Christoffel symbol. A Christoffel symbol is a part of differential geometry which helps us take derivatives in curved spaces. A symbol like $\Gamma^a_{bc}$ certainly looks like a $[1, 2]$-tensor. It has numeric components just like one! Why is it famously "not a tensor"?

It's because there does exist a tensor which has those components in the present coordinate system, and you can calculate what the components of that tensor must be in a transformed coordinate system; but if you derive the Christoffel symbol in that other coordinate system, it will not have those transformed components. So yes, the Christoffel symbol in some coordinate system $A$ happens to coincide with a tensor, but if you shift to a different coordinate system $B$ then you will discover that it was indeed just a coincidence that that particular geometrical entity was your $\Gamma$. The abstract notion of "Christoffel symbol" is defined in a way such that it might be embodied by a number of different tensors depending on the coordinate system, and that is why it is "not a tensor".

Do you see what I mean when I say that it is a syntactic concept? The equation does single out a set of numbers, and that set of numbers is some tensor; the problem is that in different coordinate systems the same equation singles out a different entity, and hence that expression is not a tensor expression.
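
Here is a minimal sympy sketch of that coincidence in action (a standard example, though the code is my own), using the flat plane in polar coordinates: the Cartesian Christoffel symbols of the flat plane all vanish, so if $\Gamma$ were a tensor its polar components would vanish too, yet a direct computation gives a nonzero $\Gamma^r_{\theta\theta}$.

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
coords = [r, th]
g = sp.diag(1, r**2)    # flat-plane metric ds^2 = dr^2 + r^2 dtheta^2
g_inv = g.inv()

# Gamma^a_{bc} = (1/2) g^{ad} (d_b g_{dc} + d_c g_{db} - d_d g_{bc})
def christoffel(a, b, c):
    return sp.simplify(sum(
        sp.Rational(1, 2) * g_inv[a, d] * (
            sp.diff(g[d, c], coords[b])
            + sp.diff(g[d, b], coords[c])
            - sp.diff(g[b, c], coords[d])
        )
        for d in range(2)
    ))

print(christoffel(0, 1, 1))   # Gamma^r_{theta theta} = -r, not zero
# In Cartesian coordinates every Christoffel symbol of this same flat
# plane is 0, and a tensor whose components all vanish in one chart
# vanishes in every chart; so Gamma cannot be the components of a tensor.
```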

So special relativity says when you accelerate towards a clock it appears to tick faster, proportional to both its distance to you and your acceleration. This is the only fundamental fact which special relativity adds to our physics; everything else can be derived from it. We happen to have a 4D Minkowski space where the abstract geometrical entities obey Lorentz transforms preserving a metric $\operatorname{diag}(1, -1, -1, -1)$. And if we work it out, the assembly of components $(ct, x, y, z)$ will, if you control for the fact that the geometric space doesn't know what "units" are, correspond to a single geometric entity in that space: if you transform those coordinates with this rule from special relativity, you will find that the new position and time components match the relativistic components. And thus we say that these components "are a four-vector."
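
The last claim is easy to verify numerically. Here is a minimal numpy check (the boost speed is an arbitrary choice of mine) that a boost preserves the metric $\operatorname{diag}(1, -1, -1, -1)$, which is exactly what lets the $(ct, x, y, z)$ components track a single geometric entity:

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])
beta, gamma = 0.6, 1.25
L = np.array([
    [ gamma,       -gamma*beta, 0.0, 0.0],
    [-gamma*beta,   gamma,      0.0, 0.0],
    [ 0.0,          0.0,        1.0, 0.0],
    [ 0.0,          0.0,        0.0, 1.0],
])

# The defining property of a Lorentz transform: it preserves this metric.
print(np.allclose(L.T @ eta @ L, eta))   # True

# Boosting an event (ct, x, y, z) mixes t and x in the familiar way:
x = np.array([5.0, 3.0, -1.0, 2.0])
print(L @ x)   # (gamma(ct - beta x), gamma(x - beta ct), y, z)
```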

  1. You can get special relativity from general relativity in a very boring limit. In general relativity you have some abstract space of "points" $\mathcal M$ and you must define a set of real-valued scalar fields $\mathcal S \subset \mathcal M \to \mathbb R$, which must be "smooth" in the sense that the set must be closed under what I call "$k$-functors", these are functions from $C^\infty(\mathbb R^k, \mathbb R)$ interpreted as acting "pointwise" on the output, e.g. for $k=2$ we'd have $f[s_1, s_2](p) = f\big(s_1(p),~s_2(p)\big)$. This set also defines your topology, hence how the space is connected. Note that this gives closure under pointwise addition and multiplication.
  2. In general relativity the vector fields are the Leibniz-linear maps $\mathcal V\subset \mathcal S\to\mathcal S$. So if $V$ is a vector field, this "Leibniz-linear" term means that for any $k$-functor, using $\bullet^{(i)}$ to mean "partial derivative of $\bullet$ with respect to its $i^\text{th}$ argument", we would say $$V\Big(f[s_1, s_2, \dots s_k]\Big) = \sum_{i=1}^k f^{(i)}[s_1\dots s_k]\cdot V(s_i).$$ (A small symbolic check of this rule appears after these notes.)
  3. You technically need to assume a coordinate system in GR. Formally the axiom says that around any point there exists a neighborhood and $n$ scalar fields $c_{1,2,\dots n}$ such that any scalar field can, within that neighborhood, be written as an $n$-functor $f[c_1,\dots c_n].$ Then one can uniquely identify a vector as a directional derivative with components $v_i = V(c_i).$ Those components are always scalar fields, mind.
  4. A covector is a linear map from vectors to scalars, $\operatorname{Hom}(\mathcal V \to \mathcal S)$ or however you want to notate it. An $[m, n]$-tensor is a multilinear map from $m$ covectors and $n$ vectors to a scalar. There is an axiom stating that there exists a metric $[0,2]$-tensor and a $[2, 0]$-tensor inverse to it, providing a bijection between the vector space and the covector space and more generally between all $[m,n]$-tensors with the same $m+n$. In addition to this, one needs an axiom that any $[n, 0]$-tensor can be written as some big sum of products of vectors, so that the space of tensors is not substantially more interesting than the products of the spaces of vectors and covectors.
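
As promised in note 2, here is a minimal sympy check of the Leibniz-linearity rule; the particular vector field and $2$-functor are arbitrary choices of mine:

```python
import sympy as sp

x, y = sp.symbols('x y')

# A vector field as a directional-derivative operator: V = 3 d/dx - d/dy
def V(s):
    return 3 * sp.diff(s, x) - sp.diff(s, y)

s1 = x**2 + y                 # two sample scalar fields
s2 = sp.sin(x * y)

a, b = sp.symbols('a b')
f = a * b + a**2              # a 2-functor f[s1, s2], applied pointwise

lhs = V(f.subs({a: s1, b: s2}))
rhs = (sp.diff(f, a).subs({a: s1, b: s2}) * V(s1)
       + sp.diff(f, b).subs({a: s1, b: s2}) * V(s2))
print(sp.simplify(lhs - rhs))   # 0: V obeys the Leibniz-linearity rule
```
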
CR Drost