116

This is a question I've been asked several times by students and I tend to have a hard time phrasing it in terms they can understand. This is a natural question to ask and it is not usually well covered in textbooks, so I would like to know of various perspectives and explanations that I can use when teaching.

The question comes up naturally in what is usually students' second course in quantum physics / quantum mechanics. At that stage one is fairly comfortable with the concept of wavefunctions and with the Schrödinger equation, and has had some limited exposure to operators. One common case, for example, is to explain that some operators commute and that this means the corresponding observables are 'compatible' and that there exists a mutual eigenbasis; the commutation relation is usually expressed as $[A,B]=0$ but no more is said about that object.

This naturally leaves students wondering

what is, exactly, the physical significance of the object $[A,B]$ itself?

and this is not an easy question. I would like answers to address this directly, ideally at a variety of levels of abstraction and required background. Note also that I'm much more interested in the object $[A,B]$ itself than what the consequences and interpretations are when it is zero, as those are far easier and explored in much more depth in most resources.


One reason this is a hard question (and that commutators are such confusing objects for students) is that they serve a variety of purposes, with only thin connecting threads between them (at least as seen from the bottom-up perspective).

  • Commutation relations are usually expressed in the form $[A,B]=0$ even though, a priori, there appears to be little motivation for the introduction of such terminology.

  • A lot of stock is placed behind the canonical commutation relation $[x,p]=i \hbar$, though it is not always clear what it means.

    (In my view, the fundamental principle that this encodes is essentially de Broglie's relation $\lambda=h/p$; this is made rigorous by the Stone-von Neumann uniqueness theorem but that's quite a bit to expect a student to grasp at a first go.)

  • From this there is a natural extension to the Heisenberg Uncertainty Principle, which in its general form includes a commutator (and an anticommutator, to make things worse). Canonically-conjugate pairs of observables are often introduced, and this is often aided by observations on commutators. (On the other hand, the energy-time and angle-angular momentum conjugacy relations cannot be expressed in terms of commutators, making things even fuzzier.)

  • Commutators are used very frequently, for example, when studying the angular momentum algebra of quantum mechanics. It is clear they play a big role in encoding symmetries in quantum mechanics but it is hardly made clear how and why, and particularly why the combination $AB-BA$ should be important for symmetry considerations.

    This becomes even more important in more rigorous treatments of quantum mechanics, where the specifics of the Hilbert space become less important and the algebra of observable operators takes centre stage. The commutator is the central operation of that algebra, but again it's not very clear why that combination should be special.

  • An analogy is occasionally made to the Poisson brackets of hamiltonian mechanics, but this hardly helps - Poisson brackets are equally mysterious. This also ties the commutator in with time evolution, both on the classical side and via the Heisenberg equation of motion.

I can't think of any more at the moment, but these already pull in a large number of different directions, which can make everything very confusing, and there is rarely a unifying thread. So: what are commutators, exactly, and why are they so important?

Emilio Pisanty

6 Answers

62

Self-adjoint operators enter QM, formulated in complex Hilbert spaces, in two logically distinct ways. This leads to a corresponding pair of meanings of the commutator.

The first way is shared with the two other possible Hilbert space formulations (the real and the quaternionic ones): self-adjoint operators describe observables.

Two observables can be compatible or incompatible, in the sense that they can or cannot be measured simultaneously (the corresponding measurements disturb each other when one looks at the outcomes). Up to some mathematical technicalities, the commutator is a measure of incompatibility, in view of the generalizations of Heisenberg's principle you mention in your question. Roughly speaking, the more the commutator differs from $0$, the more the observables are mutually incompatible. (Think of inequalities like $\Delta A_\psi \Delta B_\psi \geq \frac{1}{2} |\langle \psi | [A,B] \psi\rangle|$. Whenever the right-hand side is nonzero, this prevents $\psi$ from being a common eigenvector of $A$ and $B$, that is, a state in which both observables are simultaneously defined, since such an eigenvector would satisfy $\Delta A_\psi =\Delta B_\psi =0$.)

The other way self-adjoint operators enter the formalism of QM (here the real and quaternionic versions differ from the complex case) concerns the mathematical description of continuous symmetries. In fact, they appear as generators of unitary groups representing (strongly continuous) physical transformations of the physical system. Such a continuous transformation is represented by a unitary one-parameter group $\mathbb R \ni a \mapsto U_a$. A celebrated theorem by Stone establishes that $U_a = e^{iaA}$ for a unique self-adjoint operator $A$ and all real $a$. This approach to describing continuous transformations leads to the quantum version of Noether's theorem, precisely in view of the (distinct!) fact that $A$ is also an observable.

The action of a symmetry group $U_a$ on an observable $B$ is made explicit by the well-known formula in Heisenberg picture:

$$B_a := U^\dagger_a B U_a$$

For instance, if $U_a$ describes rotations by an angle $a$ around the $z$ axis, $B_a$ is the analog of the observable $B$ measured with physical instruments rotated by $a$ around $z$.

The commutator here is a first-order evaluation of the action of the transformation on the observable $B$, since (again up to mathematical subtleties especially regarding domains):

$$B_a = B -ia [A,B] +O(a^2) \:.$$
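As a quick sanity check of this first-order formula, here is a minimal numerical sketch. The choice $A=\sigma^z$, $B=\sigma^x$ and the helper names `commutator`, `unitary` and `first_order_error` are illustrative assumptions, not part of the argument above. If the expansion is right, the error should shrink roughly like $a^2$.

```python
import numpy as np

# Pauli matrices: A generates rotations about z, B is the observable being transformed.
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
A, B = sigma_z, sigma_x

def commutator(X, Y):
    return X @ Y - Y @ X

def unitary(A, a):
    """U_a = exp(i a A) for Hermitian A, built from its eigendecomposition."""
    evals, evecs = np.linalg.eigh(A)
    return evecs @ np.diag(np.exp(1j * a * evals)) @ evecs.conj().T

def first_order_error(a):
    """Norm of B_a - (B - i a [A, B]), where B_a = U_a^dagger B U_a."""
    U = unitary(A, a)
    B_a = U.conj().T @ B @ U
    approx = B - 1j * a * commutator(A, B)
    return np.linalg.norm(B_a - approx)

errors = {a: first_order_error(a) for a in (1e-1, 1e-2, 1e-3)}
for a, err in errors.items():
    print(f"a = {a:g}: error = {err:.3e}")
```

Reducing $a$ by a factor of 10 should reduce the printed error by about a factor of 100, which is the $O(a^2)$ remainder.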

Usually, the information encoded in commutation relations is very deep. When dealing with Lie groups of symmetries, it permits one to reconstruct the whole representation (there is a wonderful theory by Nelson on this fundamental topic) under some quite mild mathematical hypotheses. Therefore commutators play a crucial role in the analysis of symmetries.

24

I'd like to expand a little bit on the interpretation of commutators as a measure of disturbance (related to incompatibility, as touched on in the other answers). My interpretation of the commutator is that $[A,B]$ quantifies the extent to which the action of $B$ changes the value of the dynamical variable $A$, and vice versa.

Let's assume that $A$ is a self-adjoint operator with a discrete non-degenerate spectrum of eigenvalues $\{a\}$ with associated eigenkets $\lvert a\rangle$. Then you can show that, for any operator $B$, the following decomposition exists $$ B = \sum_{\Delta} B(\Delta),$$ such that $$[A,B(\Delta)] = \Delta B(\Delta),$$ where $B(\Delta)$ is defined below. Viewing the commutator $[A,\,\cdot\,]$ as a linear operator, this has the form of an eigenvalue equation. The eigenvalues $\Delta$ are given by differences between pairs of eigenvalues of $A$, e.g. $\Delta = a'-a$. The specific form of the eigenoperators $B(\Delta)$ is $$ B(\Delta) = \sum_{a} \langle a+\Delta\rvert B\lvert a\rangle \;\lvert a+\Delta\rangle\langle a\rvert.$$ This demonstrates that the $B(\Delta)$ are "ladder operators" which act to increase the value of the variable $A$ by an amount $\Delta$. The commutator thus induces a natural decomposition of $B$ into contributions that change the value of $A$ by a given amount. A simple example is the well-known commutation relation between spin-$1/2$ operators: $$[\sigma^z,\sigma^x] = \mathrm{i}2\sigma^y = +2\sigma^+ - 2\sigma^-.$$ This tells you that $\sigma^x$ has two parts, which either increase or decrease the spin projection onto the $z$ axis by two "units", which in this case means $\pm 2\times\frac{\hbar}{2} = \pm \hbar$.

In general, the full commutator is $$ [A,B] = \sum_{\Delta} \Delta B(\Delta). $$ The $B(\Delta)$ are linearly independent$^{\ast}$, therefore the commutator vanishes only if $B(\Delta) = 0$ for all $\Delta \neq 0$, i.e. if $B$ does not change the value of $A$. If $[A,B]\neq 0$, one can get a measure of how much $B$ changes $A$ by computing the Hilbert-Schmidt norm (squared) of the commutator: $$\mathrm{Tr}\left\{[A,B]^{\dagger}[A,B]\right\} = \sum_{a,a'}(a-a')^2\lvert\langle a\rvert B \lvert a' \rangle\rvert^2. $$ This is the sum of the (squared) matrix elements of $B$ which link different eigenstates of $A$, weighted by the corresponding change in eigenvalues (squared). So this clearly quantifies the change in $A$ brought about by applying $B$.
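The decomposition and the norm identity can be checked numerically. The following is only a sketch: the spectrum `a_vals`, the random operator `B` and the dictionary `components` are made-up illustrations, with $A$ taken diagonal in its own eigenbasis.

```python
import numpy as np

rng = np.random.default_rng(0)

# A is diagonal in its own eigenbasis, with an assumed non-degenerate spectrum.
a_vals = np.array([0.0, 1.0, 2.5, 4.0])
A = np.diag(a_vals).astype(complex)
n = len(a_vals)

# An arbitrary (not necessarily Hermitian) operator B.
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

def commutator(X, Y):
    return X @ Y - Y @ X

# Matrix element (i, j) of B maps |a_j> to |a_i>, so it shifts A by a_i - a_j.
shifts = a_vals[:, None] - a_vals[None, :]

# B(Delta) keeps only the matrix elements of B with that particular shift.
components = {}
for delta in np.unique(shifts):
    mask = np.isclose(shifts, delta)
    components[float(delta)] = np.where(mask, B, 0)

# B is the sum of its components, and each one is an eigenoperator of [A, .].
reconstructed = sum(components.values())
eigen_ok = all(
    np.allclose(commutator(A, B_delta), delta * B_delta)
    for delta, B_delta in components.items()
)

# Hilbert-Schmidt norm of the commutator versus the weighted sum of matrix elements.
C = commutator(A, B)
hs_norm = np.trace(C.conj().T @ C).real
weighted_sum = np.sum(shifts**2 * np.abs(B) ** 2)

print(np.allclose(reconstructed, B), eigen_ok, np.isclose(hs_norm, weighted_sum))
```

Note that the spectrum is chosen so that two different pairs of levels share the shift $\Delta = 1.5$, which shows that a single $B(\Delta)$ can collect several matrix elements.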

Now the not-so-obvious part: what does "changing $A$ by applying $B$" mean physically? As noted by Valter, evolution and transformations in QM are carried out formally by applying unitary operators generated by observables, not by applying the observables themselves. This relates to the above decomposition in the following way. Suppose that we take $A$ to be the Hamiltonian $H$. Then it is straightforward to show that the evolution of $B$ in the Heisenberg picture is given by $$B(t) = e^{i H t} B e^{-i H t} = \sum_{\Delta} e^{i\Delta t} B(\Delta), $$ where here $\Delta$ are the Bohr frequencies of the system under consideration. The jump operators $B(\Delta)$ can be interpreted as the Fourier components of the operator-valued function $B(t)$. In the context of perturbation theory, we often approximate the effect of unitary evolution by the application of a Hermitian operator (the perturbing Hamiltonian), in which case the interpretation of the jump operators is clear: they describe the transitions between energy eigenstates caused by the perturbation $B$. The oscillating time dependence ultimately leads to energy conservation as a frequency-matching condition.

This is hardly a complete answer to the rather optimistic question of "what do commutators mean physically". However it might provide some food for thought for the curious student.


$^{\ast}$This follows since the $B(\Delta)$ are orthogonal with respect to the Hilbert-Schmidt inner product: $$ \mathrm{Tr}\left\{ B(\Delta)^{\dagger} B(\Delta') \right\} = \delta_{\Delta,\Delta'} \sum_a \lvert \langle a+\Delta \rvert B \lvert a \rangle \rvert^2,$$ where the Kronecker delta symbol $\delta_{\Delta,\Delta'}$ equals 1 if $\Delta = \Delta'$, and 0 otherwise.

14

At a basic level:

1) If $[A,B]=0$, and if $A$ and $B$ are infinitesimal generators of symmetries (and so also conserved quantities), this means both that $A$ is invariant under the transformation generated by $B$ and that $B$ is invariant under the transformation generated by $A$.

For instance, $[H,J_z]=0$ means that the angular momentum is conserved during time evolution, and that the Hamiltonian is invariant under rotations.

As @Valter Moretti says, a nonzero commutator $[A,B]$ measures the deviation from (both) symmetries.

2) A commutator of the type $[A, B] = \pm B$, if $A$ has a discrete spectrum, means that $B$ is a raising/lowering operator, carrying an "$A$-charge" of $\pm 1$.

An obvious example is $[J_z, J_\pm]= \pm J_\pm$.

3) Commutation relations of the type $[\hat A, \hat B]= i \lambda$, if $\hat A$ and $\hat B$ are observables corresponding to classical quantities $a$ and $b$, can be interpreted by considering the quantities $I = \int a \,db$ or $J = \int b \,da$. These classical quantities cannot be translated into quantum observables, because the uncertainty in these quantities is always of order $\lambda$.

For instance, $[\hat x,\hat p] = i \hbar$ shows that there is no quantum observable corresponding to the action $S =\int (\vec p\,d \vec x - E\, dt)$.
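Point 2 above is easy to verify on a small example. This is a minimal sketch using the standard spin-1 matrices in the $J_z$ eigenbasis, in units where $\hbar=1$. The variable names (`m0`, `shifted`, `new_m`) are just illustrative.

```python
import numpy as np

# Spin-1 matrices in the J_z eigenbasis, ordered m = +1, 0, -1, with hbar = 1.
J_z = np.diag([1.0, 0.0, -1.0])
J_plus = np.sqrt(2) * np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
J_minus = J_plus.T

def commutator(X, Y):
    return X @ Y - Y @ X

# [J_z, J_+] = +J_+ and [J_z, J_-] = -J_-
raises = np.allclose(commutator(J_z, J_plus), J_plus)
lowers = np.allclose(commutator(J_z, J_minus), -J_minus)

# Acting with J_+ on the m = 0 eigenvector gives a vector whose J_z value is
# one unit higher, i.e. J_+ carries "J_z-charge" +1.
m0 = np.array([0.0, 1.0, 0.0])
shifted = J_plus @ m0
new_m = (shifted @ J_z @ shifted) / (shifted @ shifted)

print(raises, lowers, new_m)
```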

Trimok
10

While this explanation isn't very "physical" and isn't likely to be useful to a beginning QM student, I think that all the important physics contained within the commutator ultimately springs from the Zassenhaus formula $$e^{-it \left( \hat{A} + \hat{B} \right)} = e^{-it\hat{A}} e^{-it\hat{B}} e^{\frac{1}{2} t^2 \left[\hat{A}, \hat{B} \right]} \cdots,$$ where the "$\cdots$" contains terms cubic and higher in $\hat{A}$ and $\hat{B}$ and can be expressed as a product of exponentials of linear combinations of nested commutators. If we think of $\hat{A}$ and $\hat{B}$ as Hermitian operators (which is what almost always goes into commutators) corresponding to physical observables, then this formula shows concretely that their failure to commute causes them to "interfere with each other" in a subtle way, so that their physical effects can't be separated. That is, the unitary transformation that their sum generates (e.g. a time-translation or symmetry operator) isn't simply the combined effect of each individual "piece" of the generator acting alone. All the strangeness of quantum mechanics follows from this simple fact. Moreover, the commutator is the leading-order deviation from the classical result.
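Here is a rough numerical illustration of the commutator as the leading-order deviation. It assumes random $3\times 3$ Hermitian matrices for $\hat A$ and $\hat B$ and writes the third Zassenhaus factor as $e^{\frac12 t^2[\hat A,\hat B]}$, which is what one gets by expanding both sides to second order in $t$. The helpers `random_hermitian`, `exp_i` and `errors` are hypothetical names for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_hermitian(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

def exp_i(H, s):
    """exp(i s H) for Hermitian H and real s, via the eigendecomposition of H."""
    evals, evecs = np.linalg.eigh(H)
    return evecs @ np.diag(np.exp(1j * s * evals)) @ evecs.conj().T

A, B = random_hermitian(3), random_hermitian(3)
comm = A @ B - B @ A   # anti-Hermitian
K = 1j * comm          # Hermitian, so exp(t^2/2 [A,B]) = exp(-i (t^2/2) K)

def errors(t):
    """Error of the naive split, and of the split corrected by the commutator factor."""
    exact = exp_i(A + B, -t)
    split = exp_i(A, -t) @ exp_i(B, -t)
    corrected = split @ exp_i(K, -(t**2) / 2)
    return np.linalg.norm(exact - split), np.linalg.norm(exact - corrected)

for t in (1e-2, 1e-3):
    split_err, corrected_err = errors(t)
    print(f"t = {t:g}: split error = {split_err:.3e}, corrected = {corrected_err:.3e}")
```

The naive product $e^{-it\hat A}e^{-it\hat B}$ should be off at order $t^2$, while including the commutator factor pushes the error to order $t^3$.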

tparker
5

Let's start with the Schrödinger equation: $$\mathrm i\hbar\frac{\partial}{\partial t}\left|\psi\right> = H\left|\psi\right>$$ Since $H$ is self-adjoint, this also implies $$-\mathrm i\hbar\frac{\partial}{\partial t}\left<\psi\right| = \left<\psi\right|H$$ Now consider the most general quantum state, expressed by a density matrix $$\rho = \sum_k p_k\left|\psi_k\middle>\middle<\psi_k\right|$$ with time-independent weights $p_k$. We want to know the time derivative of the density matrix. Obviously the time derivative is linear, and we can also use the product rule to obtain $$\begin{aligned}\frac{\partial\rho}{\partial t} &= \sum_k p_k\left(\left(\frac{\partial}{\partial t}\left|\psi_k\right>\right) \left<\psi_k\right| + \left|\psi_k\right>\left(\frac{\partial}{\partial t}\left<\psi_k\right|\right)\right)\\ &= \sum_k p_k\frac{1}{\mathrm i\hbar}\left(H\left|\psi_k\middle>\middle<\psi_k\right|-\left|\psi_k\middle>\middle<\psi_k\right|H\right)\\ &= \frac{1}{\mathrm i\hbar}(H\rho - \rho H)\\ &= \frac{1}{\mathrm i\hbar}[H,\rho] \end{aligned}$$ So you see that here the commutator enters quite naturally.

Next, consider an observable $A$, and let's look at the time dependence of its expectation value $\left<A\right>=\operatorname{tr}(A\rho)$.

Using the linearity and cyclic invariance of the trace, we get $$\begin{aligned} \frac{\partial}{\partial t}\left<A\right> &= \frac{\partial}{\partial t}\operatorname{tr}(A\rho)\\ &= \operatorname{tr}\left(\frac{\partial A}{\partial t}\rho\right) + \operatorname{tr}\left(A\frac{\partial\rho}{\partial t}\right)\\ &= \left<\frac{\partial A}{\partial t}\right>+\frac{1}{\mathrm i\hbar}\operatorname{tr}(A[H,\rho])\\ &= \left<\frac{\partial A}{\partial t}\right>+\frac{1}{\mathrm i\hbar}\left(\operatorname{tr}(AH\rho) - \operatorname{tr}(A\rho H)\right)\\ &= \left<\frac{\partial A}{\partial t}\right>+\frac{1}{\mathrm i\hbar}\left(\operatorname{tr}(AH\rho) - \operatorname{tr}(HA\rho)\right)\\ &= \left<\frac{\partial A}{\partial t}\right>+\frac{1}{\mathrm i\hbar}\operatorname{tr}([A,H]\rho)\\ &= \left<\frac{\partial A}{\partial t}\right> + \frac{1}{\mathrm i\hbar}\left<[A,H]\right> \end{aligned}$$ Now consider in particular a conserved quantity that does not explicitly depend on time (that is, $\partial A/\partial t=0$). Of course, if the quantity is conserved, its expectation value is conserved. The above equation then immediately gives $\left<[A,H]\right>=0$, and since this must be true for arbitrary $\rho$, we get $[A,H]=0$. That is, a conserved quantity commutes with the Hamiltonian. Note that all we've done here is shuffle the commutator around inside the trace.
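The equation for $\partial\left<A\right>/\partial t$ can be checked numerically for a time-independent $A$. The sketch below assumes $\hbar=1$, a random $3\times 3$ Hamiltonian and a random mixed state. The names `evolve` and `expectation` are illustrative, and the left-hand side is estimated with a central finite difference.

```python
import numpy as np

hbar = 1.0  # assumed units
rng = np.random.default_rng(2)

def random_hermitian(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

H = random_hermitian(3)   # Hamiltonian
A = random_hermitian(3)   # observable with no explicit time dependence

# A random mixed state: positive semi-definite with unit trace.
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho0 = M @ M.conj().T
rho0 /= np.trace(rho0).real

def evolve(rho, t):
    """rho(t) = U rho U^dagger with U = exp(-i H t / hbar)."""
    evals, evecs = np.linalg.eigh(H)
    U = evecs @ np.diag(np.exp(-1j * evals * t / hbar)) @ evecs.conj().T
    return U @ rho @ U.conj().T

def expectation(op, rho):
    return np.trace(op @ rho).real

# Left-hand side: d<A>/dt at t = 0 from a central finite difference.
dt = 1e-5
lhs = (expectation(A, evolve(rho0, dt)) - expectation(A, evolve(rho0, -dt))) / (2 * dt)

# Right-hand side: (1 / i hbar) <[A, H]>.
rhs = (np.trace((A @ H - H @ A) @ rho0) / (1j * hbar)).real

print(lhs, rhs)
```

The two printed numbers should agree up to the finite-difference error.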

Now let's take a closer look at the Hamiltonian. In classical mechanics, for non-relativistic problems we can write the Hamiltonian as $$H = \frac{p^2}{2m} + V(x)$$ and get the equations of motion $$\begin{aligned} \dot x &= \frac{\partial H}{\partial p} = \frac{p}{m}\\ \dot p &= -\frac{\partial H}{\partial x} = -V'(x) \end{aligned}$$ Now let's see whether we can get that, at least on average, in quantum mechanics. With the equation for averages, we have (since neither $x$ nor $p$ depends explicitly on time) $$\begin{aligned} \frac{\partial}{\partial t}\left<x\right> &= \frac{1}{\mathrm i\hbar}\left<[x,H]\right>\\ &= \frac{1}{\mathrm i\hbar}\frac{1}{2m}\left<[x,p^2]\right> + \frac{1}{\mathrm i\hbar}\underbrace{\left<[x,V(x)]\right>}_{=0}\\ &= \frac{1}{2m\mathrm i\hbar}\left(\left<[x,p]p\right> + \left<p[x,p]\right>\right) \stackrel{!}{=} \frac{1}{m}\left<p\right> \end{aligned}$$ Now it is obvious that you get the right result if $[x,p]=\mathrm i\hbar$. Also, $$\begin{aligned} \frac{\partial}{\partial t}\left<p\right> &= \frac{1}{\mathrm i\hbar}\left<[p,H]\right>\\ &= \frac{1}{2m\mathrm i\hbar}\underbrace{\left<[p,p^2]\right>}_{=0} + \frac{1}{\mathrm i\hbar}\left<[p,V(x)]\right> \stackrel!= \left<-V'(x)\right> \end{aligned}$$ It is not hard to check that this result is obtained if $p=-\mathrm i\hbar\partial/\partial x$, which also gives the commutation relation we just derived.
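For readers who like to see matrices, here is a hedged sketch of $[x,p]=\mathrm i\hbar$ and $[x,p^2]=2\mathrm i\hbar\,p$. It uses harmonic-oscillator ladder matrices truncated to $N$ levels, with $\hbar=m=\omega=1$. The truncation is an assumption of this example, not part of the argument above, and it spoils the relations in the last row and column, so only the upper-left block is compared.

```python
import numpy as np

hbar = m = omega = 1.0  # assumed units
N = 8                   # truncation size (illustrative)

# Annihilation operator in the number basis: a|n> = sqrt(n)|n-1>.
a = np.diag(np.sqrt(np.arange(1, N)), k=1).astype(complex)
adag = a.conj().T

x = np.sqrt(hbar / (2 * m * omega)) * (a + adag)
p = 1j * np.sqrt(m * hbar * omega / 2) * (adag - a)

def commutator(X, Y):
    return X @ Y - Y @ X

xp = commutator(x, p)
x_p2 = commutator(x, p @ p)

# Away from the truncation edge the canonical relations hold.
inner = slice(0, N - 1)
ccr_ok = np.allclose(xp[inner, inner], 1j * hbar * np.eye(N - 1))
xp2_ok = np.allclose(x_p2[inner, inner], 2j * hbar * p[inner, inner])

# The last diagonal entry compensates so that the trace of the commutator vanishes,
# as it must for any pair of finite matrices.
print(ccr_ok, xp2_ok, xp[-1, -1], abs(np.trace(xp)))
```

The vanishing trace is a reminder that $[x,p]=\mathrm i\hbar$ cannot hold exactly for finite-dimensional matrices, so the edge artifact is unavoidable in any such truncation.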

About the connection with symmetries and uncertainty relations you've already gotten answers (and it's quite late in the night now), so I'll stop here.

celtschk
3

It may be helpful to assign the students the following homework problem:

Let $A$ and $B$ be two observables.

i) What is the necessary condition for $A$ and $B$ to be simultaneously measurable in an experiment without any uncertainty?

ii) Write down all the second-degree polynomials in $A$ and $B$ which are again observables.

iii) Suppose $A$ is the Hamiltonian**. Time-evolve a state $|\psi\rangle$ for a time $t$ under $A$, and denote the state so obtained as $|\psi(t)\rangle$. Can we express $\displaystyle\frac{d\langle\psi(t)|B|\psi(t)\rangle}{dt}$ as $\langle\psi(t)|\mathcal{O}|\psi(t)\rangle$ for some observable $\mathcal{O}$? If yes, find $\mathcal{O}$.

** In this problem we may also take $A$ to be a symmetry generator other than the Hamiltonian.


Added later:

  • When the commutator vanishes, the two observables can be simultaneously measured in an experiment without uncertainty (this follows from the axioms of QM).
  • The expectation value of the commutator $i[H,A]$ (where $H$ is the Hamiltonian) in a state gives the time rate of change of the expectation value of $A$ in that state. More generally, the expectation value of the commutator $i[B,A]$ in a state is related to the infinitesimal change in the expectation value of $A$ in that state under the one-parameter symmetry generated by $B$.
  • For two given observables $A$ and $B$, the commutator multiplied by $i$, $i[A,B]$, and the anticommutator $\{A,B\}$ are again observables. However, the commutator appears more often in QM problems (and perhaps is more significant) than the anticommutator because of the above two points.
user10001