Many philosophers, and even some physicists, still claim that the Born Rule is an independent postulate rather than a consequence of the more fundamental postulates. I definitely think that's wrong.
There's no question it's derivable from the other postulates, as I'll show explicitly below. The objections you'll find to it mostly have to do with confusion over the meaning of probability. In my opinion, the people who don't accept the derivation are failing to understand what probability is.
If any of the objections Adrian Kent or others have made to it were true, then you wouldn't be able to derive probability in classical mechanics either... since both derivations rely on exactly the same technique.
In classical statistical mechanics, you postulate the existence of phase space as a vector space with a Euclidean norm (or equivalently an "inner product"). And you also assume ergodicity (that equal volumes in phase space are equally likely).
The situation is exactly the same in quantum mechanics (which is really nothing more than a slight generalization of statistical mechanics). You postulate the existence of a Hilbert space, which is mathematically equivalent to phase space anyway. (In fact, you can use either the Hilbert space formalism or the phase space formalism for either classical stat mech or quantum mechanics, and there is no difference whatsoever in the predictions each of them makes; one just happens to be more convenient in one case, while the other is more convenient in the other case). You postulate that it has an inner product defined on it, which gives you a Euclidean metric and a Euclidean measure... all exactly the same as in classical physics. The postulates lead in exactly the same way to the Born rule, and the derivation is no different from how probability is derived/defined in classical physics. In other words, the standards of the critics are demonstrably inconsistent/hypocritical.
I'll go through the derivation carefully here, so that anyone who is familiar with undergraduate level quantum mechanics and comfortable with college level mathematics should be able to see that it's a valid proof.
I'll start by explicitly stating all the postulates you do need, so it's clear there is no "circular logic" going on here, as some critics have also claimed.
Basic Postulates of Quantum Mechanics
- Quantum mechanical states are represented by vectors in a complex Hilbert space, while observables are represented by self-adjoint linear operators on those vectors.
- If a quantum system is in an eigenstate of some observable, then measuring that observable will yield the eigenvalue of the operator representing that observable.
- The time evolution of a quantum state is given by a unitary operator U(t, t').
- The states of a composite system are represented by linear combinations of tensor products of the states of each of its subsystems.
Postulate 1 defines what a state and an observable is.
Postulate 2 defines what a measurement is.
Postulate 3 places a general constraint on time-evolution: it must be unitary. This is equivalent to the (general) Schrödinger Equation:
$\displaystyle i\frac{\partial}{\partial t} \psi = H \psi $
which follows from the definition of H as the self-adjoint operator satisfying $U(t, t') = \exp{\Big(-iH(t-t')\Big)}$.
Postulate 4 defines how to build a complex system out of simpler subsystems, or how multi-particle systems relate to single-particle systems.
Notice there is nothing about probability in any of these postulates. In fact, they state nothing about how to predict measurement outcomes for states which aren’t eigenstates of the observable being measured. (Or equivalently, how to compute the expected value of an operator in an arbitrary state.) That’s what we will derive here.
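If it helps to see what these postulates refer to in concrete terms, here is a minimal numerical sketch (assuming numpy is available; the particular 2-level Hamiltonian, state, and time below are arbitrary values chosen just for illustration):

```python
import numpy as np

# Postulate 1: a state is a vector in a complex Hilbert space (here C^2),
# and an observable is a self-adjoint (Hermitian) operator acting on it.
psi = np.array([0.6, 0.8j])              # normalized: |0.6|^2 + |0.8|^2 = 1
H = np.array([[1.0, 0.5], [0.5, -1.0]])  # a Hermitian observable ("Hamiltonian")

# Postulate 2: measuring an observable on one of its eigenstates yields the eigenvalue.
evals, evecs = np.linalg.eigh(H)
eigstate = evecs[:, 0]
print(np.allclose(H @ eigstate, evals[0] * eigstate))      # True

# Postulate 3: time evolution is unitary, U = exp(-iHt), so norms are preserved.
t = 2.7
U = evecs @ np.diag(np.exp(-1j * evals * t)) @ evecs.conj().T
psi_t = U @ psi
print(np.vdot(psi, psi).real, np.vdot(psi_t, psi_t).real)  # both 1.0
```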
Notation and Setup
Because a Hilbert space is a vector space, an immediate corollary of postulate 1 is the superposition principle: if $|a\rangle$ and $|b\rangle$ are both states in the Hilbert space, then so is any linear combination of them:
$| \psi \rangle = \alpha |a \rangle + \beta |b \rangle$
Also because it’s a Hilbert space, there is an inner product defined. We can write this as:
$$\langle \psi|a\rangle = \alpha$$
$$\langle \psi|b\rangle = \beta$$
There is also an adjoint operation defined on each element of a Hilbert space, which has a distributive property which reverses the order of any product while conjugating all complex numbers involved:
$$\langle a|\psi\rangle = \alpha^*$$
$$\langle b|\psi\rangle = \beta^*$$
Postulate 3 (unitary time evolution) implies that if we start with a state normalized to 1, we’ll always end up with a state normalized to 1:
$\langle\psi|\psi\rangle = 1$
Since none of the observable results of experiments depend on the normalization, we are free to normalize all states to 1, and they will stay that way at all times. Because the basis states $|a\rangle$ and $|b\rangle$ are orthonormal, this normalization condition implies:
$|\alpha|^2 + |\beta|^2 = 1$.
Now we have a formula for the state of any 2-state system in terms of an orthonormal basis $|a\rangle$ and $|b\rangle$, with a relationship between the components.
Consider the observable $P_A$, which we define as the operator with eigenvalue 1 for the eigenstate $|a\rangle$ and eigenvalue 0 for the eigenstate $|b\rangle$; in other words, the projector $P_A = |a\rangle\langle a|$. Physically, this corresponds to an experiment which results in a yes or no answer to the question “is the system in state $|a\rangle$?” 1 means yes, 0 means no.
Since the state $|\psi\rangle$ above is not an eigenstate of this operator, we can make no definite prediction for the outcome. What should we expect the outcome to be then?
Science only deals with repeatable experiments. No matter whether the outcome gives us 1 or 0, a single experiment isn’t enough to confirm or disprove the rules of quantum mechanics; both outcomes are entirely consistent with all the postulates. (This is true in all versions of quantum mechanics, regardless of whether the Born rule is included as an explicit postulate or derived from the others.) So the meaningful question is: if we prepare a large number of particles all in the same initial state $|\psi\rangle$ and repeat the same measurement on each of them, what fraction of them will we find in state $|a\rangle$?
Our rule for composing subsystems tells us how to construct the state of the entire system out of the states of all the individual particles/qubits. So we have all the postulates we need in order to calculate the answer in the limit as the number of particles N approaches infinity. Spoiler alert: it works out to be: $|\alpha|^2$
The Proof Itself
First, note that:
$\langle \psi | P_A | \psi \rangle = |\alpha|^2$
This follows trivially from the first 3 postulates. However, there is still no probability involved yet. What we need to show is that this quantity corresponds to the expected value of the result of measuring the property $P_A$ in the state $\psi$, i.e. that:
$\langle P_A \rangle_\psi = \langle \psi | P_A | \psi \rangle$
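As a quick numerical sanity check of the “trivial” identity above, here is a short sketch (assuming numpy; the amplitude $\alpha$ is an arbitrary illustrative value):

```python
import numpy as np

a = np.array([1.0, 0.0])                # |a>
b = np.array([0.0, 1.0])                # |b>
alpha = 0.6 + 0.3j                      # |alpha|^2 = 0.45
beta = np.sqrt(1 - abs(alpha)**2)       # chosen so |alpha|^2 + |beta|^2 = 1

psi = alpha * a + beta * b              # |psi> = alpha|a> + beta|b>
P_A = np.outer(a, a.conj())             # projector |a><a|: eigenvalue 1 on |a>, 0 on |b>

print(np.vdot(psi, P_A @ psi).real)     # 0.45
print(abs(alpha)**2)                    # 0.45, matching <psi|P_A|psi>
```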
Establishing that equality is the difficult, non-trivial part of the Born Rule. To prove it, we will need to use postulate 4, which tells us how to represent a collection of N particles that are all in the same state:
$\displaystyle | \chi \rangle = \bigotimes_{i=1}^{N} |\psi\rangle_i = (\alpha |a\rangle + \beta |b\rangle)^{\otimes N}$
Now consider the operator $P$ defined by:
$P = \displaystyle\frac{1}{N}\sum_i P_{A_i}$
where $P_{A_i}$ is the operator representing the observable “is the i’th particle in state $|a\rangle$?”
The action of $P$ on a state where some fraction $p = \frac{k}{N}$ of the N particles are in state $|a\rangle$ and the rest ($1-p = \frac{N-k}{N}$) are in state $|b\rangle$ (I’ll generically call such a state $|k\rangle$; a specific symmetrized version is constructed below) is to count the number of particles in state $|a\rangle$ and divide by N:
$P |k\rangle = \frac{k}{N} | k \rangle = p | k \rangle$
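Here is a small-N sketch of this construction (assuming numpy; N is kept tiny because the composite space has dimension $2^N$, and the amplitudes are the same arbitrary values as before). Note that $\langle\chi|P|\chi\rangle = |\alpha|^2$ already holds exactly at finite N, but $|\chi\rangle$ is not yet anywhere near an eigenvector of P; that only happens in the large N limit, which is the whole point of what follows:

```python
import numpy as np
from functools import reduce

N = 6
a = np.array([1.0, 0.0]); b = np.array([0.0, 1.0])
alpha = 0.6 + 0.3j; beta = np.sqrt(1 - abs(alpha)**2)
psi = alpha * a + beta * b
I2 = np.eye(2)
P_A = np.outer(a, a.conj())

def kron_all(factors):
    return reduce(np.kron, factors)

# |chi> = |psi> (x) |psi> (x) ... (x) |psi>, N factors
chi = kron_all([psi] * N)

# P = (1/N) * sum_i P_{A_i}, where P_{A_i} acts on particle i only
P = sum(kron_all([P_A if j == i else I2 for j in range(N)]) for i in range(N)) / N

# On a product basis state with k particles in |a>, P gives eigenvalue k/N:
k = 2
z = kron_all([a] * k + [b] * (N - k))        # e.g. |a a b b b b>
print(np.allclose(P @ z, (k / N) * z))       # True

# <chi|P|chi> equals |alpha|^2 exactly, even at finite N...
print(np.vdot(chi, P @ chi).real, abs(alpha)**2)
# ...but chi is NOT yet an eigenvector of P at small N:
print(np.linalg.norm(P @ chi - np.vdot(chi, P @ chi) * chi))  # clearly nonzero
```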
Our entire N-particle Hilbert space is spanned by $2^N$ orthogonal basis states (each an N-fold tensor product of $|a\rangle$’s and $|b\rangle$’s), and when we expand out the product we’ll have one term for each of these states. We can label them by binary strings $z \in Z$, where $Z$ is the set of all binary strings of length $N$, with the i’th bit recording whether particle i is in state $|a\rangle$ or $|b\rangle$.
Better, we can group these basis vectors into subsets, where each subset $Z_k$ contains only the states where exactly $k$ particles are in the $|a\rangle$ state and $N-k$ are in the $|b\rangle$ state. Then there are $\frac{N!}{k!(N-k)!}$ states $z \in Z_k$ in the subset labelled by k.
Summing over all of those for all values of k recovers the full set of $2^N$ states:
$\displaystyle I = \sum_{z \in Z} |z\rangle \langle z | = \sum_k\sum_{z \in Z_k}| z \rangle\langle z |$
Since $|\chi\rangle$ is symmetric under permutation of particle indices, we know that it lives in the subspace of symmetrized states. This subspace is spanned by a basis of only $N+1$ states, which I’ll refer to as the “k basis”:
$|k\rangle = \sqrt{\frac{k!(N-k)!}{N!}} \sum_{z \in Z_k} |z\rangle$
The normalization has been chosen so that:
$\langle k | k \rangle = \langle z | z \rangle = 1$
This means the outer product of k states looks like:
$|k\rangle \langle k| = \frac{1}{C(N,k)} \sum_{z, z' \in Z_k} |z\rangle\langle z'|$
Writing out the components of $ |\chi\rangle $ in the k-basis ends up looking a lot like the binomial theorem:
$\displaystyle |\chi \rangle = \sum_{k=0}^N\alpha^k \beta^{N-k} \sum_{z \in Z_k} |z\rangle = \sum_{k=0}^N \sqrt{\frac{N!}{k!(N-k)!}} \alpha^k \beta^{(N-k)}|k\rangle$
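If you’d like to check this expansion numerically, here is a sketch that builds the k-basis states explicitly for a small N and compares the components $\langle k|\chi\rangle$ against the coefficients above (assuming numpy; the amplitude is again an arbitrary illustrative value):

```python
import numpy as np
from functools import reduce
from itertools import product
from math import comb

N = 5
a = np.array([1.0, 0.0]); b = np.array([0.0, 1.0])
alpha = 0.6 + 0.3j; beta = np.sqrt(1 - abs(alpha)**2)
chi = reduce(np.kron, [alpha * a + beta * b] * N)

def k_state(k):
    """|k> = normalized sum of all basis strings z with exactly k particles in |a>."""
    total = np.zeros(2**N, dtype=complex)
    for z in product([0, 1], repeat=N):        # bit 0 -> |a>, bit 1 -> |b>
        if z.count(0) == k:
            total += reduce(np.kron, [a if bit == 0 else b for bit in z])
    return total / np.sqrt(comb(N, k))

for k in range(N + 1):
    component = np.vdot(k_state(k), chi)                    # <k|chi>
    predicted = np.sqrt(comb(N, k)) * alpha**k * beta**(N - k)
    print(k, np.allclose(component, predicted))             # True for every k
```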
The operator P is diagonal in this basis, since all of these k-vectors are mutually orthogonal eigenstates of P. Since we now have an expression for $\chi$ in this basis, that makes it easy to see what happens when we operate on it with P:
$\displaystyle |\chi^{'} \rangle = P|\chi \rangle = \frac{1}{N}\sum_{k=0}^N \sqrt{\frac{N!}{k!(N-k)!}} \alpha^k \beta^{N-k} k|k\rangle$
It sure doesn’t look like $|\chi \rangle$ is an eigenvector, since each component got multiplied by a different value of k. But let’s see how close we got. We can do this by computing the angle by which P has rotated it.
$|\chi\rangle$ is an eigenvector of P if and only if the angle $\theta$ between $|\chi\rangle$ and $|\chi^{'}\rangle$ is 0, where that angle is related to their inner product:
$\cos \theta = \frac{\langle \chi | \chi^{'} \rangle}{\sqrt{\langle \chi | \chi \rangle \, \langle \chi^{'} | \chi^{'} \rangle}}$
The $|\chi \rangle$ vector is already normalized to 1 ($\langle \chi | \chi \rangle = 1$), since all we did was take a tensor product of N states, each of which was already normalized. But to find $\theta$, we still need both $\langle \chi^{'}| \chi^{'}\rangle$ and $\langle \chi|\chi^{'}\rangle$.
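Before grinding through those two inner products analytically, here is a numerical preview using only the k-basis components we just derived (plain standard-library Python; the value of $|\alpha|^2$ is arbitrary). It computes $\cos\theta$ exactly at finite N and shows it creeping up toward 1 as N grows:

```python
from math import lgamma, log, exp, sqrt

p = 0.45   # |alpha|^2, e.g. for alpha = 0.6 + 0.3j

def cos_theta(N, p):
    # w_k = |<k|chi>|^2 = C(N,k) * p^k * (1-p)^(N-k), computed in log space
    s1 = s_p = s_p2 = 0.0
    for k in range(N + 1):
        log_w = (lgamma(N + 1) - lgamma(k + 1) - lgamma(N - k + 1)
                 + k * log(p) + (N - k) * log(1 - p))
        w = exp(log_w)
        s1 += w                   # <chi|chi>
        s_p += w * (k / N)        # <chi|chi'> = <chi|P|chi>
        s_p2 += w * (k / N)**2    # <chi'|chi'>
    return s_p / sqrt(s1 * s_p2)

for N in (10, 100, 1000, 10000, 100000):
    print(N, cos_theta(N, p))     # approaches 1 as N grows
```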
Starting with the first of these, $\langle \chi^{'} | \chi^{'} \rangle$:
$\displaystyle \langle \chi^{'} | = \langle \chi | P = \frac{1}{N}\sum_{j=0}^N j\,\sqrt{\frac{N!}{j!(N-j)!}}\,(\alpha^{*})^j(\beta^{*})^{N-j}\,\langle j|$
$\displaystyle \langle \chi^{'} | \chi^{'} \rangle = \frac{1}{N^2}\sum_{k=0}^N \sum_{j=0}^N \sqrt{\frac{N!}{k!(N-k)!}}\sqrt{\frac{N!}{j!(N-j)!}}\,(\alpha^{*})^j (\beta^{*})^{N-j} \alpha^{k} \beta^{N-k}\, jk\,\langle j | k\rangle $
Using the orthogonality of the k-basis vectors $\langle k | j \rangle = \delta_{jk}$, this reduces to:
$\displaystyle \langle \chi^{'} | \chi^{'} \rangle = \frac{1}{N^2}\sum_{k=0}^N \frac{N!}{k!(N-k)!} (|\alpha|^2)^k (|\beta|^2)^{N-k} k^2$
Or, in terms of $p = k/N$ and $\delta p = 1/N$ (multiplying and dividing by $\delta p$ so the sum over $k$ becomes a Riemann sum over $p$, with $\frac{k^2}{N^2} = p^2$):
$\displaystyle \langle \chi^{'} | \chi^{'} \rangle = \frac{1}{\delta p}\sum_{p=0}^1 \frac{N!}{(pN)!((1-p)N)!}(|\alpha|^2)^{pN} (|\beta|^2)^{(1-p)N}\, p^2\, \delta p$
Where the sum over p takes the values $p = \{0, 1/N, {2/N}, {3/N},\dots N/N\} = \{0, \delta p, 2\delta p, 3 \delta p, \dots, N \delta p\}$.
For large N, we can replace $\delta p$ with $dp$ (and write the prefactor $\frac{1}{\delta p}$ as $N$) and turn the sum into an integral:
$\displaystyle \langle \chi^{'} | \chi^{'} \rangle = N\int_0^1 \frac{N!}{(pN)!((1-p)N)!} (|\alpha|^2)^{pN} (|\beta|^2)^{(1-p)N}\, p^2\, dp$
Also, for large N:
$\frac{N!}{(p N)!((1-p)N)!} \approx \frac{1}{\sqrt{2\pi N p(1-p)}}\, e^{-N\big(p \ln p + (1-p) \ln (1-p)\big)}$
Making that substitution, as well as $(|\alpha|^2)^{pN} = e^{pN \ln{|\alpha|^2}}$ and the same for the $\beta$ allows us to write this as:
$\displaystyle \langle \chi^{'} | \chi^{'} \rangle = N\int_0^1 \frac{1}{\sqrt{2\pi N p(1-p)}}\, e^{-N f(p)}\, p^2\, dp $
where:
$f(p) = p \ln p + (1-p) \ln (1-p) - p\ln{|\alpha|^2} - (1-p) \ln{|\beta|^2}$
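Here is a quick numerical check of the large-N approximation of the binomial coefficient used above (standard-library Python; the values of N and p are arbitrary):

```python
from math import comb, log, pi

def exact_log_binom(N, k):
    return log(comb(N, k))

def approx_log_binom(N, p):
    # log of: 1/sqrt(2*pi*N*p*(1-p)) * exp(-N*(p*ln(p) + (1-p)*ln(1-p)))
    return -0.5 * log(2 * pi * N * p * (1 - p)) - N * (p * log(p) + (1 - p) * log(1 - p))

for N in (100, 1000, 10000):
    k = int(0.3 * N)          # i.e. p = k/N = 0.3
    print(N, exact_log_binom(N, k), approx_log_binom(N, k / N))
# The agreement gets better and better as N grows.
```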
The math for $\langle \chi | \chi^{'} \rangle$ is very similar; the only difference is that there’s no P acting to the left on $\langle \chi|$, so instead of $\langle j |\frac{jk}{N^2}|k\rangle$ in the double sum we just have $\langle j | \frac{k}{N} | k \rangle$. In other words, instead of $p^2 ~ dp$ we get $p ~ dp$:
$\displaystyle \langle \chi | \chi^{'} \rangle = N\int_0^1 \frac{1}{\sqrt{2\pi N p(1-p)}}\, e^{-N f(p)}\, p ~ dp $
For large N, the integrands of these integrals approach a Gaussian shape (bell curve), becoming more and more sharply peaked around the global minimum of $f(p)$, which I’ll call $p_0$. In fact, in the limit as N approaches infinity, the width of the Gaussian approaches 0 and the weight multiplying the factors of $p$ becomes a Dirac delta function:
$ \displaystyle\delta (p - p_0) = \lim_{N \rightarrow \infty} \sqrt{\frac{N}{2\pi \sigma^2}}\exp{\Big[\frac{-N}{2\sigma^2}(p-p_0)^2\Big]}$
Since part of the integrand is a Dirac delta function, we can evaluate the integral easily by using:
$\int g(p) \delta(p - p_0) dp = g(p_0)$
It doesn’t matter what $\sigma$ is (it’s set by the curvature of $f$ at its minimum, $\sigma^2 = 1/f^{''}(p_0)$), because the same weight function multiplies $1$, $p$, and $p^2$ in $\langle \chi | \chi \rangle$, $\langle \chi | \chi^{'} \rangle$, and $\langle \chi^{'} | \chi^{'} \rangle$ respectively, and the fact that $\langle \chi | \chi \rangle = 1$ tells us that this weight integrates to 1, i.e. it really is a properly normalized delta function. All we need is the location of the global minimum, $p_0$, and the rest will cancel out. We can find the minimum by setting the derivative of $f(p)$ to zero:
$f^{'}(p) = \ln{p} - \ln{(1-p)} - \ln{|\alpha|^2} + \ln{|\beta|^2} = 0$
$\ln{\frac{p_0}{(1-p_0)}} = \ln{\frac{|\alpha|^2}{|\beta|^2}}$
$\frac{p_0}{1-p_0} = \frac{|\alpha|^2}{|\beta|^2} = \frac{|\alpha|^2}{1 - |\alpha|^2}$
$p_0 = |\alpha|^2$
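A quick numerical sanity check of this minimization (standard-library Python; the value of $|\alpha|^2$ is an arbitrary illustration):

```python
from math import log

alpha_sq = 0.45
beta_sq = 1 - alpha_sq

def f(p):
    return (p * log(p) + (1 - p) * log(1 - p)
            - p * log(alpha_sq) - (1 - p) * log(beta_sq))

grid = [i / 100000 for i in range(1, 100000)]   # crude grid search over (0, 1)
p_min = min(grid, key=f)
print(p_min)       # 0.45, i.e. p_0 = |alpha|^2
print(f(p_min))    # ~0.0: f(p_0) = 0, so the peak of e^{-N f(p)} is exactly 1
```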
Because the only difference between $\langle \chi^{'} | \chi^{'}\rangle$ and $\langle \chi | \chi^{'} \rangle$ is a factor of $p^2$ versus $p$ in the integrand, the Dirac delta simply evaluates those factors at $p_0$: the first integral becomes $p_0^2$ and the second becomes $p_0$. After the rest cancels out, we’re left with:
$\displaystyle \cos{\theta} = \frac{p_0}{\sqrt{1 \cdot p_0^2}} = 1$
$\theta = 0$
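And here is a numerical version of that last step (standard-library Python; arbitrary illustrative values). It evaluates the weight appearing in the integrals above, with the factorials continued to non-integer arguments via the Gamma function, for a single large N, and confirms that integrating it against $1$, $p$, and $p^2$ gives approximately $1$, $p_0$, and $p_0^2$, so that $\cos\theta \approx 1$:

```python
from math import lgamma, log, exp, sqrt

alpha_sq = 0.45            # p_0 = |alpha|^2
beta_sq = 1 - alpha_sq
N = 5000

def weight(p):
    # N * N!/((pN)! ((1-p)N)!) * (|alpha|^2)^(pN) * (|beta|^2)^((1-p)N)
    log_w = (lgamma(N + 1) - lgamma(p * N + 1) - lgamma((1 - p) * N + 1)
             + p * N * log(alpha_sq) + (1 - p) * N * log(beta_sq))
    return N * exp(log_w)

# midpoint rule on a fine grid over (0, 1)
M = 200000
dp = 1.0 / M
norm = chi_chip = chip_chip = 0.0
for i in range(M):
    p = (i + 0.5) * dp
    w = weight(p) * dp
    norm += w                  # -> <chi|chi>   ~ 1
    chi_chip += w * p          # -> <chi|chi'>  ~ p_0
    chip_chip += w * p * p     # -> <chi'|chi'> ~ p_0^2

print(norm, chi_chip, chip_chip)
print(chi_chip / sqrt(norm * chip_chip))    # cos(theta), very close to 1
```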
So we’ve now proven that, in the large N limit, $|\chi^{'} \rangle$ and $|\chi\rangle$ point in exactly the same direction. Therefore, $|\chi\rangle$ is an eigenstate of P, and furthermore:
$\langle \chi^{'}|\chi\rangle = \langle \chi|P|\chi\rangle = p_0 = p_0 \langle \chi|\chi\rangle$
Ergo, in the large N limit the eigenvalue of P in the state $|\chi\rangle$ is $p_0 = |\alpha|^2$. P is the observable representing the fraction of particles found in state $|a\rangle$, so we have proven that if we repeat an experiment where we measure the property “is this particle in state $|a\rangle$?” a large number of times on particles prepared in state $|\psi\rangle$, the fraction of times we obtain the answer “yes” will approach $|\alpha|^2$.
Yes, that’s literally the definition of the statement “the probability of measuring a particle in state $|\psi\rangle$ as being in state $|a\rangle$ is $|\alpha|^2$”:
$\langle P_A \rangle_\psi = \langle \psi | P_A | \psi \rangle = |\alpha|^2 = |\langle a | \psi \rangle |^2$
This is the Born Rule, and that’s how it’s derived from the other postulates.
There's nothing circular about it, and it's on every bit as solid ground as any classical statement about probability is.