Many philosophers, and even some physicists, still claim that the Born Rule is an independent postulate rather than a consequence of the more fundamental postulates. I definitely think that's wrong.
There's no question it's derivable from the other postulates, as I'll show explicitly below. The objections you'll find to it mostly have to do with confusion over the meaning of probability. In my opinion, the people who don't accept the derivation are failing to understand what probability is.
If any of the objections Adrian Kent or others have made to it were true, then you wouldn't be able to derive probability in classical mechanics either... since both derivations rely on exactly the same technique.
In classical statistical mechanics, you postulate the existence of phase space as a vector space with a Euclidean norm (or equivalently an "inner product"). And you also assume ergodicity (that equal volumes in phase space are equally likely).
The situation is exactly the same in quantum mechanics (which is really nothing more than a slight generalization of statistical mechanics). You postulate the existence of a Hilbert space, which is mathematically equivalent to phase space anyway. (In fact, you can use either the Hilbert space formalism or the phase space formalism for either classical stat mech or quantum mechanics, and there is no difference whatsoever in the predictions each of them makes; one just happens to be more convenient in one case, while the other is more convenient in the other case). You postulate that it has an inner product defined on it, which gives you a Euclidean metric and a Euclidean measure... all exactly the same as in classical physics. The postulates lead in exactly the same way to the Born rule, and the derivation is no different from how probability is derived/defined in classical physics. In other words, the standards of the critics are demonstrably inconsistent/hypocritical.
I'll go through the derivation carefully here, so that anyone who is familiar with undergraduate level quantum mechanics and comfortable with college level mathematics should be able to see that it's a valid proof.
I'll start by explicitly stating all the postulates you do need, so it's clear there is no "circular logic" going on here, as some critics have also claimed.
Basic Postulates of Quantum Mechanics
- Quantum mechanical states are represented by vectors in a complex Hilbert space, while observables are represented by self-adjoint linear operators on those vectors.
- If a quantum system is in an eigenstate of some observable, then measuring that observable will yield the eigenvalue of the operator representing that observable.
- The time evolution of a quantum state is given by a unitary operator U(t, t').
- The states of a composite system are represented by linear combinations of tensor products of the states of each of its subsystems.
Postulate 1 defines what a state and an observable is.
Postulate 2 defines what a measurement is.
Postulate 3 places a general constraint on time-evolution: it must be unitary. This is equivalent to the (general) Schrödinger Equation:
$\displaystyle i\frac{\partial}{\partial t} \psi = H \psi $
which follows from the definition of H as the self-adjoint operator satisfying $U(t, t') = \exp{\Big(-iH(t-t')\Big)}$.
Postulate 4 defines how to build a complex system out of simpler subsystems, or how multi-particle systems relate to single-particle systems.
Notice there is nothing about probability in any of these postulates. In fact, they state nothing about how to predict measurement outcomes for states which aren’t eigenstates of the observable being measured. (Or equivalently, how to compute the expected value of an operator in an arbitrary state.) That’s what we will derive here.
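If it helps to see what these postulates refer to in concrete terms, here is a minimal numerical sketch (assuming numpy is available; the particular 2-level Hamiltonian, state, and time below are arbitrary values chosen just for illustration):

```python
import numpy as np

# Postulate 1: a state is a vector in a complex Hilbert space (here C^2),
# and an observable is a self-adjoint (Hermitian) operator acting on it.
psi = np.array([0.6, 0.8j])              # normalized: |0.6|^2 + |0.8|^2 = 1
H = np.array([[1.0, 0.5], [0.5, -1.0]])  # a Hermitian observable ("Hamiltonian")

# Postulate 2: measuring an observable on one of its eigenstates yields the eigenvalue.
evals, evecs = np.linalg.eigh(H)
eigstate = evecs[:, 0]
print(np.allclose(H @ eigstate, evals[0] * eigstate))      # True

# Postulate 3: time evolution is unitary, U = exp(-iHt), so norms are preserved.
t = 2.7
U = evecs @ np.diag(np.exp(-1j * evals * t)) @ evecs.conj().T
psi_t = U @ psi
print(np.vdot(psi, psi).real, np.vdot(psi_t, psi_t).real)  # both 1.0
```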
Notation and Setup
Because a Hilbert space is a vector space, an immediate corollary of postulate 1 is the superposition principle: if $|a\rangle$ and $|b\rangle$ are both states in the Hilbert space, then so is any linear combination of them:
$| \psi \rangle = \alpha |a \rangle + \beta |b \rangle$
Also because it’s a Hilbert space, there is an inner product defined. We can write this as:
$$\langle \psi|a\rangle = \alpha$$
$$\langle \psi|b\rangle = \beta$$
There is also an adjoint operation defined on each element of a Hilbert space, which has a distributive property which reverses the order of any product while conjugating all complex numbers involved:
$$\langle a|\psi\rangle = \alpha^*$$
$$\langle b|\psi\rangle = \beta^*$$
Postulate 3 (unitary time evolution) implies that if we start with a state normalized to 1, we’ll always end up with a state normalized to 1:
$\langle\psi|\psi\rangle = 1$
Since none of the observable results of experiments depend on the normalization, we are free to normalize all states to 1, and they will stay that way at all times. Because the basis states $|a\rangle$ and $|b\rangle$ are orthonormal, this normalization condition implies:
$|\alpha|^2 + |\beta|^2 = 1$.
Now we have a formula for the state of any 2-state system in terms of an orthonormal basis $|a\rangle$ and $|b\rangle$, with a relationship between the components.
Consider the observable $P_A$, which we define as the operator with eigenvalue 1 for the eigenstate $|a\rangle$ and eigenvalue 0 for the eigenstate $|b\rangle$; in other words, the projector $P_A = |a\rangle\langle a|$. Physically, this corresponds to an experiment which results in a yes or no answer to the question “is the system in state $|a\rangle$?” 1 means yes, 0 means no.
Since the state $|\psi\rangle$ above is not an eigenstate of this operator, we can make no definite prediction for the outcome. What should we expect the outcome to be then?
Science only deals with repeatable experiments. No matter whether the outcome gives us 1 or 0, a single experiment isn’t enough to confirm or disprove the rules of quantum mechanics; both outcomes are entirely consistent with all the postulates. (This is true in all versions of quantum mechanics, regardless of whether the Born rule is included as an explicit postulate or derived from the others.) So the meaningful question is: if we prepare a large number of particles all in the same initial state $|\psi\rangle$ and repeat the same measurement on each of them, what fraction of them will we find in state $|a\rangle$?
Our rule for composing subsystems tells us how to construct the state of the entire system out of the states of all the individual particles/qubits. So we have all the postulates we need in order to calculate the answer in the limit as the number of particles N approaches infinity. Spoiler alert: it works out to be: $|\alpha|^2$
The Proof Itself
First, note that:
$\langle \psi | P_A | \psi \rangle = |\alpha|^2$
This follows trivially from the first 3 postulates. However, there is still no probability involved yet. What we need to show is that this quantity corresponds to the expected value of the result of measuring the property $P_A$ in the state $\psi$, i.e. that:
$\langle P_A \rangle_\psi = \langle \psi | P_A | \psi \rangle$
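As a quick numerical sanity check of the “trivial” identity above, here is a short sketch (assuming numpy; the amplitude $\alpha$ is an arbitrary illustrative value):

```python
import numpy as np

a = np.array([1.0, 0.0])                # |a>
b = np.array([0.0, 1.0])                # |b>
alpha = 0.6 + 0.3j                      # |alpha|^2 = 0.45
beta = np.sqrt(1 - abs(alpha)**2)       # chosen so |alpha|^2 + |beta|^2 = 1

psi = alpha * a + beta * b              # |psi> = alpha|a> + beta|b>
P_A = np.outer(a, a.conj())             # projector |a><a|: eigenvalue 1 on |a>, 0 on |b>

print(np.vdot(psi, P_A @ psi).real)     # 0.45
print(abs(alpha)**2)                    # 0.45, matching <psi|P_A|psi>
```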
Establishing that equality is the difficult, non-trivial part of the Born Rule. To prove it, we will need to use postulate 4, which tells us how to represent a collection of N particles that are all in the same state:
$\displaystyle | \chi \rangle = \bigotimes_{i=1}^{N} |\psi\rangle_i = (\alpha |a\rangle + \beta |b\rangle)^{\otimes N}$
Now consider the operator $P$ defined by:
$P = \displaystyle\frac{1}{N}\sum_i P_{A_i}$
where $P_{A_i}$ is the operator representing the observable “is the i’th particle in state $|a\rangle$?”
The action of $P$ on a state where some fraction $p = \frac{k}{N}$ of the N particles are in state $|a\rangle$ and the rest ($1-p = \frac{N-k}{N}$) are in state $|b\rangle$ (I’ll generically call such a state $|k\rangle$; a specific symmetrized version is constructed below) is to count the number of particles in state $|a\rangle$ and divide by N:
$P |k\rangle = \frac{k}{N} | k \rangle = p | k \rangle$
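Here is a small-N sketch of this construction (assuming numpy; N is kept tiny because the composite space has dimension $2^N$, and the amplitudes are the same arbitrary values as before). Note that $\langle\chi|P|\chi\rangle = |\alpha|^2$ already holds exactly at finite N, but $|\chi\rangle$ is not yet anywhere near an eigenvector of P; that only happens in the large N limit, which is the whole point of what follows:

```python
import numpy as np
from functools import reduce

N = 6
a = np.array([1.0, 0.0]); b = np.array([0.0, 1.0])
alpha = 0.6 + 0.3j; beta = np.sqrt(1 - abs(alpha)**2)
psi = alpha * a + beta * b
I2 = np.eye(2)
P_A = np.outer(a, a.conj())

def kron_all(factors):
    return reduce(np.kron, factors)

# |chi> = |psi> (x) |psi> (x) ... (x) |psi>, N factors
chi = kron_all([psi] * N)

# P = (1/N) * sum_i P_{A_i}, where P_{A_i} acts on particle i only
P = sum(kron_all([P_A if j == i else I2 for j in range(N)]) for i in range(N)) / N

# On a product basis state with k particles in |a>, P gives eigenvalue k/N:
k = 2
z = kron_all([a] * k + [b] * (N - k))        # e.g. |a a b b b b>
print(np.allclose(P @ z, (k / N) * z))       # True

# <chi|P|chi> equals |alpha|^2 exactly, even at finite N...
print(np.vdot(chi, P @ chi).real, abs(alpha)**2)
# ...but chi is NOT yet an eigenvector of P at small N:
print(np.linalg.norm(P @ chi - np.vdot(chi, P @ chi) * chi))  # clearly nonzero
```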
Our entire N-particle Hilbert space is spanned by $2^N$ orthogonal basis states (each an N-fold tensor product of $|a\rangle$’s and $|b\rangle$’s), and when we expand out the product we’ll have one term for each of these states. We can label them by binary strings $z \in Z$, where $Z$ is the set of all binary strings of length $N$, with the i’th bit recording whether particle i is in state $|a\rangle$ or $|b\rangle$.
Better, we can group these basis vectors into subsets, where each subset $Z_k$ contains only the states where exactly $k$ particles are in the $|a\rangle$ state and $N-k$ are in the $|b\rangle$ state. Then there are $\frac{N!}{k!(N-k)!}$ states $z \in Z_k$ in the subset labelled by k.
Summing over all of those for all values of k recovers the full set of $2^N$ states:
$\displaystyle I = \sum_{z \in Z} |z\rangle \langle z | = \sum_k\sum_{z \in Z_k}| z \rangle\langle z |$
Since $|\chi\rangle$ is symmetric under permutation of particle indices, we know that it lives in the subspace of symmetrized states. This subspace is spanned by a basis of only $N+1$ states, which I’ll refer to as the “k basis”:
$|k\rangle = \sqrt{\frac{k!(N-k)!}{N!}} \sum_{z \in Z_k} |z\rangle$
The normalization has been chosen so that:
$\langle k | k \rangle = \langle z | z \rangle = 1$
This means the outer product of k states looks like:
$|k\rangle \langle k| = \frac{1}{C(N,k)} \sum_{z, z' \in Z_k} |z\rangle\langle z'|$
Writing out the components of $ |\chi\rangle $ in the k-basis ends up looking a lot like the binomial theorem:
$\displaystyle |\chi \rangle = \sum_{k=0}^N\alpha^k \beta^{N-k} \sum_{z \in Z_k} |z\rangle = \sum_{k=0}^N \sqrt{\frac{N!}{k!(N-k)!}} \alpha^k \beta^{(N-k)}|k\rangle$
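If you’d like to check this expansion numerically, here is a sketch that builds the k-basis states explicitly for a small N and compares the components $\langle k|\chi\rangle$ against the coefficients above (assuming numpy; the amplitude is again an arbitrary illustrative value):

```python
import numpy as np
from functools import reduce
from itertools import product
from math import comb

N = 5
a = np.array([1.0, 0.0]); b = np.array([0.0, 1.0])
alpha = 0.6 + 0.3j; beta = np.sqrt(1 - abs(alpha)**2)
chi = reduce(np.kron, [alpha * a + beta * b] * N)

def k_state(k):
    """|k> = normalized sum of all basis strings z with exactly k particles in |a>."""
    total = np.zeros(2**N, dtype=complex)
    for z in product([0, 1], repeat=N):        # bit 0 -> |a>, bit 1 -> |b>
        if z.count(0) == k:
            total += reduce(np.kron, [a if bit == 0 else b for bit in z])
    return total / np.sqrt(comb(N, k))

for k in range(N + 1):
    component = np.vdot(k_state(k), chi)                    # <k|chi>
    predicted = np.sqrt(comb(N, k)) * alpha**k * beta**(N - k)
    print(k, np.allclose(component, predicted))             # True for every k
```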
The operator P is diagonal in this basis, since all of these k-vectors are mutually orthogonal eigenstates of P. Since we now have an expression for $\chi$ in this basis, that makes it easy to see what happens when we operate on it with P:
$\displaystyle |\chi^{'} \rangle = P|\chi \rangle = \frac{1}{N}\sum_{k=0}^N \sqrt{\frac{N!}{k!(N-k)!}} \alpha^k \beta^{N-k} k|k\rangle$
It sure doesn’t look like $|\chi \rangle$ is an eigenvector, since each component got multiplied by a different value of k. But let’s see how close we got. We can do this by computing the angle by which P has rotated it.
$|\chi\rangle$ is an eigenvector of P if and only if the angle $\theta$ between $|\chi\rangle$ and $|\chi^{'}\rangle$ is 0, where that angle is related to their inner product:
$\cos \theta = \frac{\langle \chi | \chi^{'} \rangle}{\sqrt{\langle \chi | \chi \rangle \, \langle \chi^{'} | \chi^{'} \rangle}}$
The $|\chi \rangle$ vector is already normalized to 1 ($\langle \chi | \chi \rangle = 1$), since all we did was take a tensor product of N states, each of which was already normalized. But to find $\theta$, we still need both $\langle \chi^{'}| \chi^{'}\rangle$ and $\langle \chi|\chi^{'}\rangle$.
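Before grinding through those two inner products analytically, here is a numerical preview using only the k-basis components we just derived (plain standard-library Python; the value of $|\alpha|^2$ is arbitrary). It computes $\cos\theta$ exactly at finite N and shows it creeping up toward 1 as N grows:

```python
from math import lgamma, log, exp, sqrt

p = 0.45   # |alpha|^2, e.g. for alpha = 0.6 + 0.3j

def cos_theta(N, p):
    # w_k = |<k|chi>|^2 = C(N,k) * p^k * (1-p)^(N-k), computed in log space
    s1 = s_p = s_p2 = 0.0
    for k in range(N + 1):
        log_w = (lgamma(N + 1) - lgamma(k + 1) - lgamma(N - k + 1)
                 + k * log(p) + (N - k) * log(1 - p))
        w = exp(log_w)
        s1 += w                   # <chi|chi>
        s_p += w * (k / N)        # <chi|chi'> = <chi|P|chi>
        s_p2 += w * (k / N)**2    # <chi'|chi'>
    return s_p / sqrt(s1 * s_p2)

for N in (10, 100, 1000, 10000, 100000):
    print(N, cos_theta(N, p))     # approaches 1 as N grows
```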
Starting with the first of these, $\langle \chi^{'} | \chi^{'} \rangle$:
$\displaystyle \langle \chi^{'} | = \langle \chi | P = \frac{1}{N}\sum_{j=0}^N j\,\sqrt{\frac{N!}{j!(N-j)!}}\,(\alpha^{*})^j(\beta^{*})^{N-j}\,\langle j|$
$\displaystyle \langle \chi^{'} | \chi^{'} \rangle = \frac{1}{N^2}\sum_{k=0}^N \sum_{j=0}^N \sqrt{\frac{N!}{k!(N-k)!}}\sqrt{\frac{N!}{j!(N-j)!}}\,(\alpha^{*})^j (\beta^{*})^{N-j} \alpha^{k} \beta^{N-k}\, jk\,\langle j | k\rangle $
Using the orthogonality of the k-basis vectors $\langle k | j \rangle = \delta_{jk}$, this reduces to:
$\displaystyle \langle \chi^{'} | \chi^{'} \rangle = \frac{1}{N^2}\sum_{k=0}^N \frac{N!}{k!(N-k)!} (|\alpha|^2)^k (|\beta|^2)^{N-k} k^2$
Or, in terms of $p = k/N$ and $\delta p = 1/N$ (multiplying and dividing by $\delta p$ so the sum over $k$ becomes a Riemann sum over $p$, with $\frac{k^2}{N^2} = p^2$):
$\displaystyle \langle \chi^{'} | \chi^{'} \rangle = \frac{1}{\delta p}\sum_{p=0}^1 \frac{N!}{(pN)!((1-p)N)!}(|\alpha|^2)^{pN} (|\beta|^2)^{(1-p)N}\, p^2\, \delta p$
Where the sum over p takes the values $p = \{0, 1/N, {2/N}, {3/N},\dots N/N\} = \{0, \delta p, 2\delta p, 3 \delta p, \dots, N \delta p\}$.
For large N, we can replace $\delta p$ with $dp$ (and write the prefactor $\frac{1}{\delta p}$ as $N$) and turn the sum into an integral:
$\displaystyle \langle \chi^{'} | \chi^{'} \rangle = N\int_0^1 \frac{N!}{(pN)!((1-p)N)!} (|\alpha|^2)^{pN} (|\beta|^2)^{(1-p)N}\, p^2\, dp$
Also, for large N:
$\frac{N!}{(p N)!((1-p)N)!} \approx \frac{1}{\sqrt{2\pi N p(1-p)}}\, e^{-N\big(p \ln p + (1-p) \ln (1-p)\big)}$
Making that substitution, as well as $(|\alpha|^2)^{pN} = e^{pN \ln{|\alpha|^2}}$ and the same for the $\beta$ allows us to write this as:
$\displaystyle \langle \chi^{'} | \chi^{'} \rangle = N\int_0^1 \frac{1}{\sqrt{2\pi N p(1-p)}}\, e^{-N f(p)}\, p^2\, dp $
where:
$f(p) = p \ln p + (1-p) \ln (1-p) - p\ln{|\alpha|^2} - (1-p) \ln{|\beta|^2}$
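Here is a quick numerical check of the large-N approximation of the binomial coefficient used above (standard-library Python; the values of N and p are arbitrary):

```python
from math import comb, log, pi

def exact_log_binom(N, k):
    return log(comb(N, k))

def approx_log_binom(N, p):
    # log of: 1/sqrt(2*pi*N*p*(1-p)) * exp(-N*(p*ln(p) + (1-p)*ln(1-p)))
    return -0.5 * log(2 * pi * N * p * (1 - p)) - N * (p * log(p) + (1 - p) * log(1 - p))

for N in (100, 1000, 10000):
    k = int(0.3 * N)          # i.e. p = k/N = 0.3
    print(N, exact_log_binom(N, k), approx_log_binom(N, k / N))
# The agreement gets better and better as N grows.
```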
The math for $\langle \chi | \chi^{'} \rangle$ is very similar; the only difference is that there’s no P acting to the left on $\langle \chi|$, so instead of $\langle j |\frac{jk}{N^2}|k\rangle$ in the double sum we just have $\langle j | \frac{k}{N} | k \rangle$. In other words, instead of $p^2 ~ dp$ we get $p ~ dp$:
$\displaystyle \langle \chi | \chi^{'} \rangle = N\int_0^1 \frac{1}{\sqrt{2\pi N p(1-p)}}\, e^{-N f(p)}\, p ~ dp $
For large N, the integrands of these integrals approach a Gaussian shape (bell curve), becoming more and more sharply peaked around the global minimum of $f(p)$, which I’ll call $p_0$. In fact, in the limit as N approaches infinity, the width of the Gaussian approaches 0 and the weight multiplying the factors of $p$ becomes a Dirac delta function:
$ \displaystyle\delta (p - p_0) = \lim_{N \rightarrow \infty} \sqrt{\frac{N}{2\pi \sigma^2}}\exp{\Big[\frac{-N}{2\sigma^2}(p-p_0)^2\Big]}$
Since part of the integrand is a Dirac delta function, we can evaluate the integral easily by using:
$\int g(p) \delta(p - p_0) dp = g(p_0)$
It doesn’t matter what $\sigma$ is (it’s set by the curvature of $f$ at its minimum, $\sigma^2 = 1/f^{''}(p_0)$), because the same weight function multiplies $1$, $p$, and $p^2$ in $\langle \chi | \chi \rangle$, $\langle \chi | \chi^{'} \rangle$, and $\langle \chi^{'} | \chi^{'} \rangle$ respectively, and the fact that $\langle \chi | \chi \rangle = 1$ tells us that this weight integrates to 1, i.e. it really is a properly normalized delta function. All we need is the location of the global minimum, $p_0$, and the rest will cancel out. We can find the minimum by setting the derivative of $f(p)$ to zero:
$f^{'}(p) = \ln{p} - \ln{(1-p)} - \ln{|\alpha|^2} + \ln{|\beta|^2} = 0$
$\ln{\frac{p_0}{(1-p_0)}} = \ln{\frac{|\alpha|^2}{|\beta|^2}}$
$\frac{p_0}{1-p_0} = \frac{|\alpha|^2}{|\beta|^2} = \frac{|\alpha|^2}{1 - |\alpha|^2}$
$p_0 = |\alpha|^2$
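A quick numerical sanity check of this minimization (standard-library Python; the value of $|\alpha|^2$ is an arbitrary illustration):

```python
from math import log

alpha_sq = 0.45
beta_sq = 1 - alpha_sq

def f(p):
    return (p * log(p) + (1 - p) * log(1 - p)
            - p * log(alpha_sq) - (1 - p) * log(beta_sq))

grid = [i / 100000 for i in range(1, 100000)]   # crude grid search over (0, 1)
p_min = min(grid, key=f)
print(p_min)       # 0.45, i.e. p_0 = |alpha|^2
print(f(p_min))    # ~0.0: f(p_0) = 0, so the peak of e^{-N f(p)} is exactly 1
```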
Because the only difference between $\langle \chi^{'} | \chi^{'}\rangle$ and $\langle \chi | \chi^{'} \rangle$ is a factor of $p^2$ versus $p$ in the integrand, the Dirac delta simply evaluates those factors at $p_0$: the first integral becomes $p_0^2$ and the second becomes $p_0$. After the rest cancels out, we’re left with:
$\displaystyle \cos{\theta} = \frac{p_0}{\sqrt{1 \cdot p_0^2}} = 1$
$\theta = 0$
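And here is a numerical version of that last step (standard-library Python; arbitrary illustrative values). It evaluates the weight appearing in the integrals above, with the factorials continued to non-integer arguments via the Gamma function, for a single large N, and confirms that integrating it against $1$, $p$, and $p^2$ gives approximately $1$, $p_0$, and $p_0^2$, so that $\cos\theta \approx 1$:

```python
from math import lgamma, log, exp, sqrt

alpha_sq = 0.45            # p_0 = |alpha|^2
beta_sq = 1 - alpha_sq
N = 5000

def weight(p):
    # N * N!/((pN)! ((1-p)N)!) * (|alpha|^2)^(pN) * (|beta|^2)^((1-p)N)
    log_w = (lgamma(N + 1) - lgamma(p * N + 1) - lgamma((1 - p) * N + 1)
             + p * N * log(alpha_sq) + (1 - p) * N * log(beta_sq))
    return N * exp(log_w)

# midpoint rule on a fine grid over (0, 1)
M = 200000
dp = 1.0 / M
norm = chi_chip = chip_chip = 0.0
for i in range(M):
    p = (i + 0.5) * dp
    w = weight(p) * dp
    norm += w                  # -> <chi|chi>   ~ 1
    chi_chip += w * p          # -> <chi|chi'>  ~ p_0
    chip_chip += w * p * p     # -> <chi'|chi'> ~ p_0^2

print(norm, chi_chip, chip_chip)
print(chi_chip / sqrt(norm * chip_chip))    # cos(theta), very close to 1
```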
So we’ve now proven that, in the large N limit, $|\chi^{'} \rangle$ and $|\chi\rangle$ point in exactly the same direction. Therefore, $|\chi\rangle$ is an eigenstate of P, and furthermore:
$\langle \chi^{'}|\chi\rangle = \langle \chi|P|\chi\rangle = p_0 = p_0 \langle \chi|\chi\rangle$
Ergo, in the large N limit the eigenvalue of P in the state $|\chi\rangle$ is $p_0 = |\alpha|^2$. P is the observable representing the fraction of particles found in state $|a\rangle$, so we have proven that if we repeat an experiment where we measure the property “is this particle in state $|a\rangle$?” a large number of times on particles prepared in state $|\psi\rangle$, the fraction of times we obtain the answer “yes” will approach $|\alpha|^2$.
Yes, that’s literally the definition of the statement “the probability of measuring a particle in state $|\psi\rangle$ as being in state $|a\rangle$ is $|\alpha|^2$”:
$\langle P_A \rangle_\psi = \langle \psi | P_A | \psi \rangle = |\alpha|^2 = |\langle a | \psi \rangle |^2$
This is the Born Rule, and that’s how it’s derived from the other postulates.
There's nothing circular about it, and it's on every bit as solid ground as any classical statement about probability is.