93

Why is the application of probability in Quantum Mechanics (QM) fundamentally different from its application in other areas? After all, QM seems to apply probability according to the same probability axioms as in other areas of physics, engineering, etc.

Why is there a difference?

Naively one would assume one of these possibilities:

  1. It is not the same probability (theory?)

  2. It is a matter of interpretation (of the formalism?)

  3. Something else?

Many answers (which I am still studying) focus on the fact that the combined probability of two mutually exclusive events in QM is not equal to the sum of the probabilities of each event (which holds classically by definition). This fact appears to make the formulation of another probability theory (a quantum one) a necessity.

Yet this again comes down to whether the events are assumed independent or assumed mutually exclusive; if they are not, "classical probability" is applicable (as indeed it is in other areas). This is one of the main points of the question.

Nikos M.
  • 5,302

8 Answers

151

The theory of probability used in QM is intrinsically different from the one commonly used for the following reason: The space of events is non-distributive (more properly non-Boolean), and this fact deeply affects the conditional probability theory. The probability that A happens if B happens is computed differently in classical probability theory and in quantum theory when A and B are quantum incompatible events. In both cases, probability is a measure on a lattice, but, in the classical case, the lattice is a Boolean one (a $\sigma$-algebra); in the quantum case, it is not.

More clearly, classical probability is a map $\mu: \Sigma(X) \to [0,1]$ such that $\Sigma(X)$ is a class of subsets of the set $X$ including $\emptyset$, closed with respect to the complement and the countable union, and such that $\mu(X)=1$ and: $$\mu(\cup_{n\in \mathbb N}E_n) = \sum_n \mu(E_n)\quad \mbox{if $E_k \in \Sigma(X)$ with $E_p\cap E_q= \emptyset$ for $p\neq q$.}$$ The elements of $\Sigma(X)$ are the events whose probability is $\mu$. In this view, for instance, if $E,F \in \Sigma(X)$, $E\cap F$ is logically interpreted as the event "$E$ AND $F$". Similarly $E\cup F$ corresponds to "$E$ OR $F$" and $X\setminus F$ has the meaning of "NOT $F$" and so on. The conditional probability of $P$ given $Q$ satisfies $$\mu(P|Q) = \frac{\mu(P \cap Q)}{\mu(Q)}\:.\tag{1}$$

If you instead consider a quantum system, there are "events", i.e. elementary, experimentally testable "yes/no" propositions, that cannot be joined by the logical operators AND and OR.

An example is $P=$" the $x$ component of the spin of this electron is $1/2$", and $Q=$" the $y$ component is $1/2$". There is no experimental device able to assign a truth value to $P$ and $Q$ simultaneously, so elementary propositions such as "$P$ and $Q$" make no sense. Pairs of propositions like $P$ and $Q$ above are physically incompatible.

In quantum theories (the most elementary version due to von Neumann), the events of a physical system are represented by the orthogonal projectors of a separable Hilbert space $H$. The set ${\cal P}(H)$ of those operators replaces the classical $\Sigma(X)$.

In general, the meaning of $P\in {\cal P}(H)$ is something like "the value of the observable $Z$ belongs to the subset $I \subset \mathbb R$" for some observable $Z$ and some set $I$. There is a procedure to integrate such a class of projectors, labelled by real subsets, to construct a self-adjoint operator $\hat{Z}$ associated with the observable $Z$, and this is nothing but the physical meaning of the spectral decomposition theorem.

If $P, Q \in {\cal P}(H)$, there are two possibilities: $P$ and $Q$ commute or they do not.

Von Neumann's fundamental axiom states that commutativity is the mathematical counterpart of physical compatibility.

When $P$ and $Q$ commute, $PQ$ and $P+Q-PQ$ are still orthogonal projectors, that is, elements of ${\cal P}(H)$.

In this situation, $PQ$ corresponds to "$P$ AND $Q$", whereas $P+Q-PQ$ corresponds to "$P$ OR $Q$" and so on; in particular, "NOT $P$" is always interpreted as the orthogonal projector onto $P(H)^\perp$ (the orthogonal subspace of $P(H)$), and all of the classical formalism holds true this way. As a matter of fact, a maximal set of pairwise commuting projectors has formal properties identical to those of classical logic: it is a Boolean $\sigma$-algebra.
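As a concrete numerical illustration of these algebraic facts (my own sketch, not part of the original argument; the two-qubit projectors are just an example, and plain numpy is assumed):

```python
import numpy as np

# Two commuting projectors on a 4-dimensional Hilbert space (two qubits):
# P tests "first spin up", Q tests "second spin up"; they act on different factors.
up = np.array([[1, 0], [0, 0]], dtype=complex)   # projector |0><0| on C^2
I2 = np.eye(2, dtype=complex)

P = np.kron(up, I2)
Q = np.kron(I2, up)

def is_projector(A, tol=1e-12):
    return np.allclose(A, A.conj().T, atol=tol) and np.allclose(A @ A, A, atol=tol)

print(np.allclose(P @ Q, Q @ P))       # True: P and Q commute (compatible events)
print(is_projector(P @ Q))             # True: "P AND Q" is again an event
print(is_projector(P + Q - P @ Q))     # True: "P OR Q" is again an event
```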

In this picture, a quantum state is a map assigning the probability $\mu(P)$ that $P$ is experimentally verified to every $P\in {\cal P}(H)$. It has to satisfy: $\mu(I)=1$ and $$\mu\left(\sum_{n\in \mathbb N}P_n\right) = \sum_n \mu(P_n)\quad \mbox{if $P_k \in {\cal P}(H)$ with $P_p P_q= P_qP_p =0$ for $p\neq q$.}$$

The celebrated Gleason theorem establishes that, if $\text{dim}(H)\neq 2$, the measures $\mu$ are all of the form $\mu(P)= \text{tr}(\rho_\mu P)$ for some mixed state $\rho_\mu$ (a positive trace-class operator with unit trace), uniquely determined by $\mu$. In the convex set of states, the extremal elements are the standard pure states. They are determined, up to a phase, by unit vectors $\psi \in H$, so that, with some trivial computation (completing $\psi_\mu$ to an orthonormal basis of $H$ and using that basis to compute the trace), $$\mu(P) = \langle \psi_\mu | P \psi_\mu \rangle = ||P \psi_\mu||^2\:.$$
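A minimal numerical check of the last identity (my addition; a random state and projector in $\mathbb C^4$ are assumed purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random unit vector psi in C^4 (a pure state) and a random rank-2 orthogonal projector P.
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)

A = rng.normal(size=(4, 2)) + 1j * rng.normal(size=(4, 2))
V, _ = np.linalg.qr(A)                     # two orthonormal columns
P = V @ V.conj().T                         # projector onto their span

rho = np.outer(psi, psi.conj())            # the density matrix |psi><psi|

print(np.isclose(np.trace(rho @ P).real,            # tr(rho P)
                 np.linalg.norm(P @ psi) ** 2))      # ||P psi||^2  -> True
```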

(Nowadays, there is a generalized version of this picture, where the set ${\cal P}(H)$ is replaced by the class of positive operators in $H$ bounded above by the identity (the so-called "effects"), and Gleason's theorem is replaced by Busch's theorem with a very similar statement.)

Quantum probability is therefore given, for a given (generally mixed) state $\rho_\mu$, by the map $${\cal P}(H) \ni P \mapsto \mu(P) =\text{tr}(\rho_\mu P)\:.$$

It is clear that, as soon as one deals with physically incompatible propositions, $(1)$ cannot hold just because there is nothing like $P \cap Q$ in the set of physically sensible quantum propositions. All that is due to the fact that the space of events ${\cal P}(H)$ is now a non-commutative set of projectors, giving rise to a non-Boolean lattice.

The formula replacing $(1)$ is now:

$$\mu(P|Q) = \frac{\text{tr}(\rho_\mu QPQ)}{\text{tr}(\rho_\mu Q)}\tag{2}\:.$$

When $P$ and $Q$ are compatible, $QPQ$ ($= PQ$) is an orthogonal projector and can be interpreted as "$P$ AND $Q$" (i.e., $P\cap Q$); in this case, $(1)$ holds true again. For incompatible propositions, $(2)$ gives rise to all the "strange things" showing up in quantum experiments (as in the double-slit one). In particular, the fact that, in QM, probabilities are computed by combining complex probability amplitudes arises from $(2)$.

$(2)$ just relies upon the von Neumann-Lüders reduction postulate, stating that, if the outcome of the measurement of $P\in {\cal P}(H)$ is YES when the state was $\mu$ (i.e., $\rho_\mu$), then the state immediately after the measurement is $\mu'$, associated with $\rho_{\mu'}$, where

$$\rho_{\mu'} := \frac{P\rho_\mu P}{\text{tr}(\rho_\mu P)}\:.$$
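To make the difference between $(1)$ and $(2)$ concrete, here is a small numerical sketch of my own (plain numpy; the standard spin projectors on $\mathbb C^2$ are assumed). In the state "spin-$x$ up" the event $P$ = "the $x$ component of the spin is $+1/2$" is certain, yet conditioning on the incompatible event $Q$ = "the $z$ component is $+1/2$" via $(2)$ drops its probability to $1/2$; classically, formula $(1)$ would keep an event of probability $1$ certain after conditioning on any event of nonzero probability.

```python
import numpy as np

# Spin projectors on C^2: Q = "S_z = +1/2", P = "S_x = +1/2".  They do not commute.
Q = np.array([[1, 0], [0, 0]], dtype=complex)
P = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)

rho = P.copy()   # pure state "spin-x up", so the event P is certain: mu(P) = 1

def mu(E, rho):
    return np.trace(rho @ E).real

def cond_prob(P, Q, rho):
    """Quantum conditional probability mu(P|Q) = tr(rho QPQ) / tr(rho Q), formula (2)."""
    return (np.trace(rho @ Q @ P @ Q) / np.trace(rho @ Q)).real

print(np.allclose(P @ Q, Q @ P))   # False: P and Q are incompatible
print(mu(P, rho))                  # 1.0 : P is certain before any other test
print(cond_prob(P, Q, rho))        # 0.5 : conditioning on Q changes that, unlike (1)

# Von Neumann-Luders post-measurement state after outcome "yes" for Q:
rho_after = Q @ rho @ Q / mu(Q, rho)
print(np.allclose(rho_after, Q))   # True: the state is now "spin-z up"
```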

ADDENDUM. Actually, it is possible to extend the notion of the logical operators AND and OR to all pairs of elements in ${\cal P}(H)$, and that was the program of von Neumann and Birkhoff (quantum logic). In fact, the lattice structure of ${\cal P}(H)$ permits it; indeed, it is exactly this structure. With this extended notion of AND and OR, "$P$ AND $Q$" is the orthogonal projector onto $P(H)\cap Q(H)$, whereas "$P$ OR $Q$" is the orthogonal projector onto the closure of the space $P(H)+Q(H)$. When $P$ and $Q$ commute, these notions of AND and OR reduce to the standard ones. With the extended definitions, ${\cal P}(H)$ becomes a lattice in the proper mathematical sense, where the partial order relation is given by the standard inclusion of closed subspaces ($P \geq Q$ means $P(H) \supset Q(H)$). The point is that the physical interpretation of this extension of AND and OR is not clear. The resulting lattice is, however, non-Boolean. In other words, for instance, unlike the standard AND and OR, these extended operations are not distributive (this reveals their quantum nature). Keeping also the definition of "NOT $P$" as the orthogonal projector onto $P(H)^\perp$, the resulting structure of ${\cal P}(H)$ is well known: a $\sigma$-complete, bounded, orthomodular, separable, atomic, irreducible lattice satisfying the covering property. Around 1995, Solèr definitively proved a conjecture due to von Neumann stating that there are only three possibilities for concretely realizing such lattices: the lattice of orthogonal projectors in a separable complex Hilbert space, the lattice of orthogonal projectors in a separable real Hilbert space, and the lattice of orthogonal projectors in a separable quaternionic Hilbert space.

Gleason's theorem is valid in all three cases. The extension to the quaternionic case was obtained by Varadarajan in his famous book on the geometry of quantum theory [1]. However, a gap in his proof was fixed in a published paper I have co-authored [2].

Assuming Poincaré symmetry, at least for elementary systems (elementary particles), the cases of real and quaternionic Hilbert spaces can be ruled out (here is a pair of published works I have co-authored on the subject: [3] and [4]).

ADDENDUM2. After a discussion with Harry Johnston, I think an interpretative remark is worth mentioning about the probabilistic content of the state $\mu$ within the picture I illustrated above. In QM, $\mu(P)$ is the probability that, if I performed a certain experiment (in order to check $P$), $P$ would turn out to be true. There seems to be a difference here with respect to the classical notion of probability applied to classical systems. There, probability mainly refers to something already existing (and to our incomplete knowledge of it). In the formulation of QM I presented above, probability instead refers to what will happen if...

ADDENDUM3. For $n=1$ (with $n = \dim H$), Gleason's theorem is valid and trivial. For $n=2$, there is a known counterexample: $\mu_v(P)= \frac{1}{2}\left(1+ (v \cdot n_P)^3\right)$, where $v$ is a unit vector in $\mathbb R^3$ and $n_P$ is the unit vector in $\mathbb R^3$ associated with the orthogonal projector $P: \mathbb C^2 \to \mathbb C^2$ via the Bloch sphere: $P= \frac{1}{2} \left(I+\sum\limits_{j=1}^3 n_j \sigma_j \right)$. This $\mu_v$ is additive on orthogonal projectors, but it is not of the form $\text{tr}(\rho P)$ for any state $\rho$.
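A quick numerical check of my own (numpy) that this really is a counterexample: $\mu_v$ is additive on each orthogonal pair $\{P, I-P\}$, but $2\mu_v(P)-1 = (v\cdot n_P)^3$ is not linear in $n_P$, whereas any trace form $\text{tr}(\rho P) = \frac{1}{2}(1 + a\cdot n_P)$ would be.

```python
import numpy as np

sig = [np.array([[0, 1], [1, 0]], complex),
       np.array([[0, -1j], [1j, 0]], complex),
       np.array([[1, 0], [0, -1]], complex)]

def proj(n):
    """Rank-1 projector on C^2 with Bloch vector n (unit vector in R^3)."""
    return 0.5 * (np.eye(2, dtype=complex) + sum(nj * s for nj, s in zip(n, sig)))

v = np.array([0.0, 0.0, 1.0])                 # the fixed unit vector defining mu_v

def mu_v(n):
    return 0.5 * (1.0 + np.dot(v, n) ** 3)

# 1) Additivity on orthogonal pairs: proj(n) + proj(-n) = I and mu_v(n) + mu_v(-n) = 1.
n = np.array([0.3, 0.4, np.sqrt(1 - 0.25)])
print(np.allclose(proj(n) + proj(-n), np.eye(2)))   # True
print(np.isclose(mu_v(n) + mu_v(-n), 1.0))          # True

# 2) Not of the form tr(rho P): that would force 2*mu_v - 1 to be linear in n.
e1, e3 = np.array([1.0, 0, 0]), np.array([0, 0, 1.0])
f = lambda n: 2 * mu_v(n) - 1
print(np.isclose(f((e1 + e3) / np.sqrt(2)),
                 (f(e1) + f(e3)) / np.sqrt(2)))      # False for the cubic measure
```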

ADDENDUM4. From the perspective of quantum probability, the von Neumann-Lüders reduction postulate has a very natural interpretation. Suppose that $\mu$ is a probability measure over the quantum lattice ${\cal P}(H)$ representing a quantum state, and assume that the measurement of $P \in {\cal P}(H)$, on that state, has outcome $1$. The post-measurement state is therefore represented by $\mu_P(\cdot) = \mu(P \cdot P)/\mu(P)$, just in view of the aforementioned postulate.

It is easy to prove that $\mu_P : {\cal P}(H) \to [0,1]$ is the only probability measure such that $$\mu_P(Q) = \frac{\mu(Q)}{\mu(P)} \quad \mbox{if $Q \leq P$}\:.$$
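A minimal numerical confirmation of that restriction property (my own sketch with numpy; the state and projectors on $\mathbb C^3$ are arbitrary examples, and $PQP = Q$ holds because $Q \leq P$):

```python
import numpy as np

rng = np.random.default_rng(1)

# Orthonormal basis of C^3; P projects onto span{e0, e1}, Q onto span{e0}, so Q <= P.
E = np.eye(3, dtype=complex)
P = np.outer(E[0], E[0]) + np.outer(E[1], E[1])
Q = np.outer(E[0], E[0])

# A random density matrix rho.
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = A @ A.conj().T
rho /= np.trace(rho).real

mu = lambda R: np.trace(rho @ R).real
mu_P = lambda R: np.trace(rho @ P @ R @ P).real / mu(P)   # post-measurement measure

print(np.isclose(mu_P(Q), mu(Q) / mu(P)))   # True, since PQP = Q when Q <= P
```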

M. A.
  • 2,039
20

Having given it some more thought, there is an unambiguous philosophical difference, with practical implications. The two-slit experiment provides a good example of this.

In a classical universe, any particular photon that hits the screen either went through slit A or through slit B. Even if we didn't bother to measure this, one or the other still happened, and we can meaningfully define $P(A)$ and $P(B)$.

In a quantum universe, if we didn't bother to measure which slit a photon went through, then it isn't true that it went through one slit or the other. You might say it went through both, though even that isn't entirely true; all we can really say is that it "went through the slits".

(Asking which slit a photon went through in the two-slit experiment is like asking what the photon's religion is. It simply isn't a meaningful question.)

That means that $P(A)$ and $P(B)$ just don't exist. Here's where one of the practical implications comes in: if you don't understand QM properly [I'm lying a bit here; I'll come back to it] then you can still calculate a probability that the particle went through slit A and a probability that it went through slit B. And then when you try to apply the usual mathematics to those probabilities, it doesn't work, and then you start saying that quantum probability doesn't follow the same rules as classical probability.

(Actually what you're really doing is calculating what the probabilities for those events would have been if you had chosen to measure them. Since you didn't, they're meaningless, and the mathematics doesn't apply.)

So: the philosophical difference is that when studying quantum systems, unlike classical systems, the probability that something would have happened if you had measured it is not in general meaningful unless you actually did; the practical implication is that you have to keep track of what you did or did not measure in order to avoid doing an invalid calculation.

(In classical systems most syntactically valid questions are meaningful; it took me some time to come up with the counter-example given above. In quantum mechanics most questions are not meaningful and you have to know what you're doing to find the ones that are.)

Note that keeping track of whether you've measured something or not is not an abstract exercise restricted to cases where you are trying to apply probability theory. It has a direct and concrete impact on the experiment: in the case of the two-slit experiment, if you measure which slit each photon went through, the interference pattern disappears.

(Trickier still: if you measure which slit each photon went through, and then properly erase the results of that measurement before looking at the film, the interference pattern comes back again.)

PS: it may be unfair to say that calculating a "would-have" probability means that you don't understand QM properly. It may simply mean that you're consciously choosing to use a different interpretation of it, and prefer to modify or generalize your conception of probability as necessary. V. Moretti's answer goes into some detail about how you might go about doing this. However, while this sort of thing is interesting, it does not appear to me to be of any obvious use. (It isn't clear that it gives any insight into the disappearance and reappearance of the interference pattern as described above, for example.)

Addendum: that has become clearer following the discussion in the comments. It seems that it is thought that the alternative formulation may have advantages when dealing with more complicated scenarios (QFT on curved spacetime was mentioned as one example). That is entirely plausible, and I certainly don't mean to imply that the work lacks value; however, it is still not clear to me that it is pedagogically useful as an alternative to the conventional approach when learning basic QM.

PPS: depending on interpretation, there may be other philosophical differences related to the nature or origin of randomness. Bayesian statistics is broad enough, I believe, that these differences are not of any great importance, and even from a frequentist viewpoint I don't think they have any practical implications.

4

The probabilities in QM are given by the square amplitudes of the relevant terms in the wavefunction, or by the expectation value of the relevant projector or POVM. However, it is not the case that those numbers always act in a way that is consistent with the calculus of probability.

For example, if there are two mutually exclusive ways for an event to happen, then the calculus of probability would say that the probability for that event is the sum of the probabilities of it happening in each of those ways. But in single-photon interference experiments this doesn't seem to work. There are two routes through the interferometer, and the photon cannot be detected on both routes at once, so they are mutually exclusive, right? So to get the probability of the photon emerging from a particular port on the other end you should just add the probability of it going along each route. But that calculation gives the wrong answer: you can get any probability you like by changing the path lengths; see:

http://arxiv.org/abs/math/9911150.
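Here is a minimal sketch of the arithmetic behind this (my own illustration, not taken from the linked paper; the 50/50 beam-splitter factors and the chosen phase values are illustrative assumptions). Adding the two path probabilities always gives the same number, while the Born rule applied to the summed amplitudes sweeps from 0 to 1 as the relative phase (the path-length difference) changes:

```python
import numpy as np

# Single photon in a Mach-Zehnder-type interferometer.  Each 50/50 beam splitter
# multiplies the amplitude by 1/sqrt(2), so each of the two paths contributes
# amplitude 1/2 to a given output port; a path-length difference is a relative phase phi.
for phi in [0.0, np.pi / 2, np.pi]:
    amp_path_1 = 0.5
    amp_path_2 = 0.5 * np.exp(1j * phi)

    p_sum_of_probs = abs(amp_path_1) ** 2 + abs(amp_path_2) ** 2   # always 0.5
    p_born = abs(amp_path_1 + amp_path_2) ** 2                     # cos^2(phi/2): 1, 0.5, 0

    print(f"phi = {phi:.2f}:  sum of path probabilities = {p_sum_of_probs:.2f},"
          f"  |sum of amplitudes|^2 = {p_born:.2f}")
```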

So then you have the problem of explaining under what circumstances the calculus of probability applies.

You ask about frequentist approaches to quantum probability. There are some such approaches, e.g. Hugh Everett's 1957 paper and his PhD thesis:

http://www-tc.pbs.org/wgbh/nova/manyworlds/pdf/dissertation.pdf.

I think these arguments don't work because the frequency approach itself doesn't work. Why would the relative frequency over an infinite number of samples have anything to do with what is observed in a laboratory? And if there is some explanation, then why are we bothering with this relative frequency stuff rather than using the actual explanation? The best explanation of why it is applicable is the decision theoretic approach:

http://arxiv.org/abs/quant-ph/9906015

http://arxiv.org/abs/0906.2718.

The best attempt at explaining the circumstances under which it holds is given by the requirements that quantum mechanics imposes on the circumstances under which information can be copied:

http://arxiv.org/abs/1212.3245.

alanf
  • 11,359
3

The application of probability in areas other than quantum mechanics is a clever way to model situations that are complex enough that exact analysis is not feasible, or at least highly tedious.

On the other hand, in QM nature is inherently probabilistic. When you make an observation, the quantum state your system is in assigns a probability to each possible outcome. It is no longer a trick to make calculations tractable; it is a feature of nature. That is the difference.

Yossarian
  • 6,235
3

There is an important difference, but it is not fundamental.

In both cases, probability arises from the need to compare the results from two incompatible models operating at different scales, the microscopic and the macroscopic.

Darwin and Fowler long ago showed how to derive Classical Statistical Mechanics, the main place in classical physics where probabilities occur, from Quantum Mechanics. So in a sense, Quantum Mechanics is fundamental and there is no problem deriving the Classical case from it (see Fowler, Statistical Mechanics).

But I will present them in the other order, anyway. In Classical physics, if one is analysing, say, an ideal gas, the system of $10^{23}$ particles is deterministic, and the number of variables is $6 \times 10^{23}$. This is the microscopic view of the system as a whole. But one can also study certain properties of this gas in terms of a very few thermodynamic variables, temperature, pressure, and volume, which describe a macro-state. But in terms of this description, the system is probabilistic: one only knows the probabilities with which its molecules will possess a given energy, etc. Furthermore, the connection between the two levels of description of the system, the micro-level and the macro-level, is via measurement. The measurement of the velocity of a molecule is modelled by the long-time average of its velocity over its trajectory. Then it turns out that for all normal molecules, provided the system is in equilibrium, this procedure yields the same answer almost without regard to which molecule or which trajectory you study, and Einstein defined this as the probabilistic expectation of the energy of a molecule. See Jan von Plato, Creating Modern Probability. So only the results of measurements are assigned probabilities.

Now, according to Feynman and others, something parallel is true in Quantum Mechanics. The probabilities arise from the necessity to amplify micro-phenomena up to the macro-level where we can see the measuring apparatus, see a needle on a dial pointing to a number on the dial. (Schrödinger's equation is itself a deterministic equation and probabilities only come in to the measurement axioms.) The only "events" in the sense of mathematical probability theory, i.e., things which are assigned probabilities, are the results of measurements. And here, too, the measurement has something to do with describing in a reduced fashion the state of a micro-system in terms of macro-states instead of its micro-states. The needle on the dial really obeys the laws of quantum mechanics: it has a wave function, it is in an entangled state, etc., but when we say "the result of the measurement was that the needle pointed to 3" we are describing the measurement apparatus in classical terms, which are macro-terms. The passage from the micro-description of the particle in terms of quantum concepts to this reduced description brings in probabilities.

What probabilities are not

It is a myth that the probabilities in classical statistical mechanics are due to ignorance or are subjective. They come about only because one restricts one's attention to the normal cell of micro-states (normal cell in the sense of Darwin and Fowler) and ignores exceptional states. The definition of "normal" is an objective one: states can be grouped into cells of states, each cell containing all those states which possess the same time-average properties as each other. The normal cell is the largest cell. In the thermodynamic limit, the normal cell is not only the largest, it is the only one with positive volume; all the other cells are mere boundaries with lower dimension.

It is a myth that the probabilities in Quantum Mechanics are somehow "non-commutative". The problem is not that there are non-commuting observables. If you are measuring momentum, the experimental setup is quite definite, the space of events depends on the physical setup, and it only contains the results of measuring momentum. If the measuring apparatus is one suitable to measure momentum, then results for position are not events. The setup excludes measuring position, so measurements of position are impossible in this setup. And conversely. There is no one over-arching probability space with both kinds of events in it, as mathematicians who study so-called "Quantum Probability" or "Non-commutative Probability" naively suppose. Bohr taught us that if you set up the apparatus for one type of measurement (e.g., momentum), you physically exclude the possibility of the complementary type of measurement (e.g., position). That means that you either work in one probability space with events and normal measures of their probability, or you are in a totally different probability space with its own events and its own measure. Now, no one would say that an operator on space A either commuted with or did not commute with an operator on a totally different space B, and that is what we have here.

2

Maybe you will find the essay Quantum Theory From Five Reasonable Axioms by Lucien Hardy interesting. In the abstract it says:

In this paper it is shown that quantum theory can be derived from five very reasonable axioms. The first four of these axioms are obviously consistent with both quantum theory and classical probability theory. Axiom 5 (which requires that there exist continuous reversible transformations between pure states) rules out classical probability theory.

doetoe
  • 9,484
1

ANSWER: Mutually exclusive events cannot exist before measurement in the probabilistic formulation of quantum mechanics (the Copenhagen interpretation, CIQM), because, maximally, CIQM is required to violate local realism and, minimally, it might break the principle of locality. And after measurement, the problem you mentioned does not exist because it is eradicated by a much bigger challenge, i.e. the simultaneity of two spatially separated events or quantum mechanically separated events (the two not being necessarily equivalent). Please start from the map in https://en.wikipedia.org/wiki/Principle_of_locality.

Building Blocks

  • ($0000$) First of all, the concept of probability is a construct of the Copenhagen interpretation of quantum mechanics, in which one associates a wave function (with all the characteristics of a wave) with a particle; through this, one builds a direct mathematical channel between particle and wave behaviour. In this picture, you cannot separate these natures. This very important first step is expressed by the "principle of complementarity".

  • ($000$) Now, this picture is not complete, and in order to attach it to tangible experience, one "corresponds" the square of the amplitude of the wave function to the probability of finding the particle at a specific point in space and time.

WARNING: Your question is related to this correspondence, not directly to the notion of probability.

Now, I would point out two other building blocks of Copenhagen QM which complete your probabilistic correspondence:

Quantum Mechanical Probability

  • ($00$) In QM, just like classical mechanics, space and time are continuous and momentum is responsible for (is the generator of) displacement (translation). But translation of what, in what? Translation of complex vectors from the Hilbert space (ket states), which are fundamental abstract mathematical representations required for this new, indirect illustration of the physical phenomena, in ordinary three-dimensional space. These ket states build a vector space which has a dual, i.e. the bra space. Quantum mechanical probabilities are defined based on the inner product of elements from the bra and ket spaces. For instance, if $|\alpha\rangle$ is a ket state, the bra state $\langle \alpha|$ is its dual, and the inner product of $\langle \beta|$ and $|\alpha\rangle$ is denoted by $\langle \beta|\alpha\rangle$. The probability of finding $|\beta\rangle$ in $|\alpha\rangle$, which, based on the fundamental principles of probability theory, is equivalent to the probability of finding $|\alpha\rangle$ in $|\beta\rangle$, is then given by

$$\left| \langle \beta | \alpha \rangle \right|^2 = \left| \langle \alpha | \beta \rangle \right|^2$$

This is equivalent to one of the two important postulates of the inner products in the Hilbert space:

$$\langle \beta | \alpha \rangle = \langle \alpha | \beta \rangle^*$$

The second postulate is called the postulate of positive definite metric according to which

$$\langle \alpha| \alpha \rangle \geq 0$$

Another important characteristic is related to the conservation of probability when translating the ket states; this is how one extracts the unitarity of the translation operators. I believe this is probably the most important postulate regarding quantum mechanical probability. It must be equivalent to an assumption regarding the fabric of space-time.

  • ($0$) Now if two quantum mechanical states, describing the initial state of an event, are orthogonal, they will remain orthogonal as they evolve in time, because the time evolution operator is unitary. Therefore, two disjoint quantum mechanical events would not mix under time evolution, whatsoever. However, this simple Hilbert space representation is not so straightforward when projected into space-time. For instance, wherever the wave function of a particle is zero (usually there are several such points), the square of the amplitude is also zero, i.e. the probability of finding the particle at that point of space-time is exactly zero while, for instance, at neighbouring points the particle can be found. It is as if some points of space-time were singular, probability-wise. The reason I call it singular is that a zero probability is a matter of absolutely not being there.
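For what it is worth, the statements above about the inner product (its conjugate symmetry, the positive metric) and about unitary time evolution preserving orthogonality are easy to verify numerically; here is an illustrative numpy sketch of my own, with a generic Hermitian Hamiltonian whose specific entries are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_ket(dim):
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

alpha, beta = random_ket(2), random_ket(2)

# <beta|alpha> = <alpha|beta>*  and hence  |<beta|alpha>|^2 = |<alpha|beta>|^2
print(np.isclose(np.vdot(beta, alpha), np.conj(np.vdot(alpha, beta))))          # True
print(np.isclose(abs(np.vdot(beta, alpha))**2, abs(np.vdot(alpha, beta))**2))   # True
print(np.vdot(alpha, alpha).real >= 0)                                          # True: positive metric

# Unitary time evolution U = exp(-iHt) preserves inner products, hence orthogonality.
H = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
H = H + H.conj().T                                   # Hermitian Hamiltonian
w, V = np.linalg.eigh(H)
U = V @ np.diag(np.exp(-1j * w * 0.7)) @ V.conj().T  # U = exp(-iHt) with t = 0.7

e0, e1 = np.eye(2)[:, 0], np.eye(2)[:, 1]            # two orthogonal initial states
print(np.isclose(np.vdot(U @ e0, U @ e1), 0.0))      # True: they stay orthogonal
```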

And the final point: every theory that violates Bell's inequality would not be locally invariant and would produce predictions that no locally realistic theory would.

moha
  • 180
-2

Classical probability theory is a degenerate limit of quantum probability theory. So, there is an asymmetric relation between the two: you can completely derive classical probability theory from quantum probability theory, but not the other way around. It's actually the case that the probabilities that occur in the real world, even when they are firmly within the classical domain, are always given by the squared amplitude of a quantum mechanical state vector that describes the physics. As pointed out here, there are no known examples of classical probabilities that do not have such a quantum mechanical origin.

As pointed out in the article, whether you consider coin throws, betting on the digits of pi etc., the probabilities can always be shown to be purely of quantum mechanical origin, arising from the Born rule and not from the invoked classical reasoning based on insufficient knowledge. Classical probability theory is thus not fundamental, it should be derived as an appropriate approximation from quantum mechanics.

However, the mathematics of classical probability theory does work fundamentally differently from the way the mathematics of quantum probability theory works. So, how can there then not be a fundamental difference? The answer is that the classical theory is a degenerate limit of the quantum mechanical theory: in the classical limit, commutators of observables vanish, allowing you to use mathematical reasoning that's not allowed within quantum theory. But you can do classical probability theory without problems within quantum probability theory and then take the classical limit.
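To illustrate that last point with a toy sketch of my own (numpy; the three-outcome distribution is an arbitrary example), a classical probability distribution can be embedded as a diagonal density matrix, and for the commuting diagonal projectors living in that degenerate corner the Born rule reproduces the ordinary classical rules, including inclusion-exclusion:

```python
import numpy as np

# A classical distribution over three outcomes, embedded as a diagonal density matrix.
p = np.array([0.5, 0.3, 0.2])
rho = np.diag(p).astype(complex)

# "Events" are diagonal projectors; they all commute with each other and with rho.
E_first_two = np.diag([1.0, 1.0, 0.0]).astype(complex)   # outcome in {0, 1}
E_last_two  = np.diag([0.0, 1.0, 1.0]).astype(complex)   # outcome in {1, 2}
E_all       = np.eye(3, dtype=complex)

mu = lambda E: np.trace(rho @ E).real

print(np.isclose(mu(E_first_two), p[0] + p[1]))    # True: Born rule = classical sum
lhs = mu(E_first_two) + mu(E_last_two) - mu(E_first_two @ E_last_two)
print(np.isclose(lhs, mu(E_all)))                  # True: classical inclusion-exclusion
```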

Count Iblis
  • 10,396