6

Consider the single-parameter estimation setting, where we have a distribution depending on $\theta$ and we are looking for a "good" estimator of $\theta$. A commonly mentioned strategy, found e.g. in Eq. (7) of [TA2014], is to measure some observable $M$, thus obtaining an estimator for $\theta$ with variance $$(\Delta \theta)^2 = \frac{(\Delta M)^2}{\lvert\partial_\theta\langle M\rangle\rvert^2} \equiv \frac{\operatorname{tr}(M^2\rho_\theta)-\operatorname{tr}(M\rho_\theta)^2}{\lvert\operatorname{tr}(M \partial_\theta\rho_\theta)\rvert^2}.\tag{7}$$ To some degree, I can see the idea behind this formula: thinking of $f(\theta)=\operatorname{tr}(M\rho_\theta)$ as a function of $\theta$, and defining the "estimator" as $\theta=f^{-1}(\operatorname{tr}(M\rho_\theta))$, the naive error propagation formula gives $$(\Delta\theta)^2 = \frac{(\Delta M)^2}{|f'(\theta)|^2} = \frac{(\Delta M)^2}{|\partial_\theta \operatorname{tr}(M\rho_\theta)|^2}.$$ This is the standard error propagation formula: if $A=f(B)$ for two physical quantities $A,B$, then $\Delta A=\lvert f'(B)\rvert\,\Delta B$.

However, how would we formalise more precisely the estimation strategy underlying this formula in the quantum case? In particular, how exactly would the estimator associated to this strategy be defined? My naive attempt would be to interpret $\theta=f^{-1}(\langle M\rangle)$ in the single-shot regime as $$\hat\theta(k) = f^{-1}(\lambda_k),$$ where $k$ labels the possible outcomes of a measurement in the eigenbasis of $M=\sum_k \lambda_k |u_k\rangle\!\langle u_k|$, and $\lambda_k$ are the eigenvalues. I'm not entirely sure it's exactly the same thing, but this strategy seems in essence an application of the method of moments. Another possibility, closer in spirit to the method of moments proper, is to first compute the empirical mean estimator of $\langle M\rangle$, write it as $\bar M$, and then define the estimator as $$\hat\theta(X) = f^{-1}(\bar M),$$ where $X$ is the collected statistic used to estimate the parameter.
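To make the second interpretation concrete, here is a minimal numerical sketch in a toy model of my own choosing (not from [TA2014]): a qubit observable $M=Z$ with $f(\theta)=\langle M\rangle=\cos\theta$, so the method-of-moments estimator is $\hat\theta=\arccos(\bar M)$:

```python
import numpy as np

# Toy model (an assumption for illustration, not from the reference):
# a qubit with <M> = f(theta) = cos(theta), where M = Z has eigenvalues +1, -1.
# Outcome +1 occurs with probability (1 + cos(theta)) / 2.
rng = np.random.default_rng(0)
theta_true = 0.8
n_shots = 100_000

p_plus = (1 + np.cos(theta_true)) / 2
outcomes = rng.choice([1.0, -1.0], size=n_shots, p=[p_plus, 1 - p_plus])

# Method-of-moments estimator: invert f on the empirical mean.
m_bar = outcomes.mean()
theta_hat = np.arccos(np.clip(m_bar, -1.0, 1.0))  # f^{-1} = arccos
print(theta_hat)  # close to theta_true
```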

Assuming this is actually the strategy used in this context, how do we get the result $$\operatorname{Var}[\hat\theta|\theta_0] = \frac{\operatorname{Var}[M|\theta_0]}{\lvert \partial_\theta\operatorname{tr}(M\rho_\theta)|_{\theta=\theta_0} \rvert^2}, \quad \operatorname{Var}[M|\theta_0]\equiv \operatorname{tr}(M^2\rho_{\theta_0})-\operatorname{tr}(M\rho_{\theta_0})^2,$$ which should be the formally precise way to state Eq. (7) in the above reference?
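For what it's worth, a Monte Carlo check in the same assumed toy model ($\langle M\rangle=\cos\theta$, outcomes $\pm1$) is consistent with this formula: there $\operatorname{Var}[M|\theta_0]=\sin^2\theta_0$ and $\lvert\partial_\theta\operatorname{tr}(M\rho_\theta)\rvert^2=\sin^2\theta_0$, so the predicted variance over $n$ shots is simply $1/n$:

```python
import numpy as np

# Monte Carlo check of Var[theta_hat] ~ Var[M] / (n |f'(theta)|^2)
# in the assumed toy model with <M> = cos(theta), outcomes +/-1.
rng = np.random.default_rng(1)
theta0, n, trials = 0.8, 4000, 2000

p = (1 + np.cos(theta0)) / 2
means = rng.binomial(n, p, size=trials) / n * 2 - 1   # empirical means of M
theta_hats = np.arccos(np.clip(means, -1.0, 1.0))

predicted = (1 - np.cos(theta0)**2) / (n * np.sin(theta0)**2)  # = 1/n here
print(theta_hats.var(), predicted)  # should roughly agree
```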

glS

1 Answer

2

The basic starting point will be to distinguish between "a measurement" in quantum theory and "the result of a measurement" or "the measurement of an operator." When one talks about measuring an operator $M$, one is really talking about measuring its expectation value (let's call it $m=\langle M\rangle=\mathrm{Tr}(\rho M)$ to avoid carets on operators like $\hat{M}$, in case we want to use carets to refer to estimators). But quantum theory describes measurements by POVM elements! How can we reconcile this?

A "measurement of $M$" $\mathcal{M}$ actually has the POVM elements $M_i$ such that $$M=\sum_{i}m_i M_i,\quad M_i=M_i^\dagger,\; M_i\succeq0,\; \sum_i M_i=\mathbb{I},$$ where $\{m_i\}$ are the eigenvalues of $M$. According to the Born rule, the probability of the $i$th measurement outcome is $$p_i=\langle M_i\rangle.$$ Then, finding the expectation value of $M$ really means that you have performed $\mathcal{M}$, estimated the probability of each result "$m_i$", and weighted them appropriately as $$m=\sum_i p_i m_i.$$ Note that this equals $\langle M\rangle$, as required.
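As a sanity check, the recipe above (diagonalize $M$, take the eigenprojectors as POVM elements, apply the Born rule, and average) can be sketched in a few lines of NumPy; the state and observable below are arbitrary placeholders:

```python
import numpy as np

# Sketch of "measuring M": diagonalize M, use the eigenprojectors M_i
# as the POVM, get p_i via the Born rule, and recover m = sum_i p_i m_i.
M = np.array([[1.0, 0.5], [0.5, -1.0]])    # placeholder observable
rho = np.array([[0.7, 0.2], [0.2, 0.3]])   # placeholder state (Tr = 1)

eigvals, eigvecs = np.linalg.eigh(M)
m = 0.0
for m_i, v in zip(eigvals, eigvecs.T):
    M_i = np.outer(v, v.conj())        # projector onto the eigenspace
    p_i = np.trace(M_i @ rho).real     # Born rule: p_i = <M_i>
    m += p_i * m_i

print(np.isclose(m, np.trace(M @ rho).real))  # True: m equals Tr(rho M)
```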

So the basic starting point is actually that one is performing a specific POVM $\mathcal{M}$ to measure/estimate the quantity $m$. From that, one may try to infer another quantity (or parameter) $\theta$. Propagation of uncertainty, change of variables, etc. all assume that the measurement itself doesn't change (of course they do: they are classical tools, so they take the probability distribution as the starting point). This approach essentially asks "given the same POVM $\mathcal{M}$, what is the uncertainty on a measurement of $\theta$?" And the answer to that question has very little to do with quantum theory: it is just $$\Delta^2\theta=\left(\frac{\partial m}{\partial \theta}\right)^{-2}\Delta^2 m.$$

Can we make this about quantum theory, just because we like to? We write the Fisher information that our POVM $\mathcal{M}$ yields about the parameter of interest $m$ (which, recall, is the expectation value of an operator $M$ that we colloquially say we are measuring): $$F(m)=\sum_i \frac{1}{p_i}\left(\frac{\partial p_i}{\partial m}\right)^2.$$ The Cramér-Rao bound says that any uncertainty on $m$ is bounded by $$\Delta^2 m\geq 1/F(m).$$ What if we want to express the Fisher information in terms of a new parameter $\theta$, such that $\Delta^2 \theta\geq 1/F(\theta)$? Well, we can apply the chain rule to the partial derivatives to find $$F(m)=\sum_i \frac{1}{p_i}\left(\frac{\partial p_i}{\partial \theta}\frac{\partial \theta}{\partial m}\right)^2=F(\theta)\left(\frac{\partial \theta}{\partial m}\right)^2$$ and so $\Delta^2 \theta\geq 1/F(\theta)=\left(\frac{\partial \theta}{\partial m}\right)^2/F(m)=\left(\frac{\partial m}{\partial \theta}\right)^{-2}/F(m)$.
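The chain-rule identity $F(m)=F(\theta)\left(\partial\theta/\partial m\right)^2$ can be verified numerically for a simple two-outcome model (an assumption for illustration: $p_\pm=(1\pm m)/2$ with $m=\cos\theta$):

```python
import numpy as np

# Numerical check of F(m) = F(theta) * (d theta / d m)^2 for an assumed
# two-outcome model: p_+ = (1 + m)/2, p_- = (1 - m)/2, with m = cos(theta).
theta0 = 0.8
m0 = np.cos(theta0)
p = np.array([(1 + m0) / 2, (1 - m0) / 2])

dp_dm = np.array([0.5, -0.5])         # derivative of each p_i w.r.t. m
dm_dtheta = -np.sin(theta0)
dp_dtheta = dp_dm * dm_dtheta         # chain rule

F_m = np.sum(dp_dm**2 / p)            # Fisher information about m
F_theta = np.sum(dp_dtheta**2 / p)    # Fisher information about theta

print(np.isclose(F_m, F_theta / dm_dtheta**2))  # True
```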

What is the missing piece, to make these equations the same and answer the question? That an estimator saturating the bound for $m$ will attain equality $\Delta^2 m=1/F(m)$. With that exact same estimator, we have $$\Delta^2 \theta\geq \left(\frac{\partial m}{\partial \theta}\right)^{-2}/F(m)=\left(\frac{\partial m}{\partial \theta}\right)^{-2}\Delta^2 m.$$ Immediately, we see that there is still an inequality! The estimator that saturated the bound for $m$ only provides a lower bound for $\Delta^2\theta$; it is possible that this estimator will produce a value of $\Delta^2\theta$ that is, in fact, greater than the bound $\left(\frac{\partial m}{\partial \theta}\right)^{-2}\Delta^2 m$. However, for sufficiently regular probability distributions there exists some estimator that saturates the Cramér-Rao bound (at least asymptotically), so in practice there will always be some estimator that achieves $\Delta^2\theta=\left(\frac{\partial m}{\partial \theta}\right)^{-2}\Delta^2 m$. Since all of this is achieved without changing the POVM, only by changing the classical postprocessing of the measured probability distribution, one can normally ignore these nuances.

Note, finally, that using the same estimator to get the minimum uncertainty on both $m$ and $\theta$ is probably not always a good idea (just as an extremum of $\theta(m)$ at $\theta_0$ does not guarantee an extremum of $m(\theta)$ at $m(\theta_0)$). However, if one is using an invariant estimator like the maximum likelihood estimator (which saturates the Cramér-Rao bound asymptotically), then the extrema will coincide: the estimator of the function is the same as the function of the estimator. Thus, classical statistics guarantees, under the correct conditions, that the Cramér-Rao lower bounds for the uncertainties of functionally related parameters are related via the error propagation formula, and that the bounds can be simultaneously saturated with the same estimator.
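As an illustration of this invariance, in the same assumed two-outcome model (outcomes $\pm1$ with $p_+=(1+\cos\theta)/2$) the MLE of $m$ is the empirical mean, and maximizing the likelihood directly over $\theta$ lands, up to grid resolution, on $\arccos$ of that mean:

```python
import numpy as np

# MLE invariance sketch for an assumed two-outcome model:
# outcomes +/-1 with p_+ = (1 + cos(theta))/2. The MLE of m = <M> is the
# empirical mean, so by invariance theta_MLE = arccos(m_MLE).
rng = np.random.default_rng(2)
theta0, n = 0.8, 50_000
p_plus = (1 + np.cos(theta0)) / 2
x = rng.choice([1.0, -1.0], size=n, p=[p_plus, 1 - p_plus])

m_mle = x.mean()
theta_from_m = np.arccos(np.clip(m_mle, -1.0, 1.0))

# Direct MLE of theta: maximize the log-likelihood on a fine grid.
grid = np.linspace(1e-3, np.pi - 1e-3, 200_001)
k = np.count_nonzero(x == 1.0)
loglik = (k * np.log((1 + np.cos(grid)) / 2)
          + (n - k) * np.log((1 - np.cos(grid)) / 2))
theta_direct = grid[np.argmax(loglik)]

print(abs(theta_direct - theta_from_m))  # agrees up to the grid spacing
```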

The postscript is that this was all classical, with quantum providing the probability distribution. Uncertainties can thus be computed using the state and the POVM elements in the usual way, but I did not do that anywhere in the answer because it does not depend on those computations.

Quantum Mechanic