If I have $N$ repeated occurrences of a measurement $x$ with uncorrelated errors and identical uncertainties $u_x$, and take the mean $\langle x\rangle$, the uncertainty on the mean becomes:

$$u_{\langle x\rangle} = \frac{u_x}{\sqrt{N}}$$

where $N$ is the number of measurements I have taken. This is derived from the law of propagation of uncertainties (for example, see this answer).

If I take the median instead of the mean, I'm mathematically only propagating information from one or two data points, which would mean $N=1$ or $N=2$. But that seems unfair, for surely the principle should hold that if I repeat the measurement many times and take the median, the resulting uncertainty goes down.

Is there any established way of propagating the uncertainty when taking the median of a series of measurements?

gerrit

1 Answer

I do not think you are estimating the uncertainty of your median correctly.

The ratio of the variance of the mean to the variance of the median is given by $4n/[\pi(2n+1)]$, where $N=2n+1$ is the total number of data points in the sample you have used to construct the median. This result is quoted for a normally distributed population, so I think the approximation only holds if your outliers are symmetric.

Thus the uncertainty in the median is given by $$ \Delta x_{Med} = \Delta \bar{x} \sqrt{\frac{\pi(2n+1)}{4n}}. $$

In the limit of large $N$ (and hence large $n$), this tends to $$ \Delta x_{Med} = \Delta \bar{x} \sqrt{\frac{\pi}{2}}. $$
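As a rough illustration of the two expressions above, here is a minimal Python sketch that evaluates the factor multiplying $\Delta \bar{x}$ for an odd sample size $N = 2n+1$. The function name `median_uncertainty_factor` is my own invention, not something from the linked page. The expression is a large-sample approximation for normally distributed data, so it is likely to be rough for very small $N$.

```python
import math

def median_uncertainty_factor(num_points: int) -> float:
    """Approximate ratio (uncertainty of median) / (uncertainty of mean).

    Uses the quoted expression sqrt(pi * (2n + 1) / (4n)) with N = 2n + 1.
    Hypothetical helper for illustration; assumes normally distributed data.
    """
    if num_points < 3 or num_points % 2 == 0:
        raise ValueError("expression is written for odd N = 2n + 1 with n >= 1")
    n = (num_points - 1) // 2
    return math.sqrt(math.pi * (2 * n + 1) / (4 * n))

# The factor approaches sqrt(pi / 2) as N grows.
for num_points in (11, 101, 1001):
    print(num_points, median_uncertainty_factor(num_points))
print("limit:", math.sqrt(math.pi / 2))
```

Multiplying the standard error of the mean by this factor gives the corresponding estimate for the median under those assumptions.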

The uncertainty in the mean $\Delta \bar{x}$ is the standard deviation of the points divided by $\sqrt{N}$. The uncertainty in the median therefore also falls as $1/\sqrt{N}$; for large $N$ it is just larger than that of the mean by a factor of $\sqrt{\pi/2} \approx 1.25$.
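If you want to convince yourself of this scaling, a quick Monte Carlo check is one option. The sketch below assumes normally distributed data with no outliers; the function name, sample size, trial count and seed are arbitrary choices of mine. It draws many independent samples, takes the mean and median of each, and compares the scatter of the two estimators.

```python
import numpy as np

def simulate_mean_and_median_scatter(n_points=501, n_trials=4000, sigma=1.0, seed=0):
    """Return the empirical scatter of the sample mean and of the sample median.

    Hypothetical helper for illustration: each row is one repeated experiment
    of n_points Gaussian measurements with standard deviation sigma.
    """
    rng = np.random.default_rng(seed)
    samples = rng.normal(0.0, sigma, size=(n_trials, n_points))
    scatter_mean = samples.mean(axis=1).std(ddof=1)
    scatter_median = np.median(samples, axis=1).std(ddof=1)
    return scatter_mean, scatter_median

scatter_mean, scatter_median = simulate_mean_and_median_scatter()
ratio = scatter_median / scatter_mean

print("scatter of mean:   ", scatter_mean)      # compare with sigma / sqrt(N)
print("scatter of median: ", scatter_median)
print("ratio:             ", ratio)             # compare with sqrt(pi / 2)
print("sqrt(pi / 2):      ", np.sqrt(np.pi / 2))
```

With real data that contain outliers the ratio can differ, since the expression above is derived for a normal population.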

This information is given at http://mathworld.wolfram.com/StatisticalMedian.html

If you have outliers, then the formula that you begin your question with is incorrect (see the first sentence of the answer you refer to). The data points will not be normally distributed according to your estimate of their uncertainties. The standard error of the mean that I quote above will then be larger than $u_x/\sqrt{N}$, because the standard deviation of the data will be larger than $u_x$ (I refer you also to the last paragraph of my answer to that question).

ProfRob