
Often I see books and professors arguing that, in order to make a good experiment, many measurements are necessary: by the Law of Large Numbers, the average value of a quantity then lies closer to the expected value.

But the actual law (either the weak or the strong version) only gives the limit as $n\to\infty$, where $n$ is the number of measurements. In reality we only deal with a finite number of measurements, so what can we say about that case?

Also, I read through this post, but it fails to answer my question. To paraphrase Littlewood, what can we say about the rate of convergence?

Minethlos

3 Answers


This is exactly why we do statistical hypothesis testing and quote confidence intervals. In essence, the confidence interval we get from such a test is a quantitative measure of "how far we've converged". For example, consider an experiment to test whether a coin is biased. Our null hypothesis is that it is not; in symbols, ${\rm Pr}(H) = \frac{1}{2}$: the probability of a head is one half.

Now we work out from the binomial distribution the limits on the number of heads we expect to see in an experiment with $N$ tosses, given the null hypothesis, and check whether the observed number falls within them. That is, we calculate the interval within which the observed number of heads falls with a probability of, say, 0.999. For small $N$ you need to compute this by brute force from the binomial distribution; as $N$ gets bigger, Stirling's approximation to the factorial shows that the binomial distribution tends to the normal distribution. Your 0.999 confidence interval, as a proportion of $N$, gets smaller and smaller as $N\to\infty$, and these calculations are exactly how you see how fast it does so.
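
As a minimal numerical sketch of that shrinking interval (my own addition, not part of the original answer; it assumes SciPy is available and uses the exact binomial distribution under the fair-coin null hypothesis):

```python
# Sketch: width of the central 0.999 acceptance interval for the number of
# heads under the null hypothesis Pr(H) = 1/2, as a proportion of N.
# Uses scipy's exact binomial distribution; for large N this matches the
# normal approximation obtained via Stirling's formula.
from scipy.stats import binom

for N in (10, 100, 1_000, 10_000, 100_000):
    lo, hi = binom.interval(0.999, N, 0.5)   # central interval holding ~99.9% of outcomes
    print(f"N = {N:>6}: accept {int(lo):>6}..{int(hi):>6} heads, width/N = {(hi - lo) / N:.4f}")
```

The printed width/$N$ falls off roughly like $1/\sqrt{N}$, which is the rate of convergence the question asks about.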

I like to call the law of large numbers the "law of pointier and pointier distributions", because this aspect of the convergence shows us why a weak form of the second law of thermodynamics is true, as I discuss in the linked answer: the law of large numbers says that in the large-number limit there are samples that look almost like the maximum-likelihood sample, and almost nothing else. In other words: there are microstates which look almost exactly the same as the maximum-entropy ones, and almost nothing else. Therefore a system will almost certainly be found near its maximum-entropy microstate, and, if by chance it is found in one of the rare, significantly-less-than-maximum-entropy states, it will almost certainly progress towards the maximum-entropy microstate, just from a "random walk".

Selene Routley

In general, we can say nothing about finite $n$, but most of the time, we can safely assume some "niceness" of the distributions in question.

If, for example, we assume a finite variance $\sigma^2$ (a quite common feature), we can use Chebyshev's inequality for a rough error estimate of the form

$$P(|\bar{X}_n - \mu| > \alpha) \leq \frac{\sigma^2}{\alpha^2 n}.$$

Stronger (but still reasonable) assumptions lead to stronger inequalities; see, e.g., Cramér's theorem (the second one).
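
For concreteness, here is a small simulation sketch of Chebyshev's bound (my own addition; it assumes NumPy and picks Uniform(0, 1) draws, so $\mu = 1/2$ and $\sigma^2 = 1/12$):

```python
# Sketch: compare the Chebyshev bound sigma^2 / (alpha^2 n) with the
# empirical probability that the sample mean of n Uniform(0, 1) draws
# deviates from mu = 0.5 by more than alpha = 0.05.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, alpha = 0.5, 1 / 12, 0.05

for n in (10, 100, 1_000):
    means = rng.random((10_000, n)).mean(axis=1)        # 10,000 repeated experiments
    empirical = np.mean(np.abs(means - mu) > alpha)
    bound = min(sigma2 / (alpha**2 * n), 1.0)
    print(f"n = {n:>5}: empirical {empirical:.4f} <= Chebyshev bound {bound:.4f}")
```

The bound is loose (the true probability decays much faster), which is why stronger assumptions buy you sharper inequalities.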

Jens

What you can say is that you have a distribution for the results you are going to get (be it a discrete or continuous random variable), and when you calculate the average of a large sample, you are adding random variables and multiplying by a constant. The addition of random variables translates into a convolution of the probability density functions, which as $n \rightarrow \infty$ converges to a normal random variable (that is, in a way, a route to proving the LLN, although you could call it overkill). And under slightly stronger hypotheses than those of the central limit theorem, the Berry-Esseen theorem gives you a convergence rate of $n^{-1/2}$ to the normal distribution in the Kolmogorov-Smirnov distance (sup norm).
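
As an illustration of that $n^{-1/2}$ rate (my own sketch, not part of the answer; it assumes SciPy and picks Exponential(1) summands, so the sum is Gamma-distributed and the Kolmogorov distance can be evaluated without simulation):

```python
# Sketch: Kolmogorov (sup-norm) distance between the CDF of the standardized
# sum of n Exponential(1) variables, which is an exact Gamma(n, 1) CDF, and
# the standard normal CDF.  Berry-Esseen predicts decay like C / sqrt(n).
import numpy as np
from scipy.stats import gamma, norm

x = np.linspace(-6.0, 6.0, 20_001)                     # grid for the sup norm
for n in (4, 16, 64, 256, 1_024):
    F_n = gamma.cdf(n + np.sqrt(n) * x, a=n)           # P((S_n - n)/sqrt(n) <= x)
    dist = np.max(np.abs(F_n - norm.cdf(x)))
    print(f"n = {n:>5}: sup|F_n - Phi| = {dist:.4f}, sqrt(n) * dist = {np.sqrt(n) * dist:.3f}")
```

The last column settling down to a roughly constant value is the $n^{-1/2}$ rate in action.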

In any case, if you want the exact "confidence" in a particular case, your only option is to convolve the particular distribution you are using with itself $n$ times and read off the confidence margins.
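
A minimal sketch of that exact route (my own addition; it assumes NumPy and picks a fair six-sided die as the "particular distribution"):

```python
# Sketch: convolve the pmf of a fair die with itself n times to get the exact
# distribution of the sum, then read off a central 99% interval for the
# sample mean.
import numpy as np

pmf_die = np.full(6, 1 / 6)          # faces 1..6, each with probability 1/6
n = 50

pmf_sum = np.array([1.0])            # distribution of the empty sum
for _ in range(n):
    pmf_sum = np.convolve(pmf_sum, pmf_die)   # add one more die

support = np.arange(n, 6 * n + 1) / n         # possible values of the sample mean
cdf = np.cumsum(pmf_sum)
lo = support[np.searchsorted(cdf, 0.005)]
hi = support[np.searchsorted(cdf, 0.995)]
print(f"99% of sample means of {n} dice lie in [{lo:.2f}, {hi:.2f}] (true mean 3.5)")
```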

nabla