Law of Large Numbers
We turn next to two theorems, the law of large numbers and the central limit theorem, which describe the behavior of the sample mean of i.i.d. r.v.s as the sample size grows. Let $X_1, X_2, \ldots$ be i.i.d. with finite mean $\mu$ and finite variance $\sigma^2$. For all positive integers $n$, let
$$\bar{X}_n = \frac{X_1 + \cdots + X_n}{n}$$
be the sample mean of $X_1$ through $X_n$. The sample mean is itself an r.v., with mean $\mu$ and variance $\sigma^2/n$:
$$E(\bar{X}_n) = \frac{1}{n}\sum_{j=1}^{n} E(X_j) = \mu, \qquad \mathrm{Var}(\bar{X}_n) = \frac{1}{n^2}\sum_{j=1}^{n} \mathrm{Var}(X_j) = \frac{\sigma^2}{n}.$$
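As a quick numerical check on these formulas, here is a minimal simulation sketch; the Expo(1) choice for the $X_j$ (so that $\mu = 1$ and $\sigma^2 = 1$) is just an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 100_000

# Each row is one sample of size n; Expo(1) has mu = 1 and sigma^2 = 1.
samples = rng.exponential(scale=1.0, size=(reps, n))
xbar = samples.mean(axis=1)  # one sample mean per row

print(xbar.mean())  # close to mu = 1
print(xbar.var())   # close to sigma^2 / n = 1/50 = 0.02
```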
The law of large numbers (LLN) says that as $n$ grows, the sample mean $\bar{X}_n$ converges to the true mean $\mu$ (in a sense that is explained below). The LLN comes in two versions, which use slightly different definitions of what it means for a sequence of random variables to converge to a number. We will state both versions.
Theorem: Strong Law of Large Numbers
The sample mean $\bar{X}_n$ converges to the true mean $\mu$ pointwise as $n \to \infty$, with probability 1. In other words, the event $\bar{X}_n \to \mu$ has probability $1$.
Theorem: Weak Law of Large Numbers
For all $\epsilon > 0$, $P(|\bar{X}_n - \mu| > \epsilon) \to 0$ as $n \to \infty$. (This form of convergence is called convergence in probability.)
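A quick way to see convergence in probability numerically is to estimate $P(|\bar{X}_n - \mu| > \epsilon)$ by simulation for increasing $n$. A minimal sketch for fair coin tosses (so $\mu = 1/2$; the choice $\epsilon = 0.05$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
eps, reps = 0.05, 10_000

# Estimate P(|Xbar_n - 1/2| > eps) for fair coin tosses as n grows.
for n in [10, 100, 1000]:
    xbar = rng.integers(0, 2, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - 0.5) > eps))
```

The printed probabilities shrink toward 0 as $n$ grows, which is exactly what the WLLN asserts.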
The law of large numbers is essential for simulations, statistics, and science. Consider generating "data" from a large number of independent replications of an experiment, performed either by computer simulation or in the real world. Every time we use the proportion of times that something happened as an approximation to its probability, we are implicitly appealing to LLN. Every time we use the average value in the replications of some quantity to approximate its theoretical average, we are implicitly appealing to LLN.
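For instance, here is a minimal sketch of both uses of the LLN, with a standard Normal "experiment" chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
draws = rng.standard_normal(1_000_000)  # replications of the "experiment"

# Proportion of replications in which the event {Z > 1} occurred,
# as an approximation to P(Z > 1).
print(np.mean(draws > 1))   # approximately 1 - Phi(1), about 0.1587

# Average value of Z^2 across replications, as an approximation to E(Z^2) = 1.
print(np.mean(draws**2))    # approximately 1
```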
Example: Running Proportion of Heads
Let $X_1, X_2, \ldots$ be i.i.d. $\mathrm{Bern}(1/2)$. Interpreting the $X_j$ as indicators of Heads in a string of fair coin tosses, $\bar{X}_n$ is the proportion of Heads after $n$ tosses. SLLN says that with probability $1$, when the sequence of r.v.s $\bar{X}_1, \bar{X}_2, \ldots$ crystallizes into a sequence of numbers, the sequence of numbers will converge to $1/2$. Mathematically, there are bizarre outcomes such as HHHHHH... and HHTHHTHHTHHT..., but collectively they have zero probability of occurring. WLLN says that for any $\epsilon > 0$, the probability of $\bar{X}_n$ being more than $\epsilon$ away from $1/2$ can be made as small as we like by letting $n$ grow.
As an illustration, we simulated six sequences of fair coin tosses and, for each sequence, computed $\bar{X}_n$ as a function of $n$. Of course, in real life we cannot simulate infinitely many coin tosses, so we stopped after 300 tosses. The figure below plots $\bar{X}_n$ as a function of $n$ for each sequence.
At the beginning, we can see that there is quite a bit of fluctuation in the running proportion of Heads. As the number of coin tosses increases, however, $\mathrm{Var}(\bar{X}_n)$ gets smaller and smaller, and $\bar{X}_n$ approaches $1/2$.
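A simulation along these lines is easy to reproduce; here is a sketch using NumPy and Matplotlib (the seed and plotting details are arbitrary, not those behind the figure above):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2024)
num_sequences, num_tosses = 6, 300

for _ in range(num_sequences):
    tosses = rng.integers(0, 2, size=num_tosses)  # i.i.d. Bern(1/2) tosses
    running_prop = np.cumsum(tosses) / np.arange(1, num_tosses + 1)
    plt.plot(running_prop)

plt.axhline(0.5, linestyle="--", color="black")  # the true mean, 1/2
plt.xlabel("number of tosses n")
plt.ylabel("running proportion of Heads")
plt.show()
```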
Central Limit Theorem
Let $X_1, X_2, \ldots$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. The law of large numbers says that as $n \to \infty$, $\bar{X}_n$ converges to the constant $\mu$ (with probability $1$). But what is its distribution along the way to becoming a constant? This is addressed by the central limit theorem (CLT), which, as its name suggests, is a limit theorem of central importance in statistics.
The CLT states that for large $n$, the distribution of $\bar{X}_n$ after standardization approaches a standard Normal distribution. By standardization, we mean that we subtract $\mu$, the expected value of $\bar{X}_n$, and divide by $\sigma/\sqrt{n}$, the standard deviation of $\bar{X}_n$.
Theorem: Central Limit Theorem
As $n \to \infty$,
$$\sqrt{n}\left(\frac{\bar{X}_n - \mu}{\sigma}\right) \to \mathcal{N}(0, 1) \text{ in distribution.}$$
In words, the CDF of the left-hand side approaches $\Phi$, the CDF of the standard Normal distribution.
The CLT is an asymptotic result, telling us about the limiting distribution of $\bar{X}_n$ as $n \to \infty$, but it also suggests an approximation for the distribution of $\bar{X}_n$ when $n$ is a large but finite number.
Central limit theorem, approximation form.
For large $n$, the distribution of $\bar{X}_n$ is approximately $\mathcal{N}(\mu, \sigma^2/n)$. Of course, we already knew from properties of expectation and variance that $\bar{X}_n$ has mean $\mu$ and variance $\sigma^2/n$; the central limit theorem gives us the additional information that $\bar{X}_n$ is approximately Normal with said mean and variance.
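To see the quality of this approximation numerically, here is a minimal sketch (the Unif(0,1) summands and $n = 30$ are arbitrary choices, and the $\Phi$ values are hard-coded for comparison):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 30, 200_000
mu, sigma2 = 0.5, 1 / 12  # mean and variance of Unif(0, 1)

xbar = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)

# Standardize; by the CLT the result should look close to N(0, 1).
z = (xbar - mu) / np.sqrt(sigma2 / n)

# Compare empirical CDF values of z with Phi at a few points.
for c, phi_c in [(-1.0, 0.1587), (0.0, 0.5), (1.0, 0.8413)]:
    print(c, np.mean(z <= c), phi_c)
```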
Let's take a moment to admire the generality of this result. The distribution of the individual $X_j$ can be anything in the world, as long as the mean and variance are finite. We could have a discrete distribution like the Binomial, a bounded continuous distribution, or a distribution with multiple peaks and valleys. No matter what, the act of averaging will cause Normality to emerge. In the figure below we show histograms of the distribution of $\bar{X}_n$ for four different starting distributions and for several values of $n$. We can see that as $n$ increases, the distribution of $\bar{X}_n$ starts to look Normal, regardless of the distribution of the $X_j$.
This does not mean that the distribution of the $X_j$ is irrelevant, however. If the $X_j$ have a highly skewed or multimodal distribution, we may need $n$ to be very large before the Normal approximation becomes accurate; at the other extreme, if the $X_j$ are already i.i.d. Normals, the distribution of $\bar{X}_n$ is exactly $\mathcal{N}(\mu, \sigma^2/n)$ for all $n$. Since there are no infinite datasets in the real world, the quality of the Normal approximation for finite $n$ is an important consideration.
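One way to quantify this dependence on the starting distribution is to track the skewness of $\bar{X}_n$ as $n$ grows; a sketch with Expo(1) summands (an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
reps = 100_000

def sample_skewness(x):
    # Sample skewness: the third standardized moment.
    z = (x - x.mean()) / x.std()
    return np.mean(z**3)

# Expo(1) is highly right-skewed (skewness 2). The skewness of the sample
# mean decays like 2 / sqrt(n), so Normality emerges only as n grows.
for n in [1, 5, 30, 100]:
    xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    print(n, sample_skewness(xbar))
```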
The CLT says that the sample mean $\bar{X}_n$ is approximately Normal, but since the sum $W_n = X_1 + \cdots + X_n = n\bar{X}_n$ is just a scaled version of $\bar{X}_n$, the CLT also implies $W_n$ is approximately Normal. If the $X_j$ have mean $\mu$ and variance $\sigma^2$, $W_n$ has mean $n\mu$ and variance $n\sigma^2$. The CLT then states that for large $n$,
$$W_n \text{ is approximately } \mathcal{N}(n\mu, n\sigma^2).$$
This is completely equivalent to the approximation for $\bar{X}_n$, but it can be useful to state it in this form because many of the named distributions we have studied can be considered as a sum of i.i.d. r.v.s.
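For example, a $\mathrm{Bin}(n, p)$ r.v. is a sum of $n$ i.i.d. $\mathrm{Bern}(p)$ r.v.s, so the CLT yields the classical Normal approximation to the Binomial. A minimal sketch (the parameter values are arbitrary, and the $\Phi$ value in the comment is precomputed):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 0.3  # Bin(n, p) is a sum of n i.i.d. Bern(p) r.v.s

draws = rng.binomial(n, p, size=500_000)

# P(W <= 35) by simulation; compare to the CLT approximation N(np, np(1-p)):
# with a continuity correction, Phi((35.5 - 30) / sqrt(21)) is about 0.8849.
print(np.mean(draws <= 35))
```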