Averages, Law of Large Numbers, and Central Limit Theorem 7


Law of Large Numbers

We turn next to two theorems, the law of large numbers and the central limit theorem, which describe the behavior of the sample mean of i.i.d. r.v.s as the sample size grows. Let $X_1, X_2, X_3, \dots$ be i.i.d. with finite mean $\mu$ and finite variance $\sigma^2$. For all positive integers $n$, let

$$\bar{X}_n = \frac{X_1 + \dots + X_n}{n}$$

be the sample mean of $X_1$ through $X_n$. The sample mean is itself an r.v., with mean $\mu$ and variance $\sigma^2/n$:

$$\begin{align*} E(\bar{X}_n) &= \frac{1}{n} E(X_1 + \dots + X_n) = \frac{1}{n} (E(X_1) + \dots + E(X_n)) = \mu,\\ \textrm{Var}(\bar{X}_n) &= \frac{1}{n^2} \textrm{Var}(X_1 + \dots + X_n) = \frac{1}{n^2} (\textrm{Var}(X_1) + \dots + \textrm{Var}(X_n)) = \frac{\sigma^2}{n}. \end{align*}$$
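As a quick sanity check of these two formulas, here is a minimal simulation sketch. The choice of distribution (Exponential(1), so $\mu = 1$ and $\sigma^2 = 1$) and the sample sizes are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative choice (not from the text): X_j ~ Exponential(1),
# so mu = 1 and sigma^2 = 1.
mu, sigma2 = 1.0, 1.0
n, reps = 50, 100_000

# Each row is one i.i.d. sample of size n; each row mean is one draw of the sample mean.
xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

print(xbar.mean())  # should be close to mu = 1
print(xbar.var())   # should be close to sigma^2 / n = 0.02
```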

The law of large numbers (LLN) says that as $n$ grows, the sample mean $\bar{X}_n$ converges to the true mean $\mu$ (in a sense that is explained below). The LLN comes in two versions, which use slightly different definitions of what it means for a sequence of random variables to converge to a number. We will state both versions.

Theorem: Strong Law of Large Numbers

The sample mean $\bar{X}_n$ converges to the true mean $\mu$ pointwise as $n \to \infty$, with probability $1$. In other words, the event $\bar{X}_n \to \mu$ has probability $1$.

Theorem: Weak Law of Large Numbers

For all $\epsilon > 0$, $P(|\bar{X}_n - \mu| > \epsilon) \to 0$ as $n \to \infty$. (This form of convergence is called convergence in probability.)
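To see convergence in probability concretely, the sketch below estimates $P(|\bar{X}_n - \mu| > \epsilon)$ by simulation for i.i.d. $\textrm{Bern}(1/2)$ draws and increasing $n$. The particular $\epsilon$ and sample sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, eps, reps = 0.5, 0.05, 20_000  # mean of Bern(1/2); illustrative eps

for n in [10, 100, 1_000, 10_000]:
    # A Bin(n, 1/2) draw divided by n is one realization of the sample mean.
    xbar = rng.binomial(n, 0.5, size=reps) / n
    # Monte Carlo estimate of P(|sample mean - mu| > eps); it shrinks toward 0.
    print(n, np.mean(np.abs(xbar - mu) > eps))
```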

The law of large numbers is essential for simulations, statistics, and science. Consider generating "data" from a large number of independent replications of an experiment, performed either by computer simulation or in the real world. Every time we use the proportion of times that something happened as an approximation to its probability, we are implicitly appealing to the LLN. Every time we use the average value in the replications of some quantity to approximate its theoretical average, we are implicitly appealing to the LLN.

Example: Running Proportion of Heads

Let $X_1, X_2, \dots$ be i.i.d. $\textrm{Bern}(1/2)$. Interpreting the $X_j$ as indicators of Heads in a string of fair coin tosses, $\bar{X}_n$ is the proportion of Heads after $n$ tosses. The SLLN says that with probability $1$, when the sequence of r.v.s $\bar{X}_1, \bar{X}_2, \bar{X}_3, \dots$ crystallizes into a sequence of numbers, the sequence of numbers will converge to $1/2$. Mathematically, there are bizarre outcomes such as HHHHHH... and HHTHHTHHTHHT..., but collectively they have zero probability of occurring. The WLLN says that for any $\epsilon > 0$, the probability of $\bar{X}_n$ being more than $\epsilon$ away from $1/2$ can be made as small as we like by letting $n$ grow.

As an illustration, we simulated six sequences of fair coin tosses and, for each sequence, computed $\bar{X}_n$ as a function of $n$. Of course, in real life we cannot simulate infinitely many coin tosses, so we stopped after 300 tosses. The figure below plots $\bar{X}_n$ as a function of $n$ for each sequence.

[Figure: running proportion of Heads $\bar{X}_n$ as a function of $n$ for six simulated sequences of 300 fair coin tosses.]

At the beginning, we can see that there is quite a bit of fluctuation in the running proportion of Heads. As the number of coin tosses increases, however, $\textrm{Var}(\bar{X}_n)$ gets smaller and smaller, and $\bar{X}_n$ approaches $1/2$.
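For readers who want to reproduce a simulation like the one above, here is a minimal sketch. It assumes NumPy and prints the running proportion at a few checkpoints rather than plotting, to stay self-contained:

```python
import numpy as np

rng = np.random.default_rng(1)
n_tosses, n_seqs = 300, 6

for seq in range(n_seqs):
    tosses = rng.integers(0, 2, size=n_tosses)  # 300 fair coin tosses (0/1)
    # Running proportion of Heads after 1, 2, ..., 300 tosses.
    running = np.cumsum(tosses) / np.arange(1, n_tosses + 1)
    # Proportion of Heads after 10, 100, and 300 tosses.
    print(seq, running[[9, 99, 299]])
```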

Central Limit Theorem

Let $X_1, X_2, X_3, \dots$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. The law of large numbers says that as $n \to \infty$, $\bar{X}_n$ converges to the constant $\mu$ (with probability $1$). But what is its distribution along the way to becoming a constant? This is addressed by the central limit theorem (CLT), which, as its name suggests, is a limit theorem of central importance in statistics.

The CLT states that for large $n$, the distribution of $\bar{X}_n$ after standardization approaches a standard Normal distribution. By standardization, we mean that we subtract $\mu$, the expected value of $\bar{X}_n$, and divide by $\sigma/\sqrt{n}$, the standard deviation of $\bar{X}_n$.

Theorem: Central Limit Theorem

As $n \to \infty$,

$$\sqrt{n}\left(\frac{\bar{X}_n - \mu}{\sigma}\right) \to \mathcal{N}(0,1) \textrm{ in distribution.}$$

In words, the CDF of the left-hand side approaches $\Phi$, the CDF of the standard Normal distribution.
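As a numerical illustration of the theorem, the following sketch standardizes simulated sample means from a deliberately skewed starting distribution and compares a few empirical CDF values with $\Phi$. The Exponential(1) choice and the sample sizes are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 100_000
mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)

# Simulate many sample means from a skewed distribution and standardize.
xbar = rng.exponential(size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu) / sigma

# Empirical CDF of the standardized means at a few points.
for c in [-1.0, 0.0, 1.0]:
    print(c, np.mean(z <= c))
# Compare with Phi(-1) ~ 0.159, Phi(0) = 0.5, Phi(1) ~ 0.841.
```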

The CLT is an asymptotic result, telling us about the limiting distribution of $\bar{X}_n$ as $n \to \infty$, but it also suggests an approximation for the distribution of $\bar{X}_n$ when $n$ is large but finite.

Central limit theorem, approximation form.

For large $n$, the distribution of $\bar{X}_n$ is approximately $\mathcal{N}(\mu, \sigma^2/n)$. Of course, we already knew from properties of expectation and variance that $\bar{X}_n$ has mean $\mu$ and variance $\sigma^2/n$; the central limit theorem gives us the additional information that $\bar{X}_n$ is approximately Normal with said mean and variance.
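As a concrete check of the approximation form, here is a sketch comparing the exact CDF of $\bar{X}_n$ for i.i.d. Exponential(1) r.v.s (where $n\bar{X}_n$ has a Gamma distribution) with the $\mathcal{N}(\mu, \sigma^2/n)$ approximation. SciPy and the particular values of $n$ and $x$ are illustrative assumptions:

```python
import numpy as np
from scipy.stats import gamma, norm

n, x = 50, 1.2  # illustrative values

# Exact: for i.i.d. Exponential(1), n * (sample mean) ~ Gamma(n, 1),
# so P(sample mean <= x) = P(Gamma(n, 1) <= n x).
exact = gamma.cdf(n * x, a=n)

# CLT approximation: the sample mean is approximately N(1, 1/n).
approx = norm.cdf(x, loc=1.0, scale=1.0 / np.sqrt(n))

print(exact, approx)  # roughly 0.916 vs 0.921
```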

Let's take a moment to admire the generality of this result. The distribution of the individual $X_j$ can be anything in the world, as long as the mean and variance are finite. We could have a discrete distribution like the Binomial, a bounded continuous distribution, or a distribution with multiple peaks and valleys. No matter what, the act of averaging will cause Normality to emerge. In the figure below we show histograms of the distribution of $\bar{X}_n$ for four different starting distributions and for $n = 1, 5, 30, 100$. We can see that as $n$ increases, the distribution of $\bar{X}_n$ starts to look Normal, regardless of the distribution of the $X_j$.

[Figure: histograms of the distribution of $\bar{X}_n$ for four different starting distributions and $n = 1, 5, 30, 100$.]

This does not mean that the distribution of the $X_j$ is irrelevant, however. If the $X_j$ have a highly skewed or multimodal distribution, we may need $n$ to be very large before the Normal approximation becomes accurate; at the other extreme, if the $X_j$ are already i.i.d. Normals, the distribution of $\bar{X}_n$ is exactly $\mathcal{N}(\mu, \sigma^2/n)$ for all $n$. Since there are no infinite datasets in the real world, the quality of the Normal approximation for finite $n$ is an important consideration.

The CLT says that the sample mean $\bar{X}_n$ is approximately Normal, but since the sum $W_n = X_1 + \dots + X_n = n\bar{X}_n$ is just a scaled version of $\bar{X}_n$, the CLT also implies that $W_n$ is approximately Normal. If the $X_j$ have mean $\mu$ and variance $\sigma^2$, then $W_n$ has mean $n\mu$ and variance $n\sigma^2$. The CLT then states that for large $n$,

$$W_n \overset{\cdot}{\sim} \mathcal{N}(n\mu, n\sigma^2).$$

This is completely equivalent to the approximation for $\bar{X}_n$, but it can be useful to state it in this form because many of the named distributions we have studied can be considered as a sum of i.i.d. r.v.s.
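Here is a minimal sketch of the sum form, using i.i.d. $\textrm{Unif}(0,1)$ r.v.s (an illustrative choice, with $\mu = 1/2$ and $\sigma^2 = 1/12$):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 30, 100_000

# W_n = X_1 + ... + X_n for X_j ~ Unif(0,1): mu = 1/2, sigma^2 = 1/12.
w = rng.uniform(size=(reps, n)).sum(axis=1)

print(w.mean(), n / 2)   # both close to n * mu = 15
print(w.var(), n / 12)   # both close to n * sigma^2 = 2.5
# P(W_n <= mean + one sd); compare with Phi(1) ~ 0.841.
print(np.mean(w <= n / 2 + np.sqrt(n / 12)))
```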

Example: Poisson Convergence to Normal

If $Y \sim \textrm{Pois}(n)$, we can write $Y = X_1 + \dots + X_n$ with $X_1, \dots, X_n$ i.i.d. $\textrm{Pois}(1)$, each with mean $1$ and variance $1$. By the CLT, for large $n$ the distribution of $Y$ is approximately $\mathcal{N}(n, n)$. More generally, a $\textrm{Pois}(\lambda)$ distribution is approximately $\mathcal{N}(\lambda, \lambda)$ when $\lambda$ is large.
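A short numerical check of this example follows; SciPy and the particular values of $\lambda$ and $k$ are illustrative assumptions, and a continuity correction is included since the Poisson is integer-valued:

```python
from scipy.stats import norm, poisson

lam, k = 100, 110  # illustrative values

exact = poisson.cdf(k, lam)
# CLT approximation N(lam, lam), with a continuity correction.
approx = norm.cdf(k + 0.5, loc=lam, scale=lam ** 0.5)

print(exact, approx)  # both roughly 0.85
```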

Example: Binomial Convergence to Normal

If $Y \sim \textrm{Bin}(n, p)$, then $Y = X_1 + \dots + X_n$ with $X_1, \dots, X_n$ i.i.d. $\textrm{Bern}(p)$, each with mean $p$ and variance $p(1-p)$. By the CLT, for large $n$ the distribution of $Y$ is approximately $\mathcal{N}(np, np(1-p))$. Since $Y$ is integer-valued, the approximation is improved by a continuity correction, e.g. $P(Y \le k) \approx \Phi\left(\frac{k + 1/2 - np}{\sqrt{np(1-p)}}\right)$.
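And a matching numerical check for the Binomial case (again, SciPy and the particular $n$, $p$, $k$ are illustrative assumptions):

```python
import numpy as np
from scipy.stats import binom, norm

n, p, k = 100, 0.5, 55  # illustrative values

exact = binom.cdf(k, n, p)
# CLT approximation N(np, np(1-p)) with a continuity correction.
approx = norm.cdf(k + 0.5, loc=n * p, scale=np.sqrt(n * p * (1 - p)))

print(exact, approx)  # both roughly 0.86
```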