Continuous Random Variables 2

Universality of the Uniform

In this section, we will discuss a remarkable property of the Uniform distribution: given a $\textrm{Unif}(0,1)$ r.v., we can construct an r.v. with any continuous distribution we want. Conversely, given an r.v. with an arbitrary continuous distribution, we can create a $\textrm{Unif}(0,1)$ r.v. We call this the *universality of the Uniform*, because it tells us the Uniform is a universal starting point for building r.v.s with other distributions. Universality of the Uniform also goes by many other names, such as the probability integral transform, inverse transform sampling, the quantile transformation, and even the fundamental theorem of simulation.

To keep the proofs simple, we will state the universality of the Uniform for a case where we know that the inverse of the desired CDF exists. More generally, similar ideas can be used to simulate a random draw from any desired CDF, as a function of a $\textrm{Unif}(0,1)$ r.v.

Theorem: Universality of the Uniform

Let $F$ be a CDF which is a continuous function and strictly increasing on the support of the distribution. This ensures that the inverse function $F^{-1}$ exists, as a function from $(0,1)$ to the support of the distribution. We then have the following results.

  1. Let $U \sim \textrm{Unif}(0,1)$ and $X = F^{-1}(U)$. Then $X$ is an r.v. with CDF $F$.
  2. Let $X$ be an r.v. with CDF $F$. Then $F(X) \sim \textrm{Unif}(0,1)$.

Let's make sure we understand what each part of the theorem is saying.

The first part of the theorem says that if we start with $U \sim \textrm{Unif}(0,1)$ and a CDF $F$, then we can create an r.v. whose CDF is $F$ by plugging $U$ into the inverse CDF $F^{-1}$. Since $F^{-1}$ is a function (known as the quantile function), $U$ is a random variable, and a function of a random variable is a random variable, $F^{-1}(U)$ is a random variable; universality of the Uniform says its CDF is $F$.

The second part of the theorem goes in the reverse direction, starting from an r.v. $X$ whose CDF is $F$ and then creating a $\textrm{Unif}(0,1)$ r.v. Again, $F$ is a function, $X$ is a random variable, and a function of a random variable is a random variable, so $F(X)$ is a random variable. Since any CDF is between $0$ and $1$ everywhere, $F(X)$ must take values between $0$ and $1$. Universality of the Uniform says that the distribution of $F(X)$ is Uniform on $(0,1)$.
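Both directions of the theorem can be sketched in code. As an illustration (the specific distribution is our choice here), take the Exponential CDF $F(x) = 1 - e^{-x}$ for $x > 0$, whose inverse is $F^{-1}(u) = -\log(1-u)$:

```python
import random
import math

def exponential_from_uniform():
    """Part 1 of universality: X = F^{-1}(U) with F(x) = 1 - e^{-x},
    so F^{-1}(u) = -log(1 - u), gives X with CDF F."""
    u = random.random()          # U ~ Unif(0,1)
    return -math.log(1 - u)

def uniform_from_exponential(x):
    """Part 2 of universality: plugging X into its own CDF
    gives F(X) ~ Unif(0,1)."""
    return 1 - math.exp(-x)

samples = [exponential_from_uniform() for _ in range(100_000)]
# Empirical check: P(X <= 1) should be close to F(1) = 1 - e^{-1} ≈ 0.632
print(sum(x <= 1 for x in samples) / len(samples))
```

The same two-line recipe works for any continuous, strictly increasing CDF whose inverse we can compute.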

Warning

The second part of universality of the Uniform involves plugging a random variable $X$ into its own CDF $F$. This may seem strangely self-referential, but it makes sense because $F$ is just a function (that satisfies the properties of a valid CDF), so $F(X)$ is a function of a random variable and hence is itself a random variable. There is a potential notational confusion, however: $F(x) = P(X \leq x)$ by definition, but it would be incorrect to say "$F(X) = P(X \leq X) = 1$". Rather, we should first find an expression for the CDF as a function of $x$, then replace $x$ with $X$ to obtain a random variable. For example, if the CDF of $X$ is $F(x) = 1-e^{-x}$ for $x>0$, then $F(X) = 1-e^{-X}$.

Proof:

For part 1, let $U \sim \textrm{Unif}(0,1)$ and $X = F^{-1}(U)$. For all real $x$ in the support,
$$P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x),$$
where the middle equality holds because $F$ is strictly increasing, and the last equality uses $P(U \leq u) = u$ for $u \in (0,1)$. So the CDF of $X$ is $F$, as claimed.

For part 2, let $X$ have CDF $F$. For $u \in (0,1)$,
$$P(F(X) \leq u) = P(X \leq F^{-1}(u)) = F(F^{-1}(u)) = u,$$
which is the $\textrm{Unif}(0,1)$ CDF. $\blacksquare$

To gain more insight into what the quantile function $F^{-1}$ and universality of the Uniform mean, let's consider an example that is familiar to millions of students: percentiles on an exam.

Example: Percentiles

Let $F$ be the CDF of the scores on an exam, treated as a continuous distribution. Then $F(x)$ is the fraction of students scoring below $x$, and the quantile function $F^{-1}$ converts percentiles back into scores: $F^{-1}(0.95)$ is the score needed to be at the 95th percentile. Part 2 of the theorem says that the percentile $F(X)$ of a randomly chosen student's score $X$ is distributed $\textrm{Unif}(0,1)$: a random student is equally likely to land in any percentile interval of a given width.

Example: Universality with Logistic

The Logistic distribution has CDF $F(x) = \frac{e^x}{1+e^x}$ for all real $x$. For $U \sim \textrm{Unif}(0,1)$, part 1 of the theorem says that $F^{-1}(U)$ is Logistic. Solving $u = \frac{e^x}{1+e^x}$ for $x$ gives $F^{-1}(u) = \log\left(\frac{u}{1-u}\right)$, so
$$X = \log\left(\frac{U}{1-U}\right)$$
has a Logistic distribution. Conversely, by part 2, if $X$ is Logistic then $F(X) = \frac{e^X}{1+e^X} \sim \textrm{Unif}(0,1)$.
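A minimal simulation sketch of the first part, using the Logistic CDF $F(x) = e^x/(1+e^x)$ and its inverse, the logit function:

```python
import random
import math

def logistic_from_uniform():
    """Universality, part 1, with the Logistic CDF F(x) = e^x / (1 + e^x).
    Solving u = F(x) for x gives F^{-1}(u) = log(u / (1 - u))."""
    u = random.random()          # U ~ Unif(0,1)
    return math.log(u / (1 - u))

draws = [logistic_from_uniform() for _ in range(100_000)]
# Check against the CDF: P(X <= 0) = F(0) = 1/2
print(sum(x <= 0 for x in draws) / len(draws))
```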

Normal

The Normal distribution is a famous continuous distribution with a bell-shaped PDF. It is extremely widely used in statistics because of a theorem, the central limit theorem, which says that under very weak assumptions, the sum of a large number of i.i.d. random variables has an approximately Normal distribution, regardless of the distribution of the individual r.v.s. This means we can start with independent r.v.s from almost any distribution, discrete or continuous, but once we add up a bunch of them, the distribution of the resulting r.v. looks like a Normal distribution.

Definition: Standard Normal Distribution

A continuous r.v. $Z$ is said to have the standard Normal distribution if its PDF $\varphi$ is given by

$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty.$$

We write this as $Z \sim \mathcal{N}(0,1)$.

The constant $\frac{1}{\sqrt{2\pi}}$ in front of the PDF may look surprising (why is something with $\pi$ needed in front of something with $e$, when there are no circles in sight?), but it turns out to be what is needed to make the PDF integrate to 1. Such constants are called *normalizing constants* because they normalize the total area under the PDF to 1. The standard Normal CDF $\Phi$ is the accumulated area under the PDF:

$$\Phi(z) = \int_{-\infty}^z \varphi(t)\, dt = \int_{-\infty}^z \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt.$$

We need to leave this as an integral: it turns out to be mathematically impossible to find a closed-form expression for the antiderivative of $\varphi$, meaning that we cannot express $\Phi$ as a finite sum of more familiar functions like polynomials or exponentials. But closed-form or no, it's still a well-defined function: if we give $\Phi$ an input $z$, it returns the accumulated area under the PDF from $-\infty$ up to $z$.
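Although $\Phi$ has no closed form in elementary functions, it can be expressed through the (non-elementary) error function $\mathrm{erf}$, which standard libraries provide. A minimal sketch in Python:

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Phi(z) via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return (1 + erf(z / sqrt(2))) / 2

print(standard_normal_cdf(0))      # 0.5, by symmetry of the PDF
print(standard_normal_cdf(1.96))   # ≈ 0.975
```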

Notation

By convention, we use $\varphi$ for the standard Normal PDF and $\Phi$ for the standard Normal CDF. We will often use $Z$ to denote a standard Normal r.v. The standard Normal PDF and CDF are plotted in the following figure. The PDF is bell-shaped and symmetric about 0, and the CDF is S-shaped. These have the same general shape as the Logistic PDF and CDF that we saw in a couple of previous examples, but the Normal PDF decays to $0$ much more quickly: notice that nearly all of the area under $\varphi$ is between $-3$ and $3$, whereas we had to go out to $-5$ and $5$ for the Logistic PDF.

*[Figure: the standard Normal PDF (bell-shaped) and CDF (S-shaped).]*

There are several important symmetry properties that can be deduced from the standard Normal PDF and CDF.

For instance, $\varphi$ is an even function: $\varphi(z) = \varphi(-z)$. By the symmetry of the PDF about 0, the area to the left of $-z$ equals the area to the right of $z$, which gives
$$\Phi(-z) = 1 - \Phi(z).$$
It also follows that if $Z \sim \mathcal{N}(0,1)$, then $-Z \sim \mathcal{N}(0,1)$ as well.

The general Normal distribution has two parameters, denoted $\mu$ and $\sigma^2$, which are the mean and variance (the mean and variance of a distribution are measures of the average and how spread out the distribution is, respectively; these are defined and explored in the next unit). Starting with a standard Normal r.v. $Z \sim \mathcal{N}(0,1)$, we can convert to a Normal r.v. with any desired parameters $\mu$ and $\sigma^2$ by a location-scale transformation.

Definition: Normal Distribution

If $Z \sim \mathcal{N}(0,1)$, then

$$X = \mu + \sigma Z$$

is said to have the Normal distribution with mean parameter $\mu$ and variance parameter $\sigma^2$, for any real $\mu$ and $\sigma$ with $\sigma > 0$. We denote this by $X \sim \mathcal{N}(\mu,\sigma^2)$.

Of course, if we can get from $Z$ to $X$, then we can get from $X$ back to $Z$. The process of getting a standard Normal from a non-standard Normal is called, appropriately enough, *standardization*. For $X \sim \mathcal{N}(\mu,\sigma^2)$, the standardized version of $X$ is

$$\frac{X-\mu}{\sigma} \sim \mathcal{N}(0,1).$$
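The location-scale transformation and its inverse can be sketched as follows (the parameter values are just illustrative):

```python
import random

mu, sigma = -1.0, 2.0      # illustrative parameters: N(-1, 4)

z = random.gauss(0, 1)     # Z ~ N(0, 1)
x = mu + sigma * z         # location-scale: X = mu + sigma*Z ~ N(mu, sigma^2)
z_back = (x - mu) / sigma  # standardization recovers Z (up to rounding)
assert abs(z_back - z) < 1e-9
```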

We can use standardization to find the CDF and PDF of $X$ in terms of the standard Normal CDF and PDF.

Theorem: Normal CDF and PDF

Let $X \sim \mathcal{N}(\mu,\sigma^2)$. Then the CDF of $X$ is

$$F(x) = \Phi\left(\frac{x-\mu}{\sigma}\right),$$

and the PDF of $X$ is

$$f(x) = \varphi\left(\frac{x-\mu}{\sigma}\right)\frac{1}{\sigma}.$$

Proof:

For the CDF, we standardize:
$$F(x) = P(X \leq x) = P\left(\frac{X-\mu}{\sigma} \leq \frac{x-\mu}{\sigma}\right) = \Phi\left(\frac{x-\mu}{\sigma}\right).$$
Differentiating with respect to $x$ and using the chain rule then gives the PDF:
$$f(x) = \frac{d}{dx}\,\Phi\left(\frac{x-\mu}{\sigma}\right) = \varphi\left(\frac{x-\mu}{\sigma}\right)\frac{1}{\sigma}. \quad \blacksquare$$

Three important benchmarks for the Normal distribution are the probabilities of falling within one, two, and three standard deviations of the mean parameter $\mu$. The 68-95-99.7 rule tells us that these probabilities are what the name suggests.

Theorem: 68-95-99.7 Rule

If $X \sim \mathcal{N}(\mu,\sigma^2)$, then

$$\begin{aligned} P(|X-\mu| < \sigma) &\approx 0.68, \\ P(|X-\mu| < 2\sigma) &\approx 0.95, \\ P(|X-\mu| < 3\sigma) &\approx 0.997. \end{aligned}$$
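Since $P(|X-\mu| < k\sigma) = P(|Z| < k) = 2\Phi(k) - 1$ for any $\mu$ and $\sigma$, the rule can be checked numerically via the error-function expression for $\Phi$:

```python
from math import erf, sqrt

def phi_cdf(z):
    """Standard Normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return (1 + erf(z / sqrt(2))) / 2

# P(|X - mu| < k*sigma) = 2*Phi(k) - 1, independent of mu and sigma
for k in (1, 2, 3):
    print(k, round(2 * phi_cdf(k) - 1, 4))
# 1 → 0.6827, 2 → 0.9545, 3 → 0.9973
```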

Example

Let $X \sim \mathcal{N}(-1,4)$. What is $P(|X| < 3)$, exactly (in terms of $\Phi$) and approximately?

Solution:

Standardizing, with $\mu = -1$ and $\sigma = 2$:
$$P(|X| < 3) = P(-3 < X < 3) = P\left(-1 < \frac{X+1}{2} < 2\right) = \Phi(2) - \Phi(-1).$$
By the 68-95-99.7 rule, $\Phi(2) \approx 0.975$ (about 95% of the mass lies within 2 standard deviations, leaving 2.5% in each tail) and $\Phi(-1) \approx 0.16$, so $P(|X| < 3) \approx 0.82$.
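Standardization reduces this probability to a difference of $\Phi$ values, which we can evaluate numerically (again writing $\Phi$ via the error function):

```python
from math import erf, sqrt

def phi_cdf(z):
    """Standard Normal CDF via the error function."""
    return (1 + erf(z / sqrt(2))) / 2

mu, sigma = -1, 2                      # X ~ N(-1, 4)
# P(|X| < 3) = Phi((3 - mu)/sigma) - Phi((-3 - mu)/sigma) = Phi(2) - Phi(-1)
answer = phi_cdf((3 - mu) / sigma) - phi_cdf((-3 - mu) / sigma)
print(round(answer, 4))                # ≈ 0.8186
```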

One more useful property of the Normal distribution is that the sum of independent Normals is Normal.

Theorem: Sum of Independent Normals

If $X_1 \sim \mathcal{N}(\mu_1,\sigma_1^2)$ and $X_2 \sim \mathcal{N}(\mu_2,\sigma_2^2)$ are independent, then

$$\begin{aligned} X_1 + X_2 &\sim \mathcal{N}(\mu_1+\mu_2,\, \sigma_1^2+\sigma_2^2), \\ X_1 - X_2 &\sim \mathcal{N}(\mu_1-\mu_2,\, \sigma_1^2+\sigma_2^2). \end{aligned}$$
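Note that the variances add in both cases. A quick Monte Carlo sanity check (with illustrative parameters of our choosing):

```python
import random
import statistics

# X1 ~ N(1, 2^2) and X2 ~ N(3, 4^2), independent,
# so X1 + X2 should be approximately N(1 + 3, 4 + 16) = N(4, 20).
sums = [random.gauss(1, 2) + random.gauss(3, 4) for _ in range(200_000)]
print(round(statistics.mean(sums), 1))      # ≈ 4.0
print(round(statistics.variance(sums), 1))  # ≈ 20.0
```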