Averages, Law of Large Numbers, and Central Limit Theorem 6


Expectation of a Continuous Random Variable

The definition of expectation for continuous r.v.s is analogous to the definition for discrete r.v.s; we just replace the sum with an integral and the PMF with the PDF.

Definition: Expectation of a Continuous r.v.

The expected value (also called the expectation or mean) of a continuous r.v. $X$ with PDF $f$ is

$$E(X) = \int_{-\infty}^\infty x f(x)\,dx.$$

As in the discrete case, the expectation of a continuous r.v. may or may not exist. When discussing expectations, it would be very tedious to have to add "(if it exists)" after every mention of an expectation not yet shown to exist, so we will often leave this implicit.

The integral is taken over the entire real line, but if the support of $X$ is not the entire real line we can just integrate over the support.

Linearity of expectation holds for continuous r.v.s, just as it did for discrete r.v.s. LOTUS also holds for continuous r.v.s, replacing the sum with an integral and the PMF with the PDF:

Theorem: LOTUS, Continuous

If $X$ is a continuous r.v. with PDF $f$ and $g$ is a function from $\mathbb{R}$ to $\mathbb{R}$, then

$$E(g(X)) = \int_{-\infty}^\infty g(x) f(x)\,dx.$$
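To make this concrete, here is a minimal numerical sketch (assuming Python with NumPy and SciPy, and the illustrative choices $g(x) = x^2$ and $X \sim \textrm{Unif}(0,1)$): the integral $\int g(x)f(x)\,dx$ should match a Monte Carlo average of $g(X)$.

```python
import numpy as np
from scipy import integrate

# LOTUS check for X ~ Unif(0, 1) and g(x) = x^2.
# The PDF of Unif(0, 1) is f(x) = 1 on (0, 1).
g = lambda x: x**2

# Left side of LOTUS: integrate g(x) * f(x) over the support (0, 1).
lotus_value, _ = integrate.quad(lambda x: g(x) * 1.0, 0, 1)

# Right side: Monte Carlo estimate of E(g(X)) from simulated draws.
rng = np.random.default_rng(seed=0)
mc_value = g(rng.uniform(0, 1, size=10**6)).mean()

print(lotus_value, mc_value)  # both should be close to 1/3
```

Note that nowhere did we need the distribution of $g(X)$ itself; that is exactly the convenience LOTUS provides.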

Example Mean and Variance of a Uniform r.v.

Let's derive the mean and variance of $U \sim \textrm{Unif}(a,b)$. The expectation is extremely intuitive: the PDF is constant, so its balancing point should be the midpoint of $(a,b)$. This is exactly what we find by using the definition of expectation for continuous r.v.s:

$$E(U) = \int_a^b x \cdot \frac{1}{b-a}\,dx = \frac{1}{b-a}\left(\frac{b^2}{2} - \frac{a^2}{2}\right) = \frac{a+b}{2}.$$

For the variance, we first find $E(U^2)$ using the continuous version of LOTUS:

$$E(U^2) = \int_a^b x^2 \cdot \frac{1}{b-a}\,dx = \frac{1}{3} \cdot \frac{b^3 - a^3}{b-a}.$$

Then

$$\textrm{Var}(U) = E(U^2) - (EU)^2 = \frac{1}{3} \cdot \frac{b^3 - a^3}{b-a} - \left(\frac{a+b}{2}\right)^2,$$

which reduces, after factoring $b^3 - a^3 = (b-a)(a^2+ab+b^2)$ and simplifying via $\frac{a^2+ab+b^2}{3} - \frac{a^2+2ab+b^2}{4} = \frac{a^2-2ab+b^2}{12}$, to

$$\textrm{Var}(U) = \frac{(b-a)^2}{12}.$$
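As a sanity check, a short simulation (a sketch assuming Python with NumPy; the endpoints $a$ and $b$ are arbitrary illustrative values) compares sample moments of $\textrm{Unif}(a,b)$ draws with the formulas just derived.

```python
import numpy as np

a, b = 2.0, 7.0  # illustrative endpoints
rng = np.random.default_rng(seed=0)
u = rng.uniform(a, b, size=10**6)

print(u.mean(), (a + b) / 2)      # sample mean vs (a+b)/2 = 4.5
print(u.var(), (b - a)**2 / 12)   # sample variance vs (b-a)^2/12 = 2.083...
```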

Example Mean and Variance of a Normal r.v.

Next let's derive the mean and variance of a Normal r.v., showing that a $\mathcal{N}(\mu,\sigma^2)$ r.v. does indeed have mean $\mu$ and variance $\sigma^2$. To start, let's consider the standard Normal. By symmetry, its mean must be 0. We can also see this symmetry by looking at the definition of $E(Z)$:

$$E(Z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty z e^{-z^2/2}\,dz,$$

and since $g(z) = ze^{-z^2/2}$ is an odd function, the area under $g$ from $-\infty$ to 0 cancels the area under $g$ from 0 to $\infty$. Therefore $E(Z) = 0$. In fact, the same argument shows that $E(Z^n) = 0$ for any odd positive integer $n$. For the variance, we can use LOTUS:

$$\begin{align*} \textrm{Var}(Z) &= E(Z^2) - (EZ)^2 = E(Z^2) \\ &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty z^2 e^{-z^2/2}\,dz \\ &= \frac{2}{\sqrt{2\pi}} \int_0^\infty z^2 e^{-z^2/2}\,dz. \end{align*}$$

The last step uses the fact that $z^2 e^{-z^2/2}$ is an even function. Now we use integration by parts with $u = z$ and $dv = ze^{-z^2/2}\,dz$, so $du = dz$ and $v = -e^{-z^2/2}$:

$$\begin{align*} \textrm{Var}(Z) &= \frac{2}{\sqrt{2\pi}} \left(-ze^{-z^2/2}\bigg|_0^{\infty} + \int_0^\infty e^{-z^2/2}\,dz \right) \\ &= \frac{2}{\sqrt{2\pi}} \left(0 + \frac{\sqrt{2\pi}}{2}\right) \\ &= 1. \end{align*}$$

The first term of the integration by parts equals 0 because $e^{-z^2/2}$ decays much faster than $z$ grows, and the second term is $\sqrt{2\pi}/2$ because it's half of the total area under $e^{-z^2/2}$, which we've already proved is $\sqrt{2\pi}$. So the standard Normal distribution has mean 0 and variance 1.

For $X \sim \mathcal{N}(\mu,\sigma^2)$, we can write $X = \mu + \sigma Z$ with $Z \sim \mathcal{N}(0,1)$, and then

$$E(X) = \mu + \sigma \cdot 0 = \mu, \qquad \textrm{Var}(X) = \sigma^2 \textrm{Var}(Z) = \sigma^2.$$
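The location-scale relationship $X = \mu + \sigma Z$ is easy to verify empirically. The sketch below (assuming Python with NumPy; $\mu$ and $\sigma$ are arbitrary illustrative values) transforms standard Normal draws and checks the resulting mean and variance.

```python
import numpy as np

mu, sigma = 3.0, 2.0  # illustrative parameters
rng = np.random.default_rng(seed=0)

z = rng.normal(0, 1, size=10**6)  # standard Normal draws
x = mu + sigma * z                # location-scale transform: X ~ N(mu, sigma^2)

print(z.mean(), z.var())  # approximately 0 and 1
print(x.mean(), x.var())  # approximately mu = 3 and sigma^2 = 4
```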

Example Mean and Variance of an Exponential r.v.

To obtain the mean and variance of an Exponential r.v., let's start by finding the mean and variance of $X \sim \textrm{Expo}(1)$:

$$E(X) = \int_0^\infty x e^{-x}\,dx = 1,$$

and by LOTUS,

$$E(X^2) = \int_0^\infty x^2 e^{-x}\,dx = 2,$$

where the integrals were done using standard integration by parts calculations. Then

$$\textrm{Var}(X) = E(X^2) - (EX)^2 = 2 - 1^2 = 1.$$

Now let $Y = X/\lambda \sim \textrm{Expo}(\lambda)$. Then

$$\begin{align*} E(Y) &= \frac{1}{\lambda} E(X) = \frac{1}{\lambda},\\ \textrm{Var}(Y) &= \frac{1}{\lambda^2} \textrm{Var}(X) = \frac{1}{\lambda^2}. \end{align*}$$
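The same scaling idea can be checked by simulation. The sketch below (assuming Python with NumPy; $\lambda$ is an arbitrary illustrative value) draws from $\textrm{Expo}(1)$, rescales by $1/\lambda$, and compares sample moments with $1/\lambda$ and $1/\lambda^2$.

```python
import numpy as np

lam = 0.5  # illustrative rate parameter
rng = np.random.default_rng(seed=0)

x = rng.exponential(scale=1.0, size=10**6)  # X ~ Expo(1)
y = x / lam                                 # Y = X/lambda ~ Expo(lambda)

print(y.mean(), 1 / lam)     # approximately 1/lambda = 2
print(y.var(), 1 / lam**2)   # approximately 1/lambda^2 = 4
```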

Example Blissville and Blotchville

Fred lives in Blissville, where buses always arrive exactly on time, with the time between successive buses fixed at 10 minutes. Having lost his watch, he arrives at the bus stop at a uniformly random time on a certain day (assume that buses run 24 hours a day, every day, and that the time that Fred arrives is independent of the bus arrival process).

(a) What is the distribution of how long Fred has to wait for the next bus? What is the average time that Fred has to wait?

(b) Given that the bus has not yet arrived after 6 minutes, what is the probability that Fred will have to wait at least 3 more minutes?

(c) Fred moves to Blotchville, a city with inferior urban planning and where buses are much more erratic. Now, when any bus arrives, the time until the next bus arrives is an Exponential random variable with mean 10 minutes. Fred arrives at the bus stop at a random time, not knowing how long ago the previous bus came. What is the distribution of Fred's waiting time for the next bus? What is the average time that Fred has to wait?

(d) When Fred complains to a friend how much worse transportation is in Blotchville, the friend says: "Stop whining so much! You arrive at a uniform instant between the previous bus arrival and the next bus arrival. The average length of that interval between buses is 10 minutes, but since you are equally likely to arrive at any time in that interval, your average waiting time is only 5 minutes." Fred disagrees, both from experience and from solving Part (c) while waiting for the bus. Explain what is wrong with the friend's reasoning.

Solution: (a) The distribution is Uniform on $(0,10)$, so the mean is 5 minutes.

(b) Let TT be the waiting time. Then

$$P(T \geq 6 + 3 \mid T > 6) = \frac{P(T \geq 9, T > 6)}{P(T > 6)} = \frac{P(T \geq 9)}{P(T > 6)} = \frac{1/10}{4/10} = \frac{1}{4}.$$

In particular, Fred's waiting time in Blissville is not memoryless; conditional on having waited 6 minutes already, there's only a $1/4$ chance that he'll have to wait another 3 minutes, whereas if he had just shown up, there would be a $P(T \geq 3) = 7/10$ chance of having to wait at least 3 minutes.
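A quick simulation (a minimal sketch, assuming Python with NumPy) confirms this: among $\textrm{Unif}(0,10)$ waiting times exceeding 6 minutes, only about a quarter exceed 9, while about 70% of all waiting times are at least 3 minutes.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
t = rng.uniform(0, 10, size=10**6)  # Fred's waiting time in Blissville

# Conditional probability P(T >= 9 | T > 6): should be near 1/4.
print((t[t > 6] >= 9).mean())

# Unconditional P(T >= 3): should be near 7/10.
print((t >= 3).mean())
```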

(c) By the memoryless property, the distribution is Exponential with parameter $1/10$ (and mean 10 minutes) regardless of when Fred arrives; how much longer the next bus will take to arrive is independent of how long ago the previous bus arrived. The average time that Fred has to wait is 10 minutes.
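The memoryless property itself shows up numerically: for Exponential waiting times with mean 10, conditioning on having already waited 6 minutes leaves the chance of waiting at least 3 more minutes unchanged. A minimal sketch (assuming Python with NumPy):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
t = rng.exponential(scale=10, size=10**6)  # Exponential with mean 10 minutes

# P(T >= 9 | T > 6) should match the unconditional P(T >= 3).
print((t[t > 6] >= 9).mean())  # approximately exp(-3/10) = 0.741
print((t >= 3).mean())         # approximately exp(-3/10) = 0.741
```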

(d) Fred's friend is making the mistake of replacing a random variable (the time between buses) by its expectation (10 minutes), thereby ignoring the variability in interarrival times. The average length of a time interval between two buses is 10 minutes, but Fred is not equally likely to arrive in any of these intervals: he is more likely to arrive during a long interval between buses than during a short one. For example, if one interval between buses is 50 minutes and another interval is 5 minutes, then Fred is 10 times more likely to arrive during the 50-minute interval. This phenomenon is known as length-biasing, and it comes up in many real-life situations. For example, asking randomly chosen mothers how many children they have yields a different distribution from asking randomly chosen people how many siblings they have, including themselves. Asking students the sizes of their classes and averaging those results may give a much higher value than taking a list of classes and averaging the sizes of each (this is called the class size paradox).
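The length-biasing behind the friend's error also shows up clearly in simulation. The sketch below (assuming Python with NumPy) generates Exponential interarrival times with mean 10 minutes, drops Fred at a uniformly random instant, and measures his wait: it comes out near 10 minutes, not 5, because Fred tends to land in longer-than-average gaps.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Bus arrival times in Blotchville: cumulative sums of Exponential
# interarrival times with mean 10 minutes, covering a long time window.
arrivals = np.cumsum(rng.exponential(scale=10, size=10**6))

# Fred shows up at uniformly random instants well inside the window.
fred = rng.uniform(0, arrivals[-2], size=10**5)

# For each of Fred's arrival instants, find the next bus and record the wait.
next_bus = arrivals[np.searchsorted(arrivals, fred)]
print((next_bus - fred).mean())  # close to 10 minutes, not 5
```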