Averages, Law of Large Numbers, and Central Limit Theorem 5


Poisson

The last famous discrete distribution that we'll discuss in this unit is the Poisson, which is an extremely popular distribution for modeling discrete data. We'll introduce its PMF, mean, and variance, and then discuss its story in more detail.

Definition: Poisson Distribution

An r.v. $X$ has the Poisson distribution with parameter $\lambda$, where $\lambda > 0$, if the PMF of $X$ is

$$P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k=0, 1, 2, \dots.$$

We write this as $X \sim \textrm{Pois}(\lambda)$.

This is a valid PMF because of the Taylor series $\sum_{k=0}^\infty \frac{\lambda^k}{k!} = e^{\lambda}$: multiplying both sides by $e^{-\lambda}$ shows that the probabilities sum to 1.
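We can confirm this numerically. Here is a minimal Python sketch (the helper name `pois_pmf` and the choice $\lambda = 2$ are ours) that computes the PMF straight from the definition and checks that the probabilities sum to 1:

```python
import math

def pois_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Pois(lam), straight from the definition."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.0
# The terms decay factorially fast, so truncating the infinite sum is harmless.
print(sum(pois_pmf(k, lam) for k in range(100)))  # ~1.0
```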

Example: Poisson Expectation and Variance

Let $X \sim \textrm{Pois}(\lambda)$. Then the mean and variance are both equal to $\lambda$. For the *mean*, we have

$$\begin{align*}
E(X) &= e^{-\lambda} \sum_{k=0}^\infty k \frac{\lambda^k}{k!} \\
&= e^{-\lambda} \sum_{k=1}^\infty k \frac{\lambda^k}{k!} \\
&= \lambda e^{-\lambda} \sum_{k=1}^\infty \frac{\lambda^{k-1}}{(k-1)!} \\
&= \lambda e^{-\lambda} e^\lambda = \lambda.
\end{align*}$$

To get the variance, we first find $E(X^2)$. By LOTUS,

$$E(X^2) = \sum_{k=0}^\infty k^2 P(X=k) = e^{-\lambda} \sum_{k=0}^\infty k^2 \frac{\lambda^k}{k!}.$$

Using the same method we used to get the variance of a Geometric r.v., we can obtain

$$E(X^2) = \lambda (1 + \lambda),$$

so

$$\textrm{Var}(X) = E(X^2) - (EX)^2 = \lambda (1 + \lambda) - \lambda^2 = \lambda.$$
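As a sanity check, we can simulate Poisson draws and verify that the sample mean and sample variance are both close to $\lambda$. A quick sketch using NumPy (the seed, $\lambda = 5$, and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
lam = 5.0
samples = rng.poisson(lam, size=10**6)

# Both should be close to lam = 5, matching E(X) = Var(X) = lam.
print(samples.mean(), samples.var())
```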

The figure below shows the PMF and CDF of the $\textrm{Pois}(2)$ and $\textrm{Pois}(5)$ distributions from $k=0$ to $k=10$. It appears that the mean of the $\textrm{Pois}(2)$ is around 2 and the mean of the $\textrm{Pois}(5)$ is around 5, consistent with our findings above. The PMF of the $\textrm{Pois}(2)$ is highly skewed, but as $\lambda$ grows larger, the skewness is reduced and the PMF becomes more bell-shaped.

*Figure: PMF and CDF of the Pois(2) and Pois(5) distributions, for k = 0 to 10.*
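A figure like this can be reproduced with a short script. The sketch below uses SciPy and Matplotlib; the panel layout and styling are our own choices, not the original figure's:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

k = np.arange(0, 11)  # k = 0 to 10, as in the figure
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for col, lam in enumerate([2, 5]):
    # Top row: PMFs. Bottom row: CDFs (step functions, jumping at each k).
    axes[0, col].stem(k, poisson.pmf(k, lam))
    axes[0, col].set_title(f"Pois({lam}) PMF")
    axes[1, col].step(k, poisson.cdf(k, lam), where="post")
    axes[1, col].set_title(f"Pois({lam}) CDF")
fig.tight_layout()
plt.show()
```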

The Poisson distribution is often used in situations where we are counting the number of successes in a particular region or interval of time, and there are a large number of trials, each with a small probability of success. For example, the following random variables could follow a distribution that is approximately Poisson.

  • The number of emails you receive in an hour. There are a lot of people who could potentially email you in that hour, but it is unlikely that any specific person will actually email you in that hour. Alternatively, imagine subdividing the hour into milliseconds. There are $3.6 \times 10^6$ milliseconds in an hour, but in any specific millisecond it is unlikely that you will get an email.
  • The number of chips in a chocolate chip cookie. Imagine subdividing the cookie into small cubes; the probability of getting a chocolate chip in a single cube is small, but the number of cubes is large.
  • The number of earthquakes in a year in some region of the world. At any given time and location, the probability of an earthquake is small, but there are a large number of possible times and locations for earthquakes to occur over the course of the year.

The parameter $\lambda$ is interpreted as the rate of occurrence of these rare events; in the examples above, $\lambda$ could be 20 (emails per hour), 10 (chips per cookie), and 2 (earthquakes per year). In applications similar to the ones above, we can approximate the distribution of the number of events that occur by a Poisson distribution.

Poisson Approximation

Let $A_1, A_2, \dots, A_n$ be events with $p_j = P(A_j)$, where $n$ is large, the $p_j$ are small, and the $A_j$ are independent or weakly dependent. Let $X = \sum_{j=1}^n I(A_j)$ count how many of the $A_j$ occur. Then $X$ is approximately $\textrm{Pois}(\lambda)$, with $\lambda = \sum_{j=1}^n p_j$.

The Poisson paradigm is also called the law of rare events. The interpretation of "rare" is that the $p_j$ are small, not that $\lambda$ is small. In the email example, for instance, the low probability of getting an email from a specific person in a particular hour is offset by the large number of people who could send you an email in that hour.

In the examples we gave above, the number of events that occur isn't exactly Poisson, because a Poisson random variable has no upper bound, whereas how many of $A_1, \dots, A_n$ occur is at most $n$, and there is a limit to how many chocolate chips can be crammed into a cookie. But the Poisson distribution often gives good approximations. Note that the conditions for the Poisson paradigm to hold are fairly flexible: the $n$ trials can have different success probabilities, and the trials don't have to be independent, though they should not be very dependent. So there are a wide variety of situations that can be cast in terms of the Poisson paradigm. This makes the Poisson a popular model, or at least a starting point, for data whose values are nonnegative integers (called count data in statistics).
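To see the paradigm in action, the sketch below simulates a sum of many independent indicators with small, unequal success probabilities (the specific $n$, number of trials, and probability range are arbitrary illustrative values) and compares it with the matching Poisson distribution:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n, trials = 1000, 10_000
# Small, unequal success probabilities p_1, ..., p_n.
p = rng.uniform(0.001, 0.01, size=n)
lam = p.sum()

# X counts how many of the n independent events occur, in each trial.
X = (rng.random((trials, n)) < p).sum(axis=1)

# Empirical P(X = 0) vs. the Poisson approximation e^{-lam}.
print((X == 0).mean(), np.exp(-lam))
# Empirical mean vs. lam.
print(X.mean(), lam)
```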

The Poisson approximation greatly simplifies obtaining a good approximate solution to the birthday problem, and makes it possible to obtain good approximations to variants of the problem that would be very difficult to solve exactly.

Example: Birthday Problem Continued

If we have $m$ people and make the usual assumptions about birthdays, then each pair of people has probability $p = 1/365$ of having the same birthday, and there are ${m \choose 2}$ pairs. By the Poisson paradigm, the distribution of the number $X$ of birthday matches is approximately $\textrm{Pois}(\lambda)$, where $\lambda = {m \choose 2}\frac{1}{365}$. Then the probability of at least one match is

$$P(X \geq 1) = 1 - P(X=0) \approx 1 - e^{-\lambda}.$$

For $m=23$, $\lambda = 253/365$ and $1 - e^{-\lambda} \approx 0.500002$, which agrees with the earlier result that we need 23 people to have a 50-50 chance of a matching birthday. Note that even though $m=23$ is fairly small, the relevant quantity in this problem is actually ${m \choose 2}$, which is the total number of "trials" for a successful birthday match, so the Poisson approximation still performs well.
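We can check this numerically. A short sketch comparing the Poisson approximation with the exact probability from the standard product formula:

```python
import math

m = 23
lam = math.comb(m, 2) / 365     # 253/365
approx = 1 - math.exp(-lam)     # Poisson approximation

# Exact P(at least one match): complement of all m birthdays being distinct.
no_match = 1.0
for j in range(m):
    no_match *= (365 - j) / 365
exact = 1 - no_match

print(approx, exact)  # ~0.500002 vs ~0.507297
```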

Example: Near-Birthday Problem

What if we want to find the number of people required in order to have a 50-50 chance that two people have birthdays within one day of each other (i.e., on the same day or one day apart)? Unlike the original birthday problem, this is hard to solve exactly, but the Poisson paradigm still applies.

The probability that any two people have birthdays within one day of each other is $3/365$ (choose a birthday for the first person; the second person then needs to be born on that day, the day before, or the day after). Again there are ${m \choose 2}$ possible pairs, so the number of within-one-day matches is approximately $\textrm{Pois}(\lambda)$, where $\lambda = {m \choose 2} \frac{3}{365}$. Then a calculation similar to the one above tells us that we need $m=14$ or more. This was a quick approximation, but it turns out that $m=14$ is the exact answer!
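A quick sketch of that calculation, searching for the smallest $m$ whose approximate match probability reaches $1/2$ (the helper name `approx_match_prob` is our own):

```python
import math

def approx_match_prob(m: int) -> float:
    """Poisson approximation to P(at least one within-one-day match)."""
    lam = math.comb(m, 2) * 3 / 365
    return 1 - math.exp(-lam)

m = 2
while approx_match_prob(m) < 0.5:
    m += 1
print(m, approx_match_prob(m))  # 14, ~0.527
```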

Another useful property of the Poisson distribution is that the sum of independent Poissons is Poisson.

Theorem: Sum of Independent Poissons

If $X \sim \textrm{Pois}(\lambda_1)$, $Y \sim \textrm{Pois}(\lambda_2)$, and $X$ is independent of $Y$, then $X + Y \sim \textrm{Pois}(\lambda_1 + \lambda_2)$.

Proof:

By the law of total probability and independence, for $k = 0, 1, 2, \dots$,

$$\begin{align*}
P(X+Y=k) &= \sum_{j=0}^k P(X=j) P(Y=k-j) \\
&= \sum_{j=0}^k \frac{e^{-\lambda_1} \lambda_1^j}{j!} \cdot \frac{e^{-\lambda_2} \lambda_2^{k-j}}{(k-j)!} \\
&= \frac{e^{-(\lambda_1+\lambda_2)}}{k!} \sum_{j=0}^k {k \choose j} \lambda_1^j \lambda_2^{k-j} \\
&= \frac{e^{-(\lambda_1+\lambda_2)} (\lambda_1+\lambda_2)^k}{k!},
\end{align*}$$

where the last step uses the binomial theorem. This is the $\textrm{Pois}(\lambda_1 + \lambda_2)$ PMF.

The story of the Poisson distribution provides intuition for this result. If there are two different types of events occurring at rates $\lambda_1$ and $\lambda_2$, independently, then the overall event rate is $\lambda_1 + \lambda_2$.
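A simulation makes this concrete. The sketch below compares the empirical PMF of $X + Y$ with the $\textrm{Pois}(\lambda_1 + \lambda_2)$ PMF at a few values of $k$ (the rates, seed, and sample size are arbitrary choices):

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(seed=0)
lam1, lam2 = 2.0, 3.0
X = rng.poisson(lam1, size=10**6)
Y = rng.poisson(lam2, size=10**6)
S = X + Y

# Empirical PMF of X + Y vs. the Pois(lam1 + lam2) PMF.
for k in range(6):
    print(k, (S == k).mean(), poisson.pmf(k, lam1 + lam2))
```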