Discrete Random Variables 2


Bernoulli and Binomial

Some distributions are so ubiquitous in probability and statistics that they have their own names. We will introduce these named distributions throughout the course, starting with a very simple but useful case: an r.v. that can take on only two possible values, 0 and 1.

Definition: Bernoulli Distribution

An r.v. X is said to have the Bernoulli distribution with parameter p if P(X=1) = p and P(X=0) = 1 - p, where 0 < p < 1. We write this as X \sim \textrm{Bern}(p). The symbol \sim is read "is distributed as".

Any r.v. whose possible values are 0 and 1 has a \textrm{Bern}(p) distribution, with p the probability of the r.v. equaling 1. This number p in \textrm{Bern}(p) is called the parameter of the distribution; it determines which specific Bernoulli distribution we have. Thus there is not just one Bernoulli distribution, but rather a family of Bernoulli distributions, indexed by p. For example, if X \sim \textrm{Bern}(1/3), it would be correct but incomplete to say "X is Bernoulli"; to fully specify the distribution of X, we should both say its name (Bernoulli) and its parameter value (1/3), which is the point of the notation X \sim \textrm{Bern}(1/3).

Any event has a Bernoulli r.v. that is naturally associated with it, equal to 1 if the event happens and 0 otherwise. This is called the indicator random variable of the event; we will see that such r.v.s are extremely useful.

Definition: Indicator Random Variable

The indicator random variable of an event A is the r.v. which equals 1 if A occurs and 0 otherwise. We will denote the indicator r.v. of A by I_A or I(A). Note that I_A \sim \textrm{Bern}(p) with p = P(A).
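To make the indicator idea concrete, here is a minimal Python sketch; the die-roll event is an invented example, not from the text. It simulates the indicator of the event A that a fair die shows at least 5, and checks that the sample mean of the indicators approaches P(A) = 1/3, consistent with I_A \sim \textrm{Bern}(P(A)).

```python
import random

random.seed(0)

# Event A: a fair die roll is at least 5, so P(A) = 2/6 = 1/3.
# The indicator I_A equals 1 when A occurs and 0 otherwise.
trials = 100_000
indicators = [1 if random.randint(1, 6) >= 5 else 0 for _ in range(trials)]

# The sample mean of the indicators estimates P(I_A = 1) = P(A).
print(sum(indicators) / trials)  # approximately 0.333
```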

We often imagine Bernoulli r.v.s using coin tosses, but this is just convenient language for discussing the following general story.

Story: Bernoulli Trial

An experiment that can result in either a "success" or a "failure" (but not both) is called a Bernoulli trial. A Bernoulli random variable can be thought of as the indicator of success in a Bernoulli trial: it equals 1 if success occurs and 0 if failure occurs in the trial.

Because of this story, the parameter p is often called the success probability of the \textrm{Bern}(p) distribution. Once we start thinking about Bernoulli trials, it's hard not to start thinking about what happens when we have more than one Bernoulli trial.

Story: Binomial Distribution

Suppose that n independent Bernoulli trials are performed, each with the same success probability p. Let X be the number of successes. The distribution of X is called the Binomial distribution with parameters n and p. We write X \sim \textrm{Bin}(n,p) to mean that X has the Binomial distribution with parameters n and p, where n is a positive integer and 0 < p < 1.

Notice that we define the Binomial distribution not by its PMF, but by a story about the type of experiment that could give rise to a random variable with a Binomial distribution. The most famous distributions in statistics all have stories which explain why they are so often used as models for data, or as the building blocks for more complicated distributions.

Thinking about the named distributions first and foremost in terms of their stories has many benefits. It facilitates pattern recognition, allowing us to see when two problems are essentially identical in structure; it often leads to cleaner solutions that avoid PMF calculations altogether; and it helps us understand how the named distributions are connected to one another. Here it is clear that \textrm{Bern}(p) is the same distribution as \textrm{Bin}(1, p): the Bernoulli is a special case of the Binomial.

Using the story definition of the Binomial, let's find its PMF.

Theorem: Binomial PMF

If X \sim \textrm{Bin}(n,p), then the PMF of X is

P(X=k) = {n \choose k} p^k (1-p)^{n-k}

for k = 0, 1, \dots, n (and P(X=k) = 0 otherwise).

Proof: An experiment consisting of n independent Bernoulli trials produces a sequence of successes and failures. The probability of any specific sequence of k successes and n-k failures is p^k (1-p)^{n-k}. There are {n \choose k} such sequences, since we just need to select where the successes are. Therefore, letting X be the number of successes,

P(X=k) = {n \choose k} p^k (1-p)^{n-k}

for k = 0, 1, \dots, n, and P(X=k) = 0 otherwise. This is a valid PMF because it is nonnegative and it sums to 1 by the binomial theorem.

The following figure shows plots of the Binomial PMF for various values of n and p. Note that the PMF of the \textrm{Bin}(10,1/2) distribution is symmetric about 5, but when the success probability is not 1/2, the PMF is skewed. For a fixed number of trials n, X tends to be larger when the success probability is high and lower when the success probability is low, as we would expect from the story of the Binomial distribution. Also recall that in any PMF plot, the sum of the heights of the vertical bars must be 1.

[Figure: Binomial PMF plots for various values of n and p.]
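As a quick check of both the story and the formula, here is a small Python sketch (the choices n = 10 and p = 0.3 are arbitrary, not from the text): it computes the Binomial PMF directly from the formula, verifies that it sums to 1, and compares it against simulated frequencies from n independent Bernoulli trials.

```python
import random
from math import comb

random.seed(0)
n, p = 10, 0.3

# PMF from the formula: P(X = k) = C(n, k) p^k (1-p)^(n-k).
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
print(sum(pmf))  # 1.0 up to floating point, by the binomial theorem

# Story: X is the number of successes in n independent Bernoulli(p) trials.
trials = 100_000
counts = [0] * (n + 1)
for _ in range(trials):
    x = sum(random.random() < p for _ in range(n))  # one Bin(n, p) draw
    counts[x] += 1

for k in range(n + 1):
    print(k, round(pmf[k], 4), counts[k] / trials)  # formula vs. simulation
```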

We've used the story of the Binomial distribution to find the \textrm{Bin}(n,p) PMF. The story also gives us a straightforward proof of the fact that if X is Binomial, then n-X is also Binomial.

Theorem

Let X \sim \textrm{Bin}(n,p), and q = 1 - p (we often use q to denote the failure probability of a Bernoulli trial). Then n-X \sim \textrm{Bin}(n,q).

Proof: Using the story of the Binomial, interpret X as the number of successes in n independent Bernoulli trials. Then n-X is the number of failures in those trials. Interchanging the roles of success and failure, we have n-X \sim \textrm{Bin}(n,q). Alternatively, we can check that n-X has the \textrm{Bin}(n,q) PMF. Let Y = n-X. The PMF of Y is

P(Y=k) = P(X=n-k) = {n \choose n-k} p^{n-k} q^k = {n \choose k} q^k p^{n-k},

for k = 0, 1, \dots, n, which is indeed the \textrm{Bin}(n,q) PMF.
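The identity can also be verified numerically. The sketch below (again with arbitrarily chosen n and p) checks that P(X = n - k) matches the \textrm{Bin}(n,q) PMF at every k:

```python
from math import comb

def binom_pmf(n, p, k):
    # P(X = k) for X ~ Bin(n, p)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
q = 1 - p

# P(n - X = k) = P(X = n - k) should equal the Bin(n, q) PMF at k.
for k in range(n + 1):
    assert abs(binom_pmf(n, p, n - k) - binom_pmf(n, q, k)) < 1e-12
print("n - X has the Bin(n, q) PMF")
```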

Corollary

Let X \sim \textrm{Bin}(n,p) with p = 1/2 and n even. Then the distribution of X is symmetric about n/2, in the sense that P(X = n/2 + j) = P(X = n/2 - j) for all nonnegative integers j.

Proof: By the theorem above, n-X is also \textrm{Bin}(n,1/2), so

P(X=k) = P(n-X=k) = P(X=n-k)

for all nonnegative integers k. Letting k = n/2 + j, the desired result follows. This explains why the \textrm{Bin}(10,1/2) PMF is symmetric about 5 in the figure above.
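A short numerical check of the symmetry claim for \textrm{Bin}(10, 1/2):

```python
from math import comb

n = 10  # even, with p = 1/2
pmf = [comb(n, k) / 2**n for k in range(n + 1)]

# Check P(X = n/2 + j) = P(X = n/2 - j) for j = 0, 1, ..., n/2.
for j in range(n // 2 + 1):
    assert pmf[n // 2 + j] == pmf[n // 2 - j]
print("Bin(10, 1/2) is symmetric about 5")
```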

Hypergeometric

If we have an urn filled with w white and b black balls, then drawing n balls out of the urn with replacement yields a \textrm{Bin}(n, w/(w+b)) distribution for the number of white balls obtained in n trials, since the draws are independent Bernoulli trials, each with probability w/(w+b) of success. If we instead sample without replacement, as illustrated in the following figure, then the number of white balls follows a Hypergeometric distribution.

[Figure: drawing n balls from an urn of w white and b black balls without replacement.]

Story: Hypergeometric Distribution

Consider an urn with w white balls and b black balls. We draw n balls out of the urn at random without replacement, such that all {w+b \choose n} samples are equally likely. Let X be the number of white balls in the sample. Then X is said to have the Hypergeometric distribution with parameters w, b, and n; we denote this by X \sim \textrm{HGeom}(w,b,n).

As with the Binomial distribution, we can obtain the PMF of the Hypergeometric distribution from the story.

Theorem: Hypergeometric PMF

If X \sim \textrm{HGeom}(w,b,n), then the PMF of X is

P(X=k)=(wk)(bnk)(w+bn),P(X=k) = \frac{{w \choose k}{b \choose n-k}}{{w+b \choose n}},

for integers k satisfying 0 \leq k \leq w and 0 \leq n-k \leq b, and P(X=k) = 0 otherwise.

Proof: By the story, all {w+b \choose n} samples of size n are equally likely. To obtain a sample with exactly k white balls, we choose which k of the w white balls are in the sample and which n-k of the b black balls are in the sample; by the multiplication rule this can be done in {w \choose k}{b \choose n-k} ways. The naive definition of probability then gives the PMF above. This is a valid PMF: it is nonnegative, and it sums to 1 by Vandermonde's identity, \sum_k {w \choose k}{b \choose n-k} = {w+b \choose n}.
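To see the story and the PMF agree numerically, here is a small sketch (the urn of 6 white and 4 black balls and the sample size of 5 are invented for illustration): it compares the formula against frequencies from simulated draws without replacement, using random.sample so that all samples of size n are equally likely.

```python
import random
from math import comb

random.seed(0)
w, b, n = 6, 4, 5  # invented urn: 6 white, 4 black, draw 5

def hgeom_pmf(w, b, n, k):
    # P(X = k) = C(w, k) C(b, n-k) / C(w+b, n); math.comb returns 0
    # when the lower index exceeds the upper, matching the PMF's support.
    return comb(w, k) * comb(b, n - k) / comb(w + b, n)

# Simulate drawing n balls without replacement; 1 = white, 0 = black.
urn = [1] * w + [0] * b
trials = 100_000
counts = [0] * (n + 1)
for _ in range(trials):
    counts[sum(random.sample(urn, n))] += 1

for k in range(n + 1):
    print(k, round(hgeom_pmf(w, b, n, k), 4), counts[k] / trials)
```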

Example: Elk Capture-Recapture

Suppose a forest contains w elk that were captured, tagged, and released in an earlier expedition, along with b untagged elk. On a later expedition, n elk are recaptured at random, with all samples of size n equally likely. By the story of the Hypergeometric, the number of tagged elk in the recaptured sample is distributed as \textrm{HGeom}(w, b, n): the tagged elk play the role of the white balls and the untagged elk the role of the black balls.
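As a numerical illustration (the figures below are hypothetical, not taken from the original example), one such probability could be computed as follows:

```python
from math import comb

# Hypothetical numbers: w = 100 tagged elk, b = 400 untagged elk,
# and a recaptured sample of n = 50.
w, b, n = 100, 400, 50

k = 10  # probability that exactly 10 elk in the sample are tagged
p_k = comb(w, k) * comb(b, n - k) / comb(w + b, n)
print(f"P(X = {k}) = {p_k:.4f}")
```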

Warning: Binomial vs. Hypergeometric

The Binomial and Hypergeometric distributions are often confused. Both are discrete distributions taking on integer values between 0 and n for some n, and both can be interpreted as the number of successes in n Bernoulli trials (for the Hypergeometric, each tagged elk in the recaptured sample can be considered a success and each untagged elk a failure). However, a crucial part of the Binomial story is that the Bernoulli trials involved are independent. The Bernoulli trials in the Hypergeometric story are dependent, since the sampling is done without replacement: knowing that one elk in our sample is tagged decreases the probability that another elk in the sample is also tagged.
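The dependence is easy to quantify in a small example. The sketch below (with an invented urn standing in for the elk population) computes the probability that the second draw is a success, with and without conditioning on the first draw being a success:

```python
# Invented urn: w tagged ("white") and b untagged balls, drawn without
# replacement, as in the elk story.
w, b = 5, 15

# Unconditionally, P(2nd draw tagged) = P(1st draw tagged) = w/(w+b),
# by symmetry of the draws.
p_second = w / (w + b)

# Conditioning on the 1st draw being tagged removes one tagged ball:
p_second_given_first = (w - 1) / (w + b - 1)

print(p_second)             # 0.25
print(p_second_given_first) # ~0.2105: the trials are dependent

# With replacement, the trials are independent and the conditional
# probability would be unchanged: still w/(w+b) = 0.25.
```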