Bernoulli and Binomial
Some distributions are so ubiquitous in probability and statistics that they have their own names. We will introduce these named distributions throughout the course, starting with a very simple but useful case: an r.v. that can take on only two possible values, 0 and 1.
Definition: Bernoulli Distribution
An r.v. $X$ is said to have the Bernoulli distribution with parameter $p$ if $P(X = 1) = p$ and $P(X = 0) = 1 - p$, where $0 < p < 1$. We write this as $X \sim \mathrm{Bern}(p)$. The symbol $\sim$ is read ''is distributed as''.
Any r.v. whose possible values are 0 and 1 has a $\mathrm{Bern}(p)$ distribution, with $p$ the probability of the r.v. equaling 1. This number $p$ in $\mathrm{Bern}(p)$ is called the parameter of the distribution; it determines which specific Bernoulli distribution we have. Thus there is not just one Bernoulli distribution, but rather a family of Bernoulli distributions, indexed by $p$. For example, if $X \sim \mathrm{Bern}(1/3)$, it would be correct but incomplete to say ''$X$ is Bernoulli''; to fully specify the distribution of $X$, we should both say its name (Bernoulli) and its parameter value ($1/3$), which is the point of the notation $X \sim \mathrm{Bern}(1/3)$.
Any event has a Bernoulli r.v. that is naturally associated with it, equal to 1 if the event happens and 0 otherwise. This is called the indicator random variable of the event; we will see that such r.v.s are extremely useful.
Definition: Indicator Random Variable
The indicator random variable of an event $A$ is the r.v. $I_A$ which equals 1 if $A$ occurs and 0 otherwise. We will denote the indicator r.v. of $A$ by $I_A$ or $I(A)$. Note that $I_A \sim \mathrm{Bern}(p)$ with $p = P(A)$.
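As a quick illustration (a hypothetical Python sketch, not part of the text), we can simulate the indicator r.v. of an event and check that its empirical distribution matches $\mathrm{Bern}(P(A))$. Here the event $A$, that a fair die lands on 5 or 6, is a made-up example with $P(A) = 1/3$:

```python
import random

random.seed(0)

# Hypothetical event A: a fair die roll lands on 5 or 6, so P(A) = 1/3.
def indicator_of_A():
    roll = random.randint(1, 6)
    return 1 if roll >= 5 else 0  # I_A equals 1 iff A occurs

trials = 100_000
samples = [indicator_of_A() for _ in range(trials)]
p_hat = sum(samples) / trials  # empirical P(I_A = 1), should be near 1/3

print(round(p_hat, 3))
```

The indicator takes only the values 0 and 1, and the fraction of 1s converges to $P(A)$, exactly as the definition says.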
We often imagine Bernoulli r.v.s using coin tosses, but this is just convenient language for discussing the following general story.
Story: Bernoulli Trial
An experiment that can result in either a ''success" or a ''failure" (but not both) is called a Bernoulli trial. A Bernoulli random variable can be thought of as the indicator of success in a Bernoulli trial: it equals 1 if success occurs and 0 if failure occurs in the trial.
Because of this story, the parameter $p$ is often called the success probability of the $\mathrm{Bern}(p)$ distribution. Once we start thinking about Bernoulli trials, it's hard not to start thinking about what happens when we have more than one Bernoulli trial.
Story: Binomial Distribution
Suppose that $n$ independent Bernoulli trials are performed, each with the same success probability $p$. Let $X$ be the number of successes. The distribution of $X$ is called the Binomial distribution with parameters $n$ and $p$. We write $X \sim \mathrm{Bin}(n, p)$ to mean that $X$ has the Binomial distribution with parameters $n$ and $p$, where $n$ is a positive integer and $0 < p < 1$.
Notice that we define the Binomial distribution not by its PMF, but by a story about the type of experiment that could give rise to a random variable with a Binomial distribution. The most famous distributions in statistics all have stories which explain why they are so often used as models for data, or as the building blocks for more complicated distributions.
Thinking about the named distributions first and foremost in terms of their stories has many benefits. It facilitates pattern recognition, allowing us to see when two problems are essentially identical in structure; it often leads to cleaner solutions that avoid PMF calculations altogether; and it helps us understand how the named distributions are connected to one another. Here it is clear that $\mathrm{Bern}(p)$ is the same distribution as $\mathrm{Bin}(1, p)$: the Bernoulli is a special case of the Binomial.
Using the story definition of the Binomial, let's find its PMF.
Theorem: Binomial PMF
If $X \sim \mathrm{Bin}(n, p)$, then the PMF of $X$ is
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$
for $k = 0, 1, \dots, n$ (and $P(X = k) = 0$ otherwise).
Proof: An experiment consisting of $n$ independent Bernoulli trials produces a sequence of successes and failures. The probability of any specific sequence of $k$ successes and $n - k$ failures is $p^k (1-p)^{n-k}$. There are $\binom{n}{k}$ such sequences, since we just need to select where the successes are. Therefore, letting $X$ be the number of successes,
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$
for $k = 0, 1, \dots, n$, and $P(X = k) = 0$ otherwise. This is a valid PMF because it is nonnegative and it sums to 1 by the binomial theorem.

The following figure shows plots of the Binomial PMF for various values of $n$ and $p$. Note that the PMF of the $\mathrm{Bin}(10, 1/2)$ distribution is symmetric about 5, but when the success probability is not 1/2, the PMF is skewed. For a fixed number of trials $n$, $X$ tends to be larger when the success probability is high and smaller when the success probability is low, as we would expect from the story of the Binomial distribution. Also recall that in any PMF plot, the sum of the heights of the vertical bars must be 1.
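A short Python sketch (not from the text) translates the PMF formula directly into code and checks that the probabilities sum to 1, as the binomial theorem guarantees. The parameter values $n = 10$, $p = 0.3$ are arbitrary choices for illustration:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p), straight from the formula."""
    if not 0 <= k <= n:
        return 0.0  # PMF is 0 outside k = 0, 1, ..., n
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Valid PMF: nonnegative values that sum to 1 (binomial theorem).
n, p = 10, 0.3
total = sum(binom_pmf(k, n, p) for k in range(n + 1))
print(total)
```

Summing over all $k$ reproduces $(p + (1-p))^n = 1$ up to floating-point error.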
We've used the story of the Binomial distribution to find the PMF. The story also gives us a straightforward proof of the fact that if $X$ is Binomial, then $n - X$ is also Binomial.
Theorem
Let $X \sim \mathrm{Bin}(n, p)$, and $q = 1 - p$ (we often use $q$ to denote the failure probability of a Bernoulli trial). Then $n - X \sim \mathrm{Bin}(n, q)$.
Proof: Using the story of the Binomial, interpret $X$ as the number of successes in $n$ independent Bernoulli trials. Then $n - X$ is the number of failures in those trials. Interchanging the roles of success and failure, we have $n - X \sim \mathrm{Bin}(n, q)$. Alternatively, we can check that $n - X$ has the $\mathrm{Bin}(n, q)$ PMF. Let $Y = n - X$. The PMF of $Y$ is
$$P(Y = k) = P(X = n - k) = \binom{n}{n-k} p^{n-k} q^k = \binom{n}{k} q^k p^{n-k}$$
for $k = 0, 1, \dots, n$.
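The identity can also be confirmed numerically. The following sketch (illustrative, with arbitrary parameters $n = 9$, $p = 0.2$) compares $P(n - X = k) = P(X = n - k)$ against the $\mathrm{Bin}(n, q)$ PMF at every $k$:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 9, 0.2
q = 1 - p
# P(n - X = k) = P(X = n - k); this should match the Bin(n, q) PMF at k.
max_gap = max(abs(binom_pmf(n - k, n, p) - binom_pmf(k, n, q))
              for k in range(n + 1))
print(max_gap)  # essentially 0: n - X ~ Bin(n, q)
```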
Corollary
Let $X \sim \mathrm{Bin}(n, 1/2)$ with $n$ even. Then the distribution of $X$ is symmetric about $n/2$, in the sense that $P(X = n/2 + j) = P(X = n/2 - j)$ for all nonnegative integers $j$.
Proof: By the preceding Theorem, $n - X$ is also $\mathrm{Bin}(n, 1/2)$, so
$$P(X = i) = P(n - X = i) = P(X = n - i)$$
for all nonnegative integers $i$. Letting $i = n/2 + j$, the desired result follows. This explains why the $\mathrm{Bin}(10, 1/2)$ PMF is symmetric about 5 in the above figure.
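For concreteness, a tiny check of the corollary in Python (the case $n = 10$, $p = 1/2$ from the figure; not part of the text) verifies the symmetry $P(X = 5 + j) = P(X = 5 - j)$ term by term:

```python
from math import comb

n = 10  # even number of trials, with p = 1/2
pmf = [comb(n, k) / 2**n for k in range(n + 1)]  # Bin(10, 1/2) PMF

# Symmetry about n/2 = 5: P(X = 5 + j) = P(X = 5 - j) for j = 0, ..., 5.
symmetric = all(pmf[n // 2 + j] == pmf[n // 2 - j]
                for j in range(n // 2 + 1))
print(symmetric)  # prints True
```

The equalities hold exactly here because $\binom{n}{n/2+j} = \binom{n}{n/2-j}$ and the denominator is a power of 2.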
Hypergeometric
If we have an urn filled with $w$ white and $b$ black balls, then drawing $n$ balls out of the urn with replacement yields a $\mathrm{Bin}(n, w/(w+b))$ distribution for the number of white balls obtained in $n$ trials, since the draws are independent Bernoulli trials, each with probability $w/(w+b)$ of success. If we instead sample without replacement, as illustrated in the following figure, then the number of white balls follows a Hypergeometric distribution.
Story Hypergeometric Distribution
Consider an urn with $w$ white balls and $b$ black balls. We draw $n$ balls out of the urn at random without replacement, such that all $\binom{w+b}{n}$ samples are equally likely. Let $X$ be the number of white balls in the sample. Then $X$ is said to have the Hypergeometric distribution with parameters $w$, $b$, and $n$; we denote this by $X \sim \mathrm{HGeom}(w, b, n)$. As with the Binomial distribution, we can obtain the PMF of the Hypergeometric distribution from the story.
Theorem: Hypergeometric PMF
If $X \sim \mathrm{HGeom}(w, b, n)$, then the PMF of $X$ is
$$P(X = k) = \frac{\binom{w}{k}\binom{b}{n-k}}{\binom{w+b}{n}}$$
for integers $k$ satisfying $0 \le k \le w$ and $0 \le n - k \le b$, and $P(X = k) = 0$ otherwise.
Proof: To get $P(X = k)$, we count the favorable samples. There are $\binom{w}{k}$ ways to choose which $k$ white balls appear in the sample and $\binom{b}{n-k}$ ways to choose which $n - k$ black balls appear, so by the multiplication rule there are $\binom{w}{k}\binom{b}{n-k}$ samples with exactly $k$ white balls, out of $\binom{w+b}{n}$ equally likely samples in total.
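The Hypergeometric PMF is also easy to sanity-check in code. This sketch (hypothetical parameters: $w = 6$ white, $b = 4$ black, $n = 5$ draws) implements the formula, including the support restrictions, and confirms that the probabilities sum to 1, which amounts to Vandermonde's identity:

```python
from math import comb

def hgeom_pmf(k, w, b, n):
    """P(X = k) for X ~ HGeom(w, b, n)."""
    if k < 0 or k > w or n - k < 0 or n - k > b:
        return 0.0  # outside the support
    return comb(w, k) * comb(b, n - k) / comb(w + b, n)

w, b, n = 6, 4, 5
total = sum(hgeom_pmf(k, w, b, n) for k in range(n + 1))
print(total)  # sums to 1, by Vandermonde's identity
```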
Example: Elk Capture-Recapture
Suppose a forest contains $N$ elk, of which $m$ were previously captured and tagged. A new sample of $n$ elk is then recaptured at random, with all samples equally likely. By the story of the Hypergeometric, the number of tagged elk in the recaptured sample is distributed $\mathrm{HGeom}(m, N - m, n)$, with tagged elk playing the role of white balls and untagged elk the role of black balls.
Warning: Binomial vs. Hypergeometric
The Binomial and Hypergeometric distributions are often confused. Both are discrete distributions taking on integer values between 0 and $n$ for some $n$, and both can be interpreted as the number of successes in $n$ Bernoulli trials (for the Hypergeometric, each tagged elk in the recaptured sample can be considered a success and each untagged elk a failure). However, a crucial part of the Binomial story is that the Bernoulli trials involved are independent. The Bernoulli trials in the Hypergeometric story are dependent, since the sampling is done without replacement: knowing that one elk in our sample is tagged decreases the probability that the second elk will also be tagged.
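The dependence can be made concrete with a two-line calculation (illustrative numbers only: 10 tagged and 15 untagged elk, chosen for this sketch). Sampling without replacement removes one tagged elk before the second draw, so the conditional probability drops:

```python
# Hypothetical capture-recapture numbers, for illustration only.
w, b = 10, 15               # 10 tagged ("success") and 15 untagged elk
p_first = w / (w + b)       # P(first elk drawn is tagged) = 0.4
# Without replacement, one tagged elk is gone before the second draw:
p_second_given_first = (w - 1) / (w + b - 1)
print(p_first, p_second_given_first)
```

With replacement the two probabilities would be equal (independent Bernoulli trials, hence Binomial); here the conditional probability is strictly smaller, which is exactly the dependence that distinguishes the Hypergeometric story.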