Probability, Counting, and Story Proofs 1


Why Study Probability?

Mathematics is the logic of certainty; probability is the logic of uncertainty. Probability is extremely useful in a wide variety of fields, since it provides tools for understanding and explaining variation, separating signal from noise, and modeling complex phenomena. For example, probability is needed in:

  1. Statistics: Probability is the foundation and language for statistics, enabling many powerful methods for using data to learn about the world.
  2. Physics: Einstein famously said "God does not play dice with the universe", but current understanding of quantum physics heavily involves probability at the most fundamental level of nature.
  3. Biology: Genetics is deeply intertwined with probability, both in the inheritance of genes and in modeling random mutations.
  4. Computer science: Probability also plays an essential role in studying randomized algorithms, machine learning, and artificial intelligence.
  5. Finance: Probability is central in quantitative finance. Modeling stock prices over time and determining "fair" prices for financial instruments are based heavily on probability.
  6. Political science: In recent years, political science has become more and more quantitative and statistical, e.g., in predicting and understanding election results.
  7. Medicine: The development of randomized clinical trials, in which patients are randomly assigned to receive treatment or placebo, has transformed medical research in recent years.
  8. Life: Life is uncertain, and probability is the logic of uncertainty. While it isn't practical to carry out a formal probability calculation for every decision made in life, thinking hard about probability can help us avert some common fallacies, shed light on coincidences, and make better predictions.

Sample Spaces and Pebble World

The mathematical framework for probability is built around sets. Imagine that an experiment is performed, resulting in one out of a set of possible outcomes. Before the experiment is performed, it is unknown which outcome will be the result; after, the result "crystallizes" into the actual outcome.

Definition: Sample Space and Event

The sample space $S$ of an experiment is the set of all possible outcomes of the experiment. An event $A$ is a subset of the sample space $S$, and we say that $A$ occurred if the actual outcome is in $A$.

[Figure: a sample space as Pebble World, with two events spotlighted.]

The sample space of an experiment can be finite or infinite. When the sample space is finite, we can visualize it as Pebble World. Each pebble represents an outcome, and an event is a set of pebbles. Performing the experiment amounts to randomly selecting one pebble.

Set theory is very useful in probability, since it provides a rich language for expressing and working with events. Set operations, especially unions, intersections, and complements, make it easy to build new events in terms of already-defined events.

For example, let $S$ be the sample space of an experiment and let $A, B \subseteq S$ be events. Then the union $A \cup B$ is the event that occurs if and only if at least one of $A$ and $B$ occurs, the intersection $A \cap B$ is the event that occurs if and only if both $A$ and $B$ occur, and the complement $A^c$ is the event that occurs if and only if $A$ does not occur. We also have De Morgan's laws:

$$(A \cup B)^c = A^c \cap B^c \quad \textrm{and} \quad (A \cap B)^c = A^c \cup B^c,$$

since saying that it is not the case that at least one of $A$ and $B$ occurs is the same as saying that $A$ does not occur and $B$ does not occur, and saying that it is not the case that both occur is the same as saying that at least one does not occur. Analogous results hold for unions and intersections of more than two events.
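These set operations map directly onto Python's built-in `set` type, which makes De Morgan's laws easy to check numerically. A minimal sketch (the sample space and events here are illustrative choices, not from the text):

```python
# Events as Python sets; S, A, B are illustrative, not from the text.
S = set(range(10))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

def complement(E):
    """Complement of event E relative to the sample space S."""
    return S - E

# De Morgan's laws: (A ∪ B)^c = A^c ∩ B^c and (A ∩ B)^c = A^c ∪ B^c.
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)
print("De Morgan's laws hold for this example")
```

Because `|`, `&`, and `-` are ordinary set operators, the same check works for any choice of finite sample space and events.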

Example Coin Flips

A coin is flipped 10 times. Writing Heads as $H$ and Tails as $T$, a possible outcome (pebble) is $HHHTHHTTHT$, and the sample space is the set of all possible strings of length 10 of $H$'s and $T$'s. We can (and will) encode $H$ as $1$ and $T$ as $0$, so that an outcome is a sequence $(s_1, \dots, s_{10})$ with $s_j \in \{0,1\}$, and the sample space is the set of all such sequences. Now let's look at some events:

  1. Let $A_1$ be the event that the first flip is Heads. As a set,

     $$A_1 = \{(1, s_2, \dots, s_{10}) : s_j \in \{0,1\} \textrm{ for } 2 \leq j \leq 10\}.$$

     This is a subset of the sample space, so it is indeed an event; saying that $A_1$ occurs is the same thing as saying that the first flip is Heads. Similarly, let $A_j$ be the event that the $j$th flip is Heads for $j = 2, 3, \dots, 10$.

  2. Let $B$ be the event that at least one flip was Heads. As a set,

     $$B = \bigcup_{j=1}^{10} A_j.$$

  3. Let $C$ be the event that all the flips were Heads. As a set,

     $$C = \bigcap_{j=1}^{10} A_j.$$

  4. Let $D$ be the event that there were at least two consecutive Heads. As a set,

     $$D = \bigcup_{j=1}^{9} (A_j \cap A_{j+1}).$$
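With only $2^{10} = 1024$ outcomes, these events can be built concretely by enumeration; a sketch in Python, using tuples of 0's and 1's as the outcomes per the encoding above:

```python
from itertools import product

# Enumerate all 2^10 outcomes of 10 coin flips (1 = Heads, 0 = Tails).
S = list(product([0, 1], repeat=10))

# A[j] is the event that the (j+1)th flip is Heads.
A = [set(s for s in S if s[j] == 1) for j in range(10)]

B = set().union(*A)                                     # at least one Heads
C = set(S).intersection(*A)                             # all flips Heads
D = set().union(*(A[j] & A[j + 1] for j in range(9)))   # two consecutive Heads

print(len(S), len(B), len(C), len(D))
```

Unions and intersections of many events translate directly into `set.union` and `set.intersection` over the list of events.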

A Set Theory Dictionary

[Table: a dictionary translating between English descriptions of events and their set-theoretic counterparts; images not reproduced.]

Naive Definition of Probability

Historically, the earliest definition of the probability of an event was to count the number of ways the event could happen and divide by the total number of possible outcomes for the experiment. We call this the naive definition since it is restrictive and relies on strong assumptions; nevertheless, it is important to understand, and useful when not misused.

Definition: Naive Definition of Probability

Let $A$ be an event for an experiment with a finite sample space $S$. The naive probability of $A$ is

$$P_{\textrm{naive}}(A) = \frac{|A|}{|S|} = \frac{\textrm{number of outcomes favorable to } A}{\textrm{total number of outcomes in } S},$$

where $|A|$ is the size (cardinality) of the set $A$.

The naive definition is very restrictive in that it requires $S$ to be finite, with equal mass for each pebble. It has often been misapplied by people who assume equally likely outcomes without justification and make arguments to the effect of "either it will happen or it won't, and we don't know which, so it's 50-50". For example, if we don't know whether or not there is life on Saturn, should we conclude that it is 50-50? What about intelligent life on Saturn, which seems like it should be strictly less likely than there being any form of life on Saturn? But there are several important types of problems where the naive definition is applicable, such as when there is symmetry in the problem that makes the outcomes equally likely.
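When the naive definition does apply, it translates into code as a simple ratio of set sizes. A minimal sketch, using a fair die as an illustrative example (not from the text) and exact fractions to avoid rounding:

```python
from fractions import Fraction

def p_naive(A, S):
    """Naive probability |A| / |S|; assumes all outcomes in S are equally likely."""
    return Fraction(len(set(A)), len(set(S)))

# Illustrative example (not from the text): a fair six-sided die.
die = {1, 2, 3, 4, 5, 6}
print(p_naive({2, 4, 6}, die))   # 1/2
```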

How to Count

Multiplication Rule

In some problems, we can directly count the number of possibilities using a basic but versatile principle called the multiplication rule. We'll see that the multiplication rule leads naturally to counting rules for sampling with replacement and sampling without replacement, two scenarios that often arise in probability and statistics.

Theorem: Multiplication Rule

Consider a compound experiment consisting of two sub-experiments, Experiment A and Experiment B. Suppose that Experiment A has $a$ possible outcomes, and for each of those outcomes Experiment B has $b$ possible outcomes. Then the compound experiment has $ab$ possible outcomes.

We can use the multiplication rule to arrive at formulas for sampling with and without replacement. Many experiments in probability and statistics can be interpreted in one of these two contexts, so it is appealing that both formulas follow directly from the same basic counting principle.

Theorem: Sampling with Replacement

Consider $n$ objects and making $k$ choices from them, one at a time with replacement (i.e., choosing a certain object does not preclude it from being chosen again). Then there are $n^k$ possible outcomes.

Theorem: Sampling without Replacement

Consider $n$ objects and making $k$ choices from them, one at a time without replacement (i.e., choosing a certain object precludes it from being chosen again). Then there are $n(n-1) \cdots (n-k+1)$ possible outcomes, for $k \leq n$ (and $0$ possibilities for $k > n$).
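Both counting rules can be verified by brute-force enumeration for small cases; a quick check using Python's `itertools` (the values $n = 4$, $k = 2$ are arbitrary):

```python
from itertools import permutations, product
from math import perm

n, k = 4, 2   # small arbitrary values so we can enumerate everything

# Sampling with replacement: n^k ordered outcomes.
with_repl = list(product(range(n), repeat=k))
print(len(with_repl))     # 16 = 4^2

# Sampling without replacement: n(n-1)...(n-k+1) ordered outcomes.
without_repl = list(permutations(range(n), k))
print(len(without_repl))  # 12 = 4 * 3
```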

The above theorems are about counting, but when the naive definition applies, we can use them to calculate probabilities. This brings us to our next example, a famous problem in probability called the birthday problem. The solution incorporates both sampling with replacement and sampling without replacement.

Example Birthday Problem

There are $k$ people in a room. Assume each person's birthday is equally likely to be any of the 365 days of the year (we exclude February 29), and that people's birthdays are independent (we assume there are no twins in the room). What is the probability that two or more people in the group have the same birthday?

There are $365^k$ ways to assign birthdays to the people in the room, since we can imagine the 365 days of the year being sampled $k$ times, with replacement. By assumption, all of these possibilities are equally likely, so the naive definition of probability applies.

Used directly, the naive definition says we just need to count the number of ways to assign birthdays to $k$ people such that there are two or more people who share a birthday. But this counting problem is hard, since it could be Emma and Steve who share a birthday, or Steve and Naomi, or all three of them, or the three of them could share a birthday while two others in the group share a different birthday, or various other possibilities.

Instead, let's count the complement: the number of ways to assign birthdays to $k$ people such that no two people share a birthday. This amounts to sampling the 365 days of the year without replacement, so the number of possibilities is $365 \cdot 364 \cdot 363 \cdots (365-k+1)$ for $k \leq 365$. Therefore the probability of no birthday matches in a group of $k$ people is

$$P(\textrm{no birthday match}) = \frac{365 \cdot 364 \cdots (365-k+1)}{365^k},$$

and the probability of at least one birthday match is

$$P(\textrm{at least 1 birthday match}) = 1 - \frac{365 \cdot 364 \cdots (365-k+1)}{365^k}.$$

The figure plots the probability of at least one birthday match as a function of $k$. The first value of $k$ for which the probability of a match exceeds 0.5 is $k = 23$. Thus, in a group of 23 people, there is a better than 50% chance that two or more of them will have the same birthday. By the time we reach $k = 57$, the probability of a match exceeds 99%.

[Figure: probability of at least one birthday match as a function of $k$.]

Of course, for $k = 366$ we are guaranteed to have a match, but it's surprising that even with a much smaller number of people it's overwhelmingly likely that there is a birthday match. For a quick intuition into why it should not be so surprising, note that with 23 people there are $\binom{23}{2} = 253$ pairs of people, any of which could be a birthday match.
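The match probability above is easy to compute numerically; a short sketch that multiplies the no-match factors one at a time (avoiding huge intermediate integers) and finds the first $k$ where a match is more likely than not:

```python
def p_match(k):
    """P(at least one shared birthday) among k people, 365 equally likely days."""
    p_no_match = 1.0
    for i in range(k):
        p_no_match *= (365 - i) / 365
    return 1 - p_no_match

# Smallest k for which a match is more likely than not.
k = 1
while p_match(k) <= 0.5:
    k += 1
print(k, round(p_match(k), 3), round(p_match(57), 3))
```

Running this reproduces the thresholds stated above: the probability first exceeds 0.5 at $k = 23$ and exceeds 0.99 by $k = 57$.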

Adjusting for Overcounting

In many counting problems, it is not easy to directly count each possibility once and only once. If, however, we are able to count each possibility exactly $c$ times for some $c$, then we can adjust by dividing by $c$. For example, if we have exactly double-counted each possibility, we can divide by $2$ to get the correct count. We call this adjusting for overcounting.

Example Committees and Teams

Consider a group of four people.

(a) How many ways are there to choose a two-person committee?

(b) How many ways are there to break the people into two teams of two?

(a) One way to count the possibilities is by listing them out: labeling the people as 1, 2, 3, 4, the possibilities are $\boxed{1,2}$, $\boxed{1,3}$, $\boxed{1,4}$, $\boxed{2,3}$, $\boxed{2,4}$, $\boxed{3,4}$.

Another approach is to use the multiplication rule with an adjustment for overcounting. By the multiplication rule, there are 4 ways to choose the first person on the committee and 3 ways to choose the second person on the committee, but this counts each possibility twice, since picking 1 and 2 to be on the committee is the same as picking 2 and 1 to be on the committee. Since we have overcounted by a factor of 2, the number of possibilities is $(4 \cdot 3)/2 = 6$.

(b) Here are 3 ways to see that there are 3 ways to form the teams. Labeling the people as $1, 2, 3, 4$, we can directly list out the possibilities: $\boxed{1,2}\,\boxed{3,4}$, $\boxed{1,3}\,\boxed{2,4}$, and $\boxed{1,4}\,\boxed{2,3}$. Listing out all possibilities would quickly become tedious or infeasible with more people though. Another approach is to note that it suffices to specify person 1's teammate (and then the other team is determined). A third way is to use (a) to see that there are 6 ways to choose one team. This overcounts by a factor of 2, since picking $1$ and $2$ to be a team is equivalent to picking $3$ and $4$ to be a team. So again the answer is $6/2 = 3$.
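Both counts can be checked by enumeration; a sketch in Python, where the 6 committees from (a) collapse by the factor-of-2 overcounting into the 3 team splits of (b):

```python
from itertools import combinations

people = [1, 2, 3, 4]

# (a) Two-person committees: unordered pairs.
committees = list(combinations(people, 2))
print(len(committees))   # 6 committees

# (b) Two teams of two: each split {team, other team} is hit twice
# among the committees, so deduplicating divides the count by 2.
splits = {frozenset({frozenset(c), frozenset(set(people) - set(c))})
          for c in committees}
print(len(splits))       # 3 splits
```

Using `frozenset`s makes a split insensitive to both the order within a team and the order of the two teams, which is exactly the overcounting being adjusted for.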

A binomial coefficient counts the number of subsets of a certain size for a set, such as the number of ways to choose a committee of size $k$ from a set of $n$ people. Sets and subsets are by definition unordered, e.g., $\{3,1,4\} = \{4,1,3\}$, so we are counting the number of ways to choose $k$ objects out of $n$, without replacement and without distinguishing between the different orders in which they could be chosen.

Definition: Binomial Coefficient

For any nonnegative integers $k$ and $n$, the binomial coefficient $\binom{n}{k}$, read as "$n$ choose $k$", is the number of subsets of size $k$ for a set of size $n$.

Theorem: Binomial Coefficient Formula

For $k \leq n$, we have

$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{(n-k)!\,k!}.$$

For $k > n$, we have $\binom{n}{k} = 0$.

Proof: Let $A$ be a set with $|A| = n$. Any subset of $A$ has size at most $n$, so $\binom{n}{k} = 0$ for $k > n$. Now let $k \leq n$. By the theorem on sampling without replacement, there are $n(n-1) \cdots (n-k+1)$ ways to make an ordered choice of $k$ elements without replacement. This overcounts each subset of interest by a factor of $k!$ (since we don't care how these elements are ordered), so we can get the correct count by dividing by $k!$.
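The formula can be spot-checked against Python's built-in `math.comb` for small $n$; a quick sketch:

```python
from math import comb, factorial, prod

def falling(n, k):
    """Falling factorial n(n-1)...(n-k+1); the empty product gives 1 when k = 0."""
    return prod(n - i for i in range(k))

# Check (n choose k) = falling(n, k) / k! against the built-in binomial.
for n in range(8):
    for k in range(n + 1):
        assert comb(n, k) == falling(n, k) // factorial(k)
print("formula verified for all n < 8")
```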

Example Full House in Poker

A 5-card hand is dealt from a standard, well-shuffled 52-card deck. The hand is called a full house in poker if it consists of three cards of some rank and two cards of another rank, e.g., three 7's and two 10's (in any order). What is the probability of a full house?

All of the $\binom{52}{5}$ possible hands are equally likely by symmetry, so the naive definition is applicable. To find the number of full house hands, use the multiplication rule. There are 13 choices for what rank we have three of; for concreteness, assume we have three 7's. There are $\binom{4}{3}$ ways to choose which 7's we have. Then there are 12 choices for what rank we have two of, say 10's for concreteness, and $\binom{4}{2}$ ways to choose two 10's. Thus,

$$P(\textrm{full house}) = \frac{13 \binom{4}{3} \cdot 12 \binom{4}{2}}{\binom{52}{5}} = \frac{3744}{2598960} \approx 0.00144.$$
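The arithmetic is easy to verify with `math.comb`; a short check:

```python
from math import comb

# Count full house hands with the multiplication rule, then apply
# the naive definition over all C(52, 5) equally likely hands.
full_house = 13 * comb(4, 3) * 12 * comb(4, 2)
total = comb(52, 5)
print(full_house, total, full_house / total)   # 3744 2598960 ≈ 0.00144
```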