Probability, Counting, and Story Proofs 2


Story Proofs

A story proof is a proof by interpretation. For counting problems, this often means counting the same thing in two different ways, rather than doing tedious algebra. A story proof often avoids messy calculations and goes further than an algebraic proof toward explaining why the result is true. The word "story" has several meanings, some more mathematical than others, but a story proof (in the sense in which we're using the term) is a fully valid mathematical proof. Here are some examples of story proofs, which also serve as further examples of counting.

Example: The Team Captain

For any positive integers $n$ and $k$ with $k \leq n$,

$$n {n-1 \choose k-1} = k {n \choose k}.$$

This is again easy to check algebraically, using the fact that $m! = m(m-1)!$ for any positive integer $m$, but a story proof is more insightful.

Story Proof: Consider a group of $n$ people, from which a team of $k$ will be chosen, one of whom will be the team captain. To specify a possibility, we could first choose the team captain and then choose the remaining $k-1$ team members; this gives the left-hand side. Equivalently, we could first choose the $k$ team members and then choose one of them to be captain; this gives the right-hand side.
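As a quick sanity check, the identity can also be verified numerically with Python's `math.comb`; the ranges below are an arbitrary choice for illustration.

```python
from math import comb

# Check n * C(n-1, k-1) == k * C(n, k) for all valid k up to n = 19.
for n in range(1, 20):
    for k in range(1, n + 1):
        lhs = n * comb(n - 1, k - 1)  # choose the captain, then the rest of the team
        rhs = k * comb(n, k)          # choose the team, then the captain from it
        assert lhs == rhs, (n, k)

print("team captain identity verified for n up to 19")
```

Of course, a finite check is no substitute for the proof; it only confirms the algebra on small cases.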

Example: Vandermonde's Identity

A famous relationship between binomial coefficients, called Vandermonde's identity, says that

$${m+n \choose k} = \sum_{j=0}^k {m \choose j}{n \choose k-j}.$$

This identity will come up several times in this course. Trying to prove it with a brute force expansion of all the binomial coefficients would be a nightmare. But a story proof establishes the result elegantly and makes it clear why the identity holds.

Story Proof: Consider a group of $m$ peacocks and $n$ toucans, from which a set of $k$ birds will be chosen. There are ${m+n \choose k}$ possibilities for this set of birds. If there are $j$ peacocks in the set, then there must be $k-j$ toucans in the set, and such a set can be formed in ${m \choose j}{n \choose k-j}$ ways. The right-hand side of Vandermonde's identity sums over the possible values of $j$.
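Like the previous identity, Vandermonde's identity is easy to check numerically; the ranges for $m$, $n$, and $k$ below are arbitrary small values chosen for illustration.

```python
from math import comb

def vandermonde_rhs(m, n, k):
    # Sum over the number j of peacocks chosen; comb(n, k - j) is 0 when k - j > n,
    # so the "impossible" cases contribute nothing.
    return sum(comb(m, j) * comb(n, k - j) for j in range(k + 1))

for m in range(6):
    for n in range(6):
        for k in range(m + n + 1):
            assert comb(m + n, k) == vandermonde_rhs(m, n, k), (m, n, k)

print("Vandermonde's identity verified for m, n up to 5")
```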

General Definition of Probability

We have now seen several methods for counting outcomes in a sample space, allowing us to calculate probabilities if the naive definition applies. But the naive definition can only take us so far, since it requires equally likely outcomes and can't handle an infinite sample space. We now give the general definition of probability. It requires just two axioms, but from these axioms it is possible to prove a vast array of results about probability.

Definition: General Definition of Probability

A probability space consists of a sample space $S$ and a probability function $P$ which takes an event $A \subseteq S$ as input and returns $P(A)$, a real number between 0 and 1, as output. The function $P$ must satisfy the following axioms:

1. $P(\emptyset) = 0$, $P(S) = 1$.

2. If $A_1, A_2, \dots$ are disjoint events, then

$$P\left(\bigcup_{j=1}^\infty A_j \right) = \sum_{j=1}^\infty P(A_j).$$

(Saying that these events are disjoint means that they are mutually exclusive: $A_i \cap A_j = \emptyset$ for $i \neq j$.)

In Pebble World, the definition says that probability behaves like mass: the mass of an empty pile of pebbles is 0, the total mass of all the pebbles is 1, and if we have non-overlapping piles of pebbles, we can get their combined mass by adding the masses of the individual piles. Unlike in the naive case, we can now have pebbles of differing masses, and we can also have a countably infinite number of pebbles as long as their total mass is 1.
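The pebble picture can be made concrete in a few lines of code: a finite sample space whose outcomes carry unequal masses, and a probability function that returns the total mass of an event. The particular masses below are an arbitrary choice for illustration.

```python
from fractions import Fraction

# Hypothetical pebble masses on the sample space S = {1, 2, 3, 4}.
# The masses differ (unlike the naive definition) but sum to 1.
mass = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}
S = frozenset(mass)

def P(A):
    """Probability of an event A ⊆ S: the total mass of its pebbles."""
    return sum(mass[s] for s in A)

# Axiom 1: P(∅) = 0 and P(S) = 1.
assert P(set()) == 0 and P(S) == 1

# Axiom 2 (finite form): additivity over disjoint events.
A, B = {1, 2}, {3}
assert P(A | B) == P(A) + P(B)
```

Using `Fraction` keeps the arithmetic exact, so the axioms can be checked with equality rather than floating-point tolerance.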

Any function $P$ (mapping events to numbers in the interval $[0, 1]$) that satisfies the two axioms is considered a valid probability function. However, the axioms don't tell us how probability should be interpreted; different schools of thought exist.

The frequentist view of probability is that it represents a long-run frequency over a large number of repetitions of an experiment: if we say a coin has probability $1/2$ of Heads, that means the coin would land Heads 50% of the time if we tossed it over and over and over.

The Bayesian view of probability is that it represents a degree of belief about the event in question, so we can assign probabilities to hypotheses like "candidate A will win the election" or "the defendant is guilty" even if it isn't possible to repeat the same election or the same crime over and over again.

The Bayesian and frequentist perspectives are complementary, and both will be helpful for developing intuition in later chapters. Regardless of how we choose to interpret probability, we can use the two axioms to derive other properties of probability, and these results will hold for any valid probability function.

Theorem: Properties of Probability

Probability has the following properties, for any events $A$ and $B$.

  1. $P(A^c) = 1 - P(A)$.
  2. If $A \subseteq B$, then $P(A) \leq P(B)$.
  3. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

Proof:

  1. Since $A$ and $A^c$ are disjoint and their union is $S$, the second axiom gives

    $$P(S) = P(A \cup A^c) = P(A) + P(A^c) = 1.$$
  2. If $A \subseteq B$, then we can write $B$ as the union of $A$ and $B \cap A^c$, where $B \cap A^c$ is the part of $B$ not also in $A$. This is illustrated in the figure below.

    [Figure: $B$ as the disjoint union of $A$ and $B \cap A^c$]

    Since $A$ and $B \cap A^c$ are disjoint, we can apply the second axiom:

    $$P(B) = P(A \cup (B \cap A^c)) = P(A) + P(B \cap A^c).$$

    Probability is nonnegative, so $P(B \cap A^c) \geq 0$, proving that $P(B) \geq P(A)$.

  3. The intuition for this result can be seen using a Venn diagram like the one below.

    [Figure: Venn diagram of $A$ and $B$, with $A \cup B$ shaded]

    The shaded region represents $A \cup B$, but the probability of this region is not $P(A) + P(B)$, because that would count the football-shaped intersection region $A \cap B$ twice. To correct for this, we subtract $P(A \cap B)$. This is a useful intuition, but not a proof.

    For a proof using the axioms of probability, we can write $A \cup B$ as the union of the disjoint events $A$ and $B \cap A^c$. Then by the second axiom,

    $$P(A \cup B) = P(A \cup (B \cap A^c)) = P(A) + P(B \cap A^c).$$

    So it suffices to show that $P(B \cap A^c) = P(B) - P(A \cap B)$. Since $A \cap B$ and $B \cap A^c$ are disjoint and their union is $B$, another application of the second axiom gives us

    $$P(A \cap B) + P(B \cap A^c) = P(B).$$

    So $P(B \cap A^c) = P(B) - P(A \cap B)$, as desired.

The third property is a special case of inclusion-exclusion, a formula for finding the probability of a union of events when the events are not necessarily disjoint. We showed above that for two events $A$ and $B$,

$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

For three events, inclusion-exclusion says

$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C).$$

For intuition, consider a triple Venn diagram like the one below.

[Figure: triple Venn diagram of $A$, $B$, and $C$]

To get the total area of the shaded region $A \cup B \cup C$, we start by adding the areas of the three circles, $P(A) + P(B) + P(C)$. The three football-shaped regions have each been counted twice, so we then subtract $P(A \cap B) + P(A \cap C) + P(B \cap C)$. Finally, the region in the center has been added three times and subtracted three times, so in order to count it exactly once, we must add it back again. This ensures that each region of the diagram is counted once and only once.
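The three-event formula is easy to check numerically on a small uniform sample space, where probabilities come from the naive definition $P(E) = |E|/|S|$. The die and the particular events below are arbitrary choices for illustration.

```python
from fractions import Fraction

# A toy uniform space: rolls of a fair die, S = {1, ..., 6}.
S = set(range(1, 7))

def P(E):
    # Naive definition: |E| / |S|, exact arithmetic via Fraction.
    return Fraction(len(E), len(S))

# Hypothetical events chosen so that every intersection is nonempty.
A = {1, 2, 3}
B = {2, 3, 4}
C = {3, 4, 5}

lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(A & C) - P(B & C)
       + P(A & B & C))
assert lhs == rhs  # both sides equal 5/6 for these events
```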

Now we can write inclusion-exclusion for nn events.

Theorem: Inclusion-exclusion

For any events $A_1, \dots, A_n$,

$$P\left(\bigcup_{i=1}^n A_i\right) = \sum_i P(A_i) - \sum_{i < j} P(A_i \cap A_j) + \sum_{i < j < k} P(A_i \cap A_j \cap A_k) - \dots + (-1)^{n+1} P(A_1 \cap \dots \cap A_n).$$
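The general formula translates directly into code: sum over all nonempty subsets of the events, with sign determined by the subset size. Below is a minimal sketch checked against the union probability computed directly; the uniform sample space and the events are arbitrary choices for illustration.

```python
from fractions import Fraction
from itertools import combinations

def union_prob(P, events):
    """P(A_1 ∪ ... ∪ A_n) via inclusion-exclusion, given a probability function P."""
    total = 0
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)  # +1 for odd-sized subsets, -1 for even-sized
        for subset in combinations(events, r):
            total += sign * P(set.intersection(*subset))
    return total

# Check against the direct union probability on a uniform space S = {1, ..., 10}.
S = set(range(1, 11))
P = lambda E: Fraction(len(E), len(S))
events = [{1, 2, 3}, {2, 4, 6, 8}, {3, 6, 9}, {5, 10}]
assert union_prob(P, events) == P(set().union(*events))
```

Note that the loop visits all $2^n - 1$ nonempty subsets, so this is only practical for small $n$; it is meant as a check of the formula, not an efficient algorithm.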