Probability, Counting, and Story Proofs 2


Story Proofs

A story proof is a proof by interpretation. For counting problems, this often means counting the same thing in two different ways, rather than doing tedious algebra. A story proof often avoids messy calculations and goes further than an algebraic proof toward explaining why the result is true. The word "story" has several meanings, some more mathematical than others, but a story proof (in the sense in which we're using the term) is a fully valid mathematical proof. Here are some examples of story proofs, which also serve as further examples of counting.

Example: The Team Captain

For any positive integers $n$ and $k$ with $k \leq n$,

$$n {n-1 \choose k-1} = k {n \choose k}.$$

This is again easy to check algebraically, using the fact that $m! = m(m-1)!$ for any positive integer $m$, but a story proof is more insightful.

Story Proof: Consider a group of $n$ people, from which a team of $k$ will be chosen, one of whom will be the team captain. To specify a possibility, we could first choose the team captain and then choose the remaining $k-1$ team members; this gives the left-hand side. Equivalently, we could first choose the $k$ team members and then choose one of them to be captain; this gives the right-hand side.
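As a quick sanity check, the identity can also be verified numerically with Python's `math.comb`; the ranges below are an arbitrary choice for illustration.

```python
from math import comb

# Check n * C(n-1, k-1) == k * C(n, k) for all valid k up to n = 19.
for n in range(1, 20):
    for k in range(1, n + 1):
        lhs = n * comb(n - 1, k - 1)  # choose the captain, then the rest of the team
        rhs = k * comb(n, k)          # choose the team, then the captain from it
        assert lhs == rhs, (n, k)

print("team captain identity verified for n up to 19")
```

Of course, a finite check is no substitute for the proof; it only confirms the algebra on small cases.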

Example: Vandermonde's Identity

A famous relationship between binomial coefficients, called Vandermonde's identity, says that

$${m+n \choose k} = \sum_{j=0}^k {m \choose j}{n \choose k-j}.$$

This identity will come up several times in this course. Trying to prove it with a brute force expansion of all the binomial coefficients would be a nightmare. But a story proof establishes the result elegantly and makes it clear why the identity holds.

Story Proof: Consider a group of $m$ peacocks and $n$ toucans, from which a set of $k$ birds will be chosen. There are ${m+n \choose k}$ possibilities for this set of birds. If there are $j$ peacocks in the set, then there must be $k-j$ toucans in the set, and such a set can be formed in ${m \choose j}{n \choose k-j}$ ways. The right-hand side of Vandermonde's identity sums over the possible values of $j$.
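Like the previous identity, Vandermonde's identity is easy to check numerically; the ranges for $m$, $n$, and $k$ below are arbitrary small values chosen for illustration.

```python
from math import comb

def vandermonde_rhs(m, n, k):
    # Sum over the number j of peacocks chosen; comb(n, k - j) is 0 when k - j > n,
    # so the "impossible" cases contribute nothing.
    return sum(comb(m, j) * comb(n, k - j) for j in range(k + 1))

for m in range(6):
    for n in range(6):
        for k in range(m + n + 1):
            assert comb(m + n, k) == vandermonde_rhs(m, n, k), (m, n, k)

print("Vandermonde's identity verified for m, n up to 5")
```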

General Definition of Probability

We have now seen several methods for counting outcomes in a sample space, allowing us to calculate probabilities if the naive definition applies. But the naive definition can only take us so far, since it requires equally likely outcomes and can't handle an infinite sample space. We now give the general definition of probability. It requires just two axioms, but from these axioms it is possible to prove a vast array of results about probability.

Definition: General Definition of Probability

A probability space consists of a sample space $S$ and a probability function $P$ which takes an event $A \subseteq S$ as input and returns $P(A)$, a real number between 0 and 1, as output. The function $P$ must satisfy the following axioms:

1. $P(\emptyset) = 0$, $P(S) = 1$.

2. If $A_1, A_2, \dots$ are disjoint events, then

$$P\left(\bigcup_{j=1}^\infty A_j \right) = \sum_{j=1}^\infty P(A_j).$$

(Saying that these events are disjoint means that they are mutually exclusive: $A_i \cap A_j = \emptyset$ for $i \neq j$.)

In Pebble World, the definition says that probability behaves like mass: the mass of an empty pile of pebbles is 0, the total mass of all the pebbles is 1, and if we have non-overlapping piles of pebbles, we can get their combined mass by adding the masses of the individual piles. Unlike in the naive case, we can now have pebbles of differing masses, and we can also have a countably infinite number of pebbles as long as their total mass is 1.
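The pebble picture can be made concrete in a few lines of code: a finite sample space whose outcomes carry unequal masses, and a probability function that returns the total mass of an event. The particular masses below are an arbitrary choice for illustration.

```python
from fractions import Fraction

# Hypothetical pebble masses on the sample space S = {1, 2, 3, 4}.
# The masses differ (unlike the naive definition) but sum to 1.
mass = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}
S = frozenset(mass)

def P(A):
    """Probability of an event A ⊆ S: the total mass of its pebbles."""
    return sum(mass[s] for s in A)

# Axiom 1: P(∅) = 0 and P(S) = 1.
assert P(set()) == 0 and P(S) == 1

# Axiom 2 (finite form): additivity over disjoint events.
A, B = {1, 2}, {3}
assert P(A | B) == P(A) + P(B)
```

Using `Fraction` keeps the arithmetic exact, so the axioms can be checked with equality rather than floating-point tolerance.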

Any function $P$ (mapping events to numbers in the interval $[0, 1]$) that satisfies the two axioms is considered a valid probability function. However, the axioms don't tell us how probability should be interpreted; different schools of thought exist.

The frequentist view of probability is that it represents a long-run frequency over a large number of repetitions of an experiment: if we say a coin has probability $1/2$ of Heads, that means the coin would land Heads 50% of the time if we tossed it over and over and over.

The Bayesian view of probability is that it represents a degree of belief about the event in question, so we can assign probabilities to hypotheses like "candidate A will win the election" or "the defendant is guilty" even if it isn't possible to repeat the same election or the same crime over and over again.

The Bayesian and frequentist perspectives are complementary, and both will be helpful for developing intuition in later chapters. Regardless of how we choose to interpret probability, we can use the two axioms to derive other properties of probability, and these results will hold for any valid probability function.

Theorem: Properties of Probability

Probability has the following properties, for any events $A$ and $B$.

  1. $P(A^c) = 1 - P(A)$.
  2. If $A \subseteq B$, then $P(A) \leq P(B)$.
  3. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

Proof:

  1. Since $A$ and $A^c$ are disjoint and their union is $S$, the second axiom gives

    $$P(S) = P(A \cup A^c) = P(A) + P(A^c) = 1.$$
  2. If $A \subseteq B$, then we can write $B$ as the union of $A$ and $B \cap A^c$, where $B \cap A^c$ is the part of $B$ not also in $A$. This is illustrated in the figure below.

    [Figure: $B$ as the disjoint union of $A$ and $B \cap A^c$]

    Since $A$ and $B \cap A^c$ are disjoint, we can apply the second axiom:

    $$P(B) = P(A \cup (B \cap A^c)) = P(A) + P(B \cap A^c).$$

    Probability is nonnegative, so $P(B \cap A^c) \geq 0$, proving that $P(B) \geq P(A)$.

  3. The intuition for this result can be seen using a Venn diagram like the one below.

    [Figure: Venn diagram of $A$ and $B$, with $A \cup B$ shaded]

    The shaded region represents $A \cup B$, but the probability of this region is not $P(A) + P(B)$, because that would count the football-shaped intersection region $A \cap B$ twice. To correct for this, we subtract $P(A \cap B)$. This is a useful intuition, but not a proof.

    For a proof using the axioms of probability, we can write $A \cup B$ as the union of the disjoint events $A$ and $B \cap A^c$. Then by the second axiom,

    $$P(A \cup B) = P(A \cup (B \cap A^c)) = P(A) + P(B \cap A^c).$$

    So it suffices to show that $P(B \cap A^c) = P(B) - P(A \cap B)$. Since $A \cap B$ and $B \cap A^c$ are disjoint and their union is $B$, another application of the second axiom gives us

    $$P(A \cap B) + P(B \cap A^c) = P(B).$$

    So $P(B \cap A^c) = P(B) - P(A \cap B)$, as desired.

The third property is a special case of inclusion-exclusion, a formula for finding the probability of a union of events when the events are not necessarily disjoint. We showed above that for two events $A$ and $B$,

$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

For three events, inclusion-exclusion says

$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C).$$

For intuition, consider a triple Venn diagram like the one below.

[Figure: triple Venn diagram of $A$, $B$, and $C$]

To get the total area of the shaded region $A \cup B \cup C$, we start by adding the areas of the three circles, $P(A) + P(B) + P(C)$. The three football-shaped regions have each been counted twice, so we then subtract $P(A \cap B) + P(A \cap C) + P(B \cap C)$. Finally, the region in the center has been added three times and subtracted three times, so in order to count it exactly once, we must add it back again. This ensures that each region of the diagram is counted once and only once.
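The three-event formula is easy to check numerically on a small uniform sample space, where probabilities come from the naive definition $P(E) = |E|/|S|$. The die and the particular events below are arbitrary choices for illustration.

```python
from fractions import Fraction

# A toy uniform space: rolls of a fair die, S = {1, ..., 6}.
S = set(range(1, 7))

def P(E):
    # Naive definition: |E| / |S|, exact arithmetic via Fraction.
    return Fraction(len(E), len(S))

# Hypothetical events chosen so that every intersection is nonempty.
A = {1, 2, 3}
B = {2, 3, 4}
C = {3, 4, 5}

lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(A & C) - P(B & C)
       + P(A & B & C))
assert lhs == rhs  # both sides equal 5/6 for these events
```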

Now we can write inclusion-exclusion for nn events.

Theorem: Inclusion-exclusion

For any events $A_1, \dots, A_n$,

$$P\left(\bigcup_{i=1}^n A_i\right) = \sum_i P(A_i) - \sum_{i < j} P(A_i \cap A_j) + \sum_{i < j < k} P(A_i \cap A_j \cap A_k) - \dots + (-1)^{n+1} P(A_1 \cap \dots \cap A_n).$$
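The general formula translates directly into code: sum over all nonempty subsets of the events, with sign determined by the subset size. Below is a minimal sketch checked against the union probability computed directly; the uniform sample space and the events are arbitrary choices for illustration.

```python
from fractions import Fraction
from itertools import combinations

def union_prob(P, events):
    """P(A_1 ∪ ... ∪ A_n) via inclusion-exclusion, given a probability function P."""
    total = 0
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)  # +1 for odd-sized subsets, -1 for even-sized
        for subset in combinations(events, r):
            total += sign * P(set.intersection(*subset))
    return total

# Check against the direct union probability on a uniform space S = {1, ..., 10}.
S = set(range(1, 11))
P = lambda E: Fraction(len(E), len(S))
events = [{1, 2, 3}, {2, 4, 6, 8}, {3, 6, 9}, {5, 10}]
assert union_prob(P, events) == P(set().union(*events))
```

Note that the loop visits all $2^n - 1$ nonempty subsets, so this is only practical for small $n$; it is meant as a check of the formula, not an efficient algorithm.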