Probability, Counting, and Story Proofs 1


Why Study Probability?

Mathematics is the logic of certainty; probability is the logic of uncertainty. Probability is extremely useful in a wide variety of fields, since it provides tools for understanding and explaining variation, separating signal from noise, and modeling complex phenomena. For example, probability is needed in:

  1. Statistics: Probability is the foundation and language for statistics, enabling many powerful methods for using data to learn about the world.
  2. Physics: Einstein famously said "God does not play dice with the universe", but current understanding of quantum physics heavily involves probability at the most fundamental level of nature.
  3. Biology: Genetics is deeply intertwined with probability, both in the inheritance of genes and in modeling random mutations.
  4. Computer science: Probability also plays an essential role in studying randomized algorithms, machine learning, and artificial intelligence.
  5. Finance: Probability is central in quantitative finance. Modeling stock prices over time and determining "fair" prices for financial instruments are based heavily on probability.
  6. Political science: In recent years, political science has become more and more quantitative and statistical, e.g., in predicting and understanding election results.
  7. Medicine: The development of randomized clinical trials, in which patients are randomly assigned to receive treatment or placebo, has transformed medical research in recent years.
  8. Life: Life is uncertain, and probability is the logic of uncertainty. While it isn't practical to carry out a formal probability calculation for every decision made in life, thinking hard about probability can help us avert some common fallacies, shed light on coincidences, and make better predictions.

Sample Spaces and Pebble World

The mathematical framework for probability is built around sets. Imagine that an experiment is performed, resulting in one out of a set of possible outcomes. Before the experiment is performed, it is unknown which outcome will be the result; after, the result "crystallizes" into the actual outcome.

Definition: Sample Space and Event

The sample space $S$ of an experiment is the set of all possible outcomes of the experiment. An event $A$ is a subset of the sample space $S$, and we say that $A$ occurred if the actual outcome is in $A$.

[Figure: a sample space as Pebble World, with two events spotlighted.]

The sample space of an experiment can be finite or infinite. When the sample space is finite, we can visualize it as Pebble World. Each pebble represents an outcome, and an event is a set of pebbles. Performing the experiment amounts to randomly selecting one pebble.

Set theory is very useful in probability, since it provides a rich language for expressing and working with events. Set operations, especially unions, intersections, and complements, make it easy to build new events in terms of already-defined events.

For example, let $S$ be the sample space of an experiment and let $A, B \subseteq S$ be events. Then the union $A \cup B$ is the event that occurs if and only if at least one of $A$ and $B$ occurs, the intersection $A \cap B$ is the event that occurs if and only if both $A$ and $B$ occur, and the complement $A^c$ is the event that occurs if and only if $A$ does not occur. We also have De Morgan's laws:

$$(A \cup B)^c = A^c \cap B^c \quad \textrm{and} \quad (A \cap B)^c = A^c \cup B^c,$$

since saying that it is not the case that at least one of $A$ and $B$ occurs is the same as saying that $A$ does not occur and $B$ does not occur, and saying that it is not the case that both occur is the same as saying that at least one does not occur. Analogous results hold for unions and intersections of more than two events.
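These set operations map directly onto Python's built-in `set` type, which makes De Morgan's laws easy to check numerically. A minimal sketch (the sample space and events here are illustrative choices, not from the text):

```python
# Events as Python sets; S, A, B are illustrative, not from the text.
S = set(range(10))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

def complement(E):
    """Complement of event E relative to the sample space S."""
    return S - E

# De Morgan's laws: (A ∪ B)^c = A^c ∩ B^c and (A ∩ B)^c = A^c ∪ B^c.
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)
print("De Morgan's laws hold for this example")
```

Because `|`, `&`, and `-` are ordinary set operators, the same check works for any choice of finite sample space and events.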

Example Coin Flips

A coin is flipped 10 times. Writing Heads as $H$ and Tails as $T$, a possible outcome (pebble) is $HHHTHHTTHT$, and the sample space is the set of all possible strings of length 10 of $H$'s and $T$'s. We can (and will) encode $H$ as $1$ and $T$ as $0$, so that an outcome is a sequence $(s_1, \dots, s_{10})$ with $s_j \in \{0,1\}$, and the sample space is the set of all such sequences. Now let's look at some events:

  1. Let $A_1$ be the event that the first flip is Heads. As a set,

     $$A_1 = \{(1, s_2, \dots, s_{10}) : s_j \in \{0,1\} \textrm{ for } 2 \leq j \leq 10\}.$$

     This is a subset of the sample space, so it is indeed an event; saying that $A_1$ occurs is the same thing as saying that the first flip is Heads. Similarly, let $A_j$ be the event that the $j$th flip is Heads for $j = 2, 3, \dots, 10$.

  2. Let $B$ be the event that at least one flip was Heads. As a set,

     $$B = \bigcup_{j=1}^{10} A_j.$$

  3. Let $C$ be the event that all the flips were Heads. As a set,

     $$C = \bigcap_{j=1}^{10} A_j.$$

  4. Let $D$ be the event that there were at least two consecutive Heads. As a set,

     $$D = \bigcup_{j=1}^{9} (A_j \cap A_{j+1}).$$
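With only $2^{10} = 1024$ outcomes, these events can be built concretely by enumeration; a sketch in Python, using tuples of 0's and 1's as the outcomes per the encoding above:

```python
from itertools import product

# Enumerate all 2^10 outcomes of 10 coin flips (1 = Heads, 0 = Tails).
S = list(product([0, 1], repeat=10))

# A[j] is the event that the (j+1)th flip is Heads.
A = [set(s for s in S if s[j] == 1) for j in range(10)]

B = set().union(*A)                                     # at least one Heads
C = set(S).intersection(*A)                             # all flips Heads
D = set().union(*(A[j] & A[j + 1] for j in range(9)))   # two consecutive Heads

print(len(S), len(B), len(C), len(D))
```

Unions and intersections of many events translate directly into `set.union` and `set.intersection` over the list of events.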

A Set Theory Dictionary

[Table: a dictionary translating between English descriptions of events and their set-theoretic counterparts; images not reproduced.]

Naive Definition of Probability

Historically, the earliest definition of the probability of an event was to count the number of ways the event could happen and divide by the total number of possible outcomes for the experiment. We call this the naive definition since it is restrictive and relies on strong assumptions; nevertheless, it is important to understand, and useful when not misused.

Definition: Naive Definition of Probability

Let $A$ be an event for an experiment with a finite sample space $S$. The naive probability of $A$ is

$$P_{\textrm{naive}}(A) = \frac{|A|}{|S|} = \frac{\textrm{number of outcomes favorable to } A}{\textrm{total number of outcomes in } S},$$

where $|A|$ is the size (cardinality) of the set $A$.

The naive definition is very restrictive in that it requires $S$ to be finite, with equal mass for each pebble. It has often been misapplied by people who assume equally likely outcomes without justification and make arguments to the effect of "either it will happen or it won't, and we don't know which, so it's 50-50". For example, if we don't know whether or not there is life on Saturn, should we conclude that it is 50-50? What about intelligent life on Saturn, which seems like it should be strictly less likely than there being any form of life on Saturn? But there are several important types of problems where the naive definition is applicable, such as when there is symmetry in the problem that makes the outcomes equally likely.
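When the naive definition does apply, it translates into code as a simple ratio of set sizes. A minimal sketch, using a fair die as an illustrative example (not from the text) and exact fractions to avoid rounding:

```python
from fractions import Fraction

def p_naive(A, S):
    """Naive probability |A| / |S|; assumes all outcomes in S are equally likely."""
    return Fraction(len(set(A)), len(set(S)))

# Illustrative example (not from the text): a fair six-sided die.
die = {1, 2, 3, 4, 5, 6}
print(p_naive({2, 4, 6}, die))   # 1/2
```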

How to Count

Multiplication Rule

In some problems, we can directly count the number of possibilities using a basic but versatile principle called the multiplication rule. We'll see that the multiplication rule leads naturally to counting rules for sampling with replacement and sampling without replacement, two scenarios that often arise in probability and statistics.

Theorem: Multiplication Rule

Consider a compound experiment consisting of two sub-experiments, Experiment A and Experiment B. Suppose that Experiment A has $a$ possible outcomes, and for each of those outcomes Experiment B has $b$ possible outcomes. Then the compound experiment has $ab$ possible outcomes.

We can use the multiplication rule to arrive at formulas for sampling with and without replacement. Many experiments in probability and statistics can be interpreted in one of these two contexts, so it is appealing that both formulas follow directly from the same basic counting principle.

Theorem: Sampling with Replacement

Consider $n$ objects and making $k$ choices from them, one at a time with replacement (i.e., choosing a certain object does not preclude it from being chosen again). Then there are $n^k$ possible outcomes.

Theorem: Sampling without Replacement

Consider $n$ objects and making $k$ choices from them, one at a time without replacement (i.e., choosing a certain object precludes it from being chosen again). Then there are $n(n-1) \cdots (n-k+1)$ possible outcomes, for $k \leq n$ (and $0$ possibilities for $k > n$).
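Both counting rules can be verified by brute-force enumeration for small cases; a quick check using Python's `itertools` (the values $n = 4$, $k = 2$ are arbitrary):

```python
from itertools import permutations, product
from math import perm

n, k = 4, 2   # small arbitrary values so we can enumerate everything

# Sampling with replacement: n^k ordered outcomes.
with_repl = list(product(range(n), repeat=k))
print(len(with_repl))     # 16 = 4^2

# Sampling without replacement: n(n-1)...(n-k+1) ordered outcomes.
without_repl = list(permutations(range(n), k))
print(len(without_repl))  # 12 = 4 * 3
```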

The above theorems are about counting, but when the naive definition applies, we can use them to calculate probabilities. This brings us to our next example, a famous problem in probability called the birthday problem. The solution incorporates both sampling with replacement and sampling without replacement.

Example Birthday Problem

There are $k$ people in a room. Assume each person's birthday is equally likely to be any of the 365 days of the year (we exclude February 29), and that people's birthdays are independent (we assume there are no twins in the room). What is the probability that two or more people in the group have the same birthday?

There are $365^k$ ways to assign birthdays to the people in the room, since we can imagine the 365 days of the year being sampled $k$ times, with replacement. By assumption, all of these possibilities are equally likely, so the naive definition of probability applies.

Used directly, the naive definition says we just need to count the number of ways to assign birthdays to $k$ people such that there are two or more people who share a birthday. But this counting problem is hard, since it could be Emma and Steve who share a birthday, or Steve and Naomi, or all three of them, or the three of them could share a birthday while two others in the group share a different birthday, or various other possibilities.

Instead, let's count the complement: the number of ways to assign birthdays to $k$ people such that no two people share a birthday. This amounts to sampling the 365 days of the year without replacement, so the number of possibilities is $365 \cdot 364 \cdot 363 \cdots (365-k+1)$ for $k \leq 365$. Therefore the probability of no birthday matches in a group of $k$ people is

$$P(\textrm{no birthday match}) = \frac{365 \cdot 364 \cdots (365-k+1)}{365^k},$$

and the probability of at least one birthday match is

$$P(\textrm{at least 1 birthday match}) = 1 - \frac{365 \cdot 364 \cdots (365-k+1)}{365^k}.$$

The figure plots the probability of at least one birthday match as a function of $k$. The first value of $k$ for which the probability of a match exceeds 0.5 is $k = 23$. Thus, in a group of 23 people, there is a better than 50% chance that two or more of them will have the same birthday. By the time we reach $k = 57$, the probability of a match exceeds 99%.

[Figure: probability of at least one birthday match as a function of $k$.]

Of course, for $k = 366$ we are guaranteed to have a match, but it's surprising that even with a much smaller number of people it's overwhelmingly likely that there is a birthday match. For a quick intuition into why it should not be so surprising, note that with 23 people there are $\binom{23}{2} = 253$ pairs of people, any of which could be a birthday match.
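The match probability above is easy to compute numerically; a short sketch that multiplies the no-match factors one at a time (avoiding huge intermediate integers) and finds the first $k$ where a match is more likely than not:

```python
def p_match(k):
    """P(at least one shared birthday) among k people, 365 equally likely days."""
    p_no_match = 1.0
    for i in range(k):
        p_no_match *= (365 - i) / 365
    return 1 - p_no_match

# Smallest k for which a match is more likely than not.
k = 1
while p_match(k) <= 0.5:
    k += 1
print(k, round(p_match(k), 3), round(p_match(57), 3))
```

Running this reproduces the thresholds stated above: the probability first exceeds 0.5 at $k = 23$ and exceeds 0.99 by $k = 57$.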

Adjusting for Overcounting

In many counting problems, it is not easy to directly count each possibility once and only once. If, however, we are able to count each possibility exactly $c$ times for some $c$, then we can adjust by dividing by $c$. For example, if we have exactly double-counted each possibility, we can divide by $2$ to get the correct count. We call this adjusting for overcounting.

Example Committees and Teams

Consider a group of four people.

(a) How many ways are there to choose a two-person committee?

(b) How many ways are there to break the people into two teams of two?

(a) One way to count the possibilities is by listing them out: labeling the people as 1, 2, 3, 4, the possibilities are $\boxed{1,2}$, $\boxed{1,3}$, $\boxed{1,4}$, $\boxed{2,3}$, $\boxed{2,4}$, $\boxed{3,4}$.

Another approach is to use the multiplication rule with an adjustment for overcounting. By the multiplication rule, there are 4 ways to choose the first person on the committee and 3 ways to choose the second person on the committee, but this counts each possibility twice, since picking 1 and 2 to be on the committee is the same as picking 2 and 1 to be on the committee. Since we have overcounted by a factor of 2, the number of possibilities is $(4 \cdot 3)/2 = 6$.

(b) Here are 3 ways to see that there are 3 ways to form the teams. Labeling the people as $1, 2, 3, 4$, we can directly list out the possibilities: $\boxed{1,2}\,\boxed{3,4}$, $\boxed{1,3}\,\boxed{2,4}$, and $\boxed{1,4}\,\boxed{2,3}$. Listing out all possibilities would quickly become tedious or infeasible with more people though. Another approach is to note that it suffices to specify person 1's teammate (and then the other team is determined). A third way is to use (a) to see that there are 6 ways to choose one team. This overcounts by a factor of 2, since picking $1$ and $2$ to be a team is equivalent to picking $3$ and $4$ to be a team. So again the answer is $6/2 = 3$.
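Both counts can be checked by enumeration; a sketch in Python, where the 6 committees from (a) collapse by the factor-of-2 overcounting into the 3 team splits of (b):

```python
from itertools import combinations

people = [1, 2, 3, 4]

# (a) Two-person committees: unordered pairs.
committees = list(combinations(people, 2))
print(len(committees))   # 6 committees

# (b) Two teams of two: each split {team, other team} is hit twice
# among the committees, so deduplicating divides the count by 2.
splits = {frozenset({frozenset(c), frozenset(set(people) - set(c))})
          for c in committees}
print(len(splits))       # 3 splits
```

Using `frozenset`s makes a split insensitive to both the order within a team and the order of the two teams, which is exactly the overcounting being adjusted for.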

A binomial coefficient counts the number of subsets of a certain size for a set, such as the number of ways to choose a committee of size $k$ from a set of $n$ people. Sets and subsets are by definition unordered, e.g., $\{3,1,4\} = \{4,1,3\}$, so we are counting the number of ways to choose $k$ objects out of $n$, without replacement and without distinguishing between the different orders in which they could be chosen.

Definition: Binomial Coefficient

For any nonnegative integers $k$ and $n$, the binomial coefficient $\binom{n}{k}$, read as "$n$ choose $k$", is the number of subsets of size $k$ for a set of size $n$.

Theorem: Binomial Coefficient Formula

For $k \leq n$, we have

$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{(n-k)!\,k!}.$$

For $k > n$, we have $\binom{n}{k} = 0$.

Proof: Let $A$ be a set with $|A| = n$. Any subset of $A$ has size at most $n$, so $\binom{n}{k} = 0$ for $k > n$. Now let $k \leq n$. By the theorem on sampling without replacement, there are $n(n-1) \cdots (n-k+1)$ ways to make an ordered choice of $k$ elements without replacement. This overcounts each subset of interest by a factor of $k!$ (since we don't care how these elements are ordered), so we can get the correct count by dividing by $k!$.
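The formula can be spot-checked against Python's built-in `math.comb` for small $n$; a quick sketch:

```python
from math import comb, factorial, prod

def falling(n, k):
    """Falling factorial n(n-1)...(n-k+1); the empty product gives 1 when k = 0."""
    return prod(n - i for i in range(k))

# Check (n choose k) = falling(n, k) / k! against the built-in binomial.
for n in range(8):
    for k in range(n + 1):
        assert comb(n, k) == falling(n, k) // factorial(k)
print("formula verified for all n < 8")
```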

Example Full House in Poker

A 5-card hand is dealt from a standard, well-shuffled 52-card deck. The hand is called a full house in poker if it consists of three cards of some rank and two cards of another rank, e.g., three 7's and two 10's (in any order). What is the probability of a full house?

All of the $\binom{52}{5}$ possible hands are equally likely by symmetry, so the naive definition is applicable. To find the number of full house hands, use the multiplication rule. There are 13 choices for what rank we have three of; for concreteness, assume we have three 7's. There are $\binom{4}{3}$ ways to choose which 7's we have. Then there are 12 choices for what rank we have two of, say 10's for concreteness, and $\binom{4}{2}$ ways to choose two 10's. Thus,

$$P(\textrm{full house}) = \frac{13 \binom{4}{3} \cdot 12 \binom{4}{2}}{\binom{52}{5}} = \frac{3744}{2598960} \approx 0.00144.$$
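The arithmetic is easy to verify with `math.comb`; a short check:

```python
from math import comb

# Count full house hands with the multiplication rule, then apply
# the naive definition over all C(52, 5) equally likely hands.
full_house = 13 * comb(4, 3) * 12 * comb(4, 2)
total = comb(52, 5)
print(full_house, total, full_house / total)   # 3744 2598960 ≈ 0.00144
```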