The Importance of Thinking Conditionally
Conditional probability is the concept that addresses the following fundamental question: how should we update our beliefs in light of the evidence we observe? In fact, a useful perspective is that all probabilities are conditional; whether or not it's written explicitly, there is always background knowledge (or assumptions) built into every probability.
In addition to giving us a technique for updating our probabilities based on observed information, conditioning is a very powerful problem-solving strategy, often making it possible to solve a complicated problem by decomposing it into manageable pieces with case-by-case reasoning. Due to the central importance of conditioning, we say that conditioning is the soul of statistics.
Definition and Intuition
Definition: Conditional Probability
If $A$ and $B$ are events with $P(B) > 0$, then the conditional probability of $A$ given $B$, denoted by $P(A|B)$, is defined as

$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$
Here $A$ is the event whose uncertainty we want to update, and $B$ is the evidence we observe (or want to treat as given). We call $P(A)$ the prior probability of $A$ and $P(A|B)$ the posterior probability of $A$ ("prior" means before updating based on the evidence, and "posterior" means after updating based on the evidence). It is important to interpret the event appearing after the vertical conditioning bar as the evidence that we have observed or that is being conditioned on: $P(A|B)$ is the probability of $A$ given the evidence $B$, not the probability of some entity called $A|B$.
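To make the definition concrete, here is a minimal sketch in Python (ours, not part of the original text) that computes $P(A|B)$ when the sample space is finite with equally likely outcomes; the function name and the die-roll example are our own inventions.

```python
from fractions import Fraction

def cond_prob(A, B):
    # P(A|B) for events given as sets of equally likely outcomes.
    # Under the naive definition, P(A|B) = |A & B| / |B|.
    # Requires P(B) > 0, i.e., B must be nonempty.
    if not B:
        raise ValueError("cannot condition on an event with probability 0")
    return Fraction(len(A & B), len(B))

# Roll a fair die: A = {roll is even}, B = {roll is at least 4}.
A = {2, 4, 6}
B = {4, 5, 6}
print(cond_prob(A, B))  # 2/3: of the outcomes 4, 5, 6, two are even
```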
Example: Two Cards
A standard deck of cards is shuffled well. Two cards are drawn randomly, one at a time without replacement. Let $A$ be the event that the first card is a heart, and $B$ be the event that the second card is red. Find $P(A|B)$ and $P(B|A)$.
Solution: By the naive definition of probability and the multiplication rule,

$$P(A \cap B) = \frac{13 \cdot 25}{52 \cdot 51},$$

since a favorable outcome is determined by choosing any of the 13 hearts and then any of the remaining 25 red cards. Also, $P(A) = 13/52 = 1/4$ since the 4 suits are equally likely, and

$$P(B) = \frac{26 \cdot 51}{52 \cdot 51} = \frac{1}{2},$$

since there are 26 favorable possibilities for the second card, and for each of those, the first card can be any other card. A neater way to see that $P(B) = 1/2$ is by symmetry: from a vantage point before having done the experiment, the second card is equally likely to be any card in the deck. We now have all the pieces needed to apply the definition of conditional probability:

$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{(13 \cdot 25)/(52 \cdot 51)}{1/2} = \frac{25}{102},$$

$$P(B|A) = \frac{P(B \cap A)}{P(A)} = \frac{(13 \cdot 25)/(52 \cdot 51)}{1/4} = \frac{25}{51}.$$
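As a sanity check on these calculations, here is a small Monte Carlo sketch (our own code, not from the text) that estimates $P(A|B)$ and $P(B|A)$ by repeatedly drawing two cards without replacement:

```python
import random

# Cards are encoded by suit only, since A and B depend only on suits.
deck = ["hearts"] * 13 + ["diamonds"] * 13 + ["clubs"] * 13 + ["spades"] * 13

n = 10**6
n_A = n_B = n_AB = 0
for _ in range(n):
    first, second = random.sample(deck, 2)  # draw without replacement
    A = first == "hearts"                   # first card is a heart
    B = second in ("hearts", "diamonds")    # second card is red
    n_A += A
    n_B += B
    n_AB += A and B

print("P(A|B) ~", n_AB / n_B, "(exact: 25/102 =", 25 / 102, ")")
print("P(B|A) ~", n_AB / n_A, "(exact: 25/51 =", 25 / 51, ")")
```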
This is a simple example, but already there are several things worth noting.
- It's extremely important to be careful about which events to put on which side of the conditioning bar. In particular, $P(A|B) \neq P(B|A)$. The next section explores how $P(A|B)$ and $P(B|A)$ are related in general. Confusing these two quantities is called the prosecutor's fallacy. If instead we had defined $B$ to be the event that the second card is a heart, the two conditional probabilities would have been equal.
- Both $P(A|B)$ and $P(B|A)$ make sense (intuitively and mathematically); the chronological order in which cards were chosen does not dictate which conditional probabilities we can look at. When we calculate conditional probabilities, we are considering what information observing one event provides about another event, not whether one event causes another.
To shed more light on what conditional probability means, here are two intuitive interpretations.
Intuition: Pebble World
Consider a finite sample space, with the outcomes visualized as pebbles with total mass 1. Since $A$ is an event, it is a set of pebbles, and likewise for $B$.
[Figure: events $A$ and $B$ shown as subsets of the sample space.]
Now suppose that we learn that $B$ occurred. Upon obtaining this information, we get rid of all the pebbles in $B^c$ because they are incompatible with the knowledge that $B$ has occurred. Then $P(A \cap B)$ is the total mass of the pebbles remaining in $A$.
Finally, we renormalize: that is, we divide all the masses by a constant so that the new total mass of the remaining pebbles is 1. This is achieved by dividing by $P(B)$, the total mass of the pebbles in $B$. The updated mass of the outcomes corresponding to event $A$ is the conditional probability $P(A|B) = P(A \cap B)/P(B)$.
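The pebble picture translates directly into code. Below is a toy sketch (the pebble names and masses are our own invention) that carries out the two steps, discarding and then renormalizing, and checks that the updated mass of $A$ equals $P(A \cap B)/P(B)$:

```python
# Outcomes are pebbles with masses summing to 1; events are sets of pebbles.
masses = {"s1": 0.20, "s2": 0.05, "s3": 0.25, "s4": 0.10, "s5": 0.40}
A = {"s1", "s2", "s3"}
B = {"s2", "s3", "s4"}

# Step 1: discard the pebbles outside B (they contradict the evidence).
remaining = {s: m for s, m in masses.items() if s in B}

# Step 2: renormalize so the surviving masses again total 1.
total = sum(remaining.values())            # this is P(B) = 0.40
posterior = {s: m / total for s, m in remaining.items()}

# The updated mass of A is P(A|B) = P(A and B)/P(B) = 0.30/0.40.
print(sum(m for s, m in posterior.items() if s in A))  # 0.75
```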
Intuition: Frequentist Interpretation
Imagine repeating an experiment many times, randomly generating a long list of observed outcomes. The conditional probability of $A$ given $B$ can then be thought of in a natural way: it is the fraction of times that $A$ occurs, restricting attention to the trials where $B$ occurs. For instance, suppose our experiment has outcomes that can be represented as strings of 0's and 1's, where $A$ is the event that the first digit is 1 and $B$ is the event that the second digit is 1. Conditioning on $B$, we circle all the repetitions where $B$ occurred, and then we look at the fraction of circled repetitions in which event $A$ also occurred.
In symbols, let $n_A$, $n_B$, and $n_{AB}$ be the number of occurrences of $A$, $B$, and $A \cap B$, respectively, in a large number $n$ of repetitions of the experiment. The frequentist interpretation is that

$$P(A) \approx \frac{n_A}{n}, \quad P(B) \approx \frac{n_B}{n}, \quad P(A \cap B) \approx \frac{n_{AB}}{n}.$$

Then $P(A|B)$ is interpreted as $n_{AB}/n_B$, which equals $(n_{AB}/n)/(n_B/n)$. This interpretation again translates to $P(A|B) = P(A \cap B)/P(B)$.
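The following sketch (our own, assuming an experiment of two independent fair binary digits) mirrors this interpretation: it restricts attention to the trials where $B$ occurred and reports the fraction of those in which $A$ also occurred:

```python
import random

n = 10**6
n_B = n_AB = 0
for _ in range(n):
    outcome = "".join(random.choice("01") for _ in range(2))
    A = outcome[0] == "1"   # first digit is 1
    B = outcome[1] == "1"   # second digit is 1
    n_B += B
    n_AB += A and B

# P(A|B) is the fraction of B-trials on which A also occurred: n_AB/n_B,
# which equals (n_AB/n)/(n_B/n), mirroring P(A and B)/P(B).
print(n_AB / n_B)  # about 0.5, since the two digits are independent here
```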
Bayes' Rule and the Law of Total Probability
The definition of conditional probability is simple—just a ratio of two probabilities—but it has far-reaching consequences. The first consequence is obtained easily by moving the denominator in the definition to the other side of the equation.
Theorem
For any events $A$ and $B$ with positive probabilities,

$$P(A \cap B) = P(B)P(A|B) = P(A)P(B|A).$$
At first sight this theorem may not seem very useful: it is the definition of conditional probability, just written slightly differently, and anyway it seems circular to use $P(A|B)$ to help find $P(A \cap B)$ when $P(A|B)$ was defined in terms of $P(A \cap B)$. But we will see that the theorem is in fact very useful, since it often turns out to be possible to find conditional probabilities without going back to the definition.
Applying the above theorem repeatedly, we can generalize to the intersection of $n$ events.
Theorem
For any events $A_1, \dots, A_n$ with positive probabilities,

$$P(A_1, A_2, \dots, A_n) = P(A_1)\,P(A_2|A_1)\,P(A_3|A_1, A_2)\cdots P(A_n|A_1, \dots, A_{n-1}).$$

The commas denote intersections. For example, $P(A_3|A_1, A_2)$ is the probability that $A_3$ occurs, given that both $A_1$ and $A_2$ occur.
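As a concrete check (our own example, not from the text), the sketch below applies the theorem with $A_i$ the event that the $i$-th card dealt is a heart, and compares the chain-rule product to a direct count over ordered draws:

```python
from fractions import Fraction

# P(A1, A2, A3) = P(A1) * P(A2 | A1) * P(A3 | A1, A2):
# after each heart is dealt, one fewer heart and one fewer card remain.
p = Fraction(13, 52) * Fraction(12, 51) * Fraction(11, 50)
print(p)  # 11/850

# Direct count over ordered three-card draws gives the same answer.
direct = Fraction(13 * 12 * 11, 52 * 51 * 50)
print(p == direct)  # True
```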
We are now ready to introduce the two main theorems about conditional probability: Bayes' rule and the law of total probability (LOTP).
Theorem: Bayes' Rule

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}.$$
This follows immediately from the theorem $P(A \cap B) = P(B)P(A|B) = P(A)P(B|A)$ (divide both expressions for $P(A \cap B)$ by $P(B)$), which in turn followed immediately from the definition of conditional probability. Yet Bayes' rule has important implications and applications in probability and statistics, since it is so often necessary to find conditional probabilities, and often $P(B|A)$ is much easier to find directly than $P(A|B)$ (or vice versa).
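To illustrate, here is a tiny check (our own) that applies Bayes' rule to the Two Cards example above, recovering $P(A|B)$ from $P(B|A)$, $P(A)$, and $P(B)$ without touching $P(A \cap B)$ directly:

```python
from fractions import Fraction

# Values computed earlier in the Two Cards example.
P_B_given_A = Fraction(25, 51)
P_A = Fraction(1, 4)
P_B = Fraction(1, 2)

# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B).
print(P_B_given_A * P_A / P_B)  # 25/102, matching the earlier calculation
```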
The law of total probability (LOTP) relates conditional probability to unconditional probability. It is essential for fulfilling the promise that conditional probability can be used to decompose complicated probability problems into simpler pieces.
Theorem: Law of Total Probability (LOTP)
Let $A_1, \dots, A_n$ be a partition of the sample space $S$ (i.e., the $A_i$ are disjoint events and their union is $S$), with $P(A_i) > 0$ for all $i$. Then

$$P(B) = \sum_{i=1}^{n} P(B|A_i)P(A_i).$$
Proof: Since the $A_i$ form a partition of $S$, we can decompose $B$ as

$$B = (B \cap A_1) \cup (B \cap A_2) \cup \cdots \cup (B \cap A_n).$$

The pieces $B \cap A_i$ are disjoint, so

$$P(B) = \sum_{i=1}^{n} P(B \cap A_i) = \sum_{i=1}^{n} P(B|A_i)P(A_i),$$

where the last equality writes each $P(B \cap A_i)$ as $P(B|A_i)P(A_i)$ using the theorem above.
The law of total probability tells us that to get the unconditional probability of $B$, we can divide the sample space into disjoint slices $A_i$, find the conditional probability of $B$ within each of the slices, and then take a weighted sum of these conditional probabilities, where the weights are the probabilities $P(A_i)$. The choice of how to divide up the sample space is crucial: a well-chosen partition will reduce a complicated problem into simpler pieces, whereas a poorly chosen partition will only exacerbate our problems, requiring us to calculate $n$ difficult probabilities instead of just one!
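Here is a minimal sketch of LOTP as a weighted sum, with made-up probabilities for a three-event partition (the numbers are our own, chosen only for illustration):

```python
from fractions import Fraction

# A three-slice partition of the sample space, with weights P(A_i)
# and conditional probabilities P(B | A_i) within each slice.
P_A = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
P_B_given_A = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 8)]

assert sum(P_A) == 1  # the slices must cover the whole sample space

# LOTP: P(B) is the weighted sum of the conditional probabilities.
P_B = sum(w * p for w, p in zip(P_A, P_B_given_A))
print(P_B)  # 1/8 + 1/6 + 1/48 = 5/16
```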
Example: Random Coin
You have one fair coin, and one biased coin which lands Heads with probability 3/4. You pick one of the coins at random and flip it three times. It lands Heads all three times. Given this information, what is the probability that the coin you picked is the fair one?
Solution: Let $A$ be the event that the chosen coin lands Heads three times, and let $F$ be the event that we picked the fair coin. We are interested in $P(F|A)$, but it is easier to find $P(A|F)$ and $P(A|F^c)$, since it helps to know which coin we have; this suggests using Bayes' rule and the law of total probability. Doing so, we have

$$P(F|A) = \frac{P(A|F)P(F)}{P(A)} = \frac{(1/2)^3 \cdot 1/2}{(1/2)^3 \cdot 1/2 + (3/4)^3 \cdot 1/2} = \frac{8}{35} \approx 0.23.$$

Before flipping the coin, the fair and biased coins were equally likely to be picked: $P(F) = 1/2$. Upon observing three Heads, it becomes more likely that we picked the biased coin, so the posterior probability of the fair coin drops to about 0.23.
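As a check on this answer, here is a Monte Carlo sketch (our own code) that simulates picking a coin at random, flipping it three times, and conditioning on seeing three Heads:

```python
import random

n = 10**6
n_HHH = 0            # trials with three Heads
n_fair_HHH = 0       # trials with three Heads where the coin was fair
for _ in range(n):
    fair = random.random() < 0.5          # pick one of the two coins
    p_heads = 0.5 if fair else 0.75       # fair coin vs. biased coin
    all_heads = all(random.random() < p_heads for _ in range(3))
    if all_heads:
        n_HHH += 1
        n_fair_HHH += fair

# P(F|A) estimated as the fraction of three-Heads trials with the fair coin.
print(n_fair_HHH / n_HHH)  # approximately 8/35 = 0.2286
```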