Conditioning as a Problem-Solving Tool
Conditioning is a powerful tool for solving problems because it lets us engage in wishful thinking: when we encounter a problem that would be made easier if only we knew whether some event $E$ happened or not, we can condition on $E$ and then on $E^c$, consider these possibilities separately, then combine them using the law of total probability (LOTP).
Strategy: Condition on What You Wish You Knew
Example Monty Hall
On the game show Let's Make a Deal, hosted by Monty Hall, a contestant chooses one of three closed doors, two of which have a goat behind them and one of which has a car. Monty, who knows where the car is, then opens one of the two remaining doors. The door he opens always has a goat behind it (he never reveals the car!). If he has a choice, then he picks a door at random with equal probabilities. Monty then offers the contestant the option of switching to the other unopened door. If the contestant's goal is to get the car, should she switch doors?
Solution: Let's label the doors 1 through 3. Without loss of generality, we can assume the contestant picked door 1 (if she didn't pick door 1, we could simply relabel the doors, or rewrite this solution with the door numbers permuted). Monty opens a door, revealing a goat. As the contestant decides whether or not to switch to the remaining unopened door, what does she really wish she knew? Naturally, her decision would be a lot easier if she knew where the car was! This suggests that we should condition on the location of the car. Let $C_j$ be the event that the car is behind door $j$, for $j = 1, 2, 3$. By LOTP,

$$P(\text{get car}) = P(\text{get car} \mid C_1)\cdot\frac{1}{3} + P(\text{get car} \mid C_2)\cdot\frac{1}{3} + P(\text{get car} \mid C_3)\cdot\frac{1}{3}.$$
Suppose the contestant employs the switching strategy. If the car is behind door 1, then switching will fail, so $P(\text{get car} \mid C_1) = 0$. If the car is behind door 2 or 3, then because Monty always reveals a goat, the remaining unopened door must contain the car, so switching will succeed: $P(\text{get car} \mid C_2) = P(\text{get car} \mid C_3) = 1$. Thus,

$$P(\text{get car}) = 0\cdot\frac{1}{3} + 1\cdot\frac{1}{3} + 1\cdot\frac{1}{3} = \frac{2}{3},$$

so the switching strategy succeeds $2/3$ of the time.
The following figure is a tree diagram of the argument we have just outlined: using the switching strategy, the contestant will win as long as the car is behind door 2 or door 3, which has probability $2/3$. We can also give an intuitive frequentist argument in favor of switching. Imagine playing this game 1000 times. Typically, about 333 times your initial guess for the car's location will be correct, in which case switching will fail. The other 667 or so times, you will win by switching.
There's a subtlety though, which is that when the contestant chooses whether to switch, she also knows which door Monty opened. We showed that the unconditional probability of success is $2/3$ (when following the switching strategy), but let's also show that the conditional probability of success for switching, given the information that Monty provides, is also $2/3$.
Let $M_j$ be the event that Monty opens door $j$, for $j = 2, 3$. Then

$$P(\text{get car}) = P(\text{get car} \mid M_2)P(M_2) + P(\text{get car} \mid M_3)P(M_3),$$

where $P(M_2) = P(M_3) = 1/2$ by symmetry and $P(\text{get car} \mid M_2) = P(\text{get car} \mid M_3)$. The symmetry here is that there is nothing in the statement of the problem that distinguishes between door 2 and door 3.
Let $x = P(\text{get car} \mid M_2) = P(\text{get car} \mid M_3)$. Plugging in what we know,

$$\frac{2}{3} = \frac{x}{2} + \frac{x}{2} = x,$$

as claimed.
Bayes' rule also works nicely for finding the conditional probability of success using the switching strategy, given the evidence. Suppose that Monty opens door 2. Using the notation and results above,

$$P(C_1 \mid M_2) = \frac{P(M_2 \mid C_1)P(C_1)}{P(M_2)} = \frac{(1/2)(1/3)}{1/2} = \frac{1}{3}.$$

So given that Monty opens door 2, there is a $1/3$ chance that the contestant's original choice of door has the car, which means that there is a $2/3$ chance that the switching strategy will succeed.
Many people, upon seeing this problem for the first time, argue that there is no advantage to switching: ''There are two doors remaining, and one of them has the car, so the chances are 50-50.'' After the last chapter, we recognize that this argument misapplies the naive definition of probability. Yet the naive definition, even when inappropriate, has a powerful hold on people's intuitions.
To build correct intuition, let's consider an extreme case. Suppose that there are a million doors, 999,999 of which contain goats and 1 of which has a car. After the contestant's initial pick, Monty opens 999,998 doors with goats behind them and offers the choice to switch. In this extreme case, it becomes clear that the probabilities are not 50-50 for the two unopened doors; very few people would stubbornly stick with their original choice. The same is true for the three-door case.
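As a sanity check on the $2/3$ answer, the switching strategy can be tested with a short Monte Carlo simulation. This sketch is not part of the original solution; the function name and parameters are ours, and the `n_doors` parameter covers the million-door variant as well.

```python
import random

def play_switching(n_trials=10**5, n_doors=3, seed=0):
    """Estimate the probability that the switching strategy wins,
    when Monty opens all but one of the non-chosen doors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        car = rng.randrange(n_doors)
        choice = 0  # by symmetry, assume the contestant picks door 0
        # Monty reveals goats behind every door except the contestant's
        # and one other; the door left closed hides the car unless the
        # contestant already picked it, in which case it hides a goat.
        if car != choice:
            remaining = car
        else:
            remaining = rng.choice([d for d in range(n_doors) if d != choice])
        wins += (remaining == car)
    return wins / n_trials

print(play_switching())               # close to 2/3 with three doors
print(play_switching(n_doors=100))    # close to 99/100 in an extreme case
```

The simulation makes the structure of the argument visible: switching wins exactly when the initial pick was wrong, which happens with probability $(n-1)/n$ for $n$ doors.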
Strategy: Condition on the First Step
A useful strategy in many problems is to condition on the first step of the experiment. The next two examples apply this strategy, which we call first-step analysis.
Example Branching Process
A single amoeba, Bobo, lives in a pond. After one minute Bobo will either die, split into two amoebas, or stay the same, with equal probability, and in subsequent minutes all living amoebas will behave the same way, independently. What is the probability that the amoeba population will eventually die out?
Solution: Let $D$ be the event that the population eventually dies out; we want to find $P(D)$. We proceed by conditioning on the outcome at the first step: let $B_i$ be the event that Bobo turns into $i$ amoebas after the first minute, for $i = 0, 1, 2$. We know $P(D \mid B_0) = 1$ and $P(D \mid B_1) = P(D)$ (if Bobo stays the same, we're back to where we started). If Bobo splits into two, then we just have two independent versions of our original problem! We need both of the offspring populations to eventually die out, so $P(D \mid B_2) = P(D)^2$. Now we have exhausted all the possible cases and can combine them with the law of total probability:

$$P(D) = \frac{1}{3}\cdot 1 + \frac{1}{3}\cdot P(D) + \frac{1}{3}\cdot P(D)^2.$$
Solving for $P(D)$, the equation rearranges to $P(D)^2 - 2P(D) + 1 = (P(D) - 1)^2 = 0$, which gives $P(D) = 1$: the amoeba population will die out with probability 1.
The strategy of first-step analysis works here because the problem is self-similar in nature: when Bobo continues as a single amoeba or splits into two, we end up with another version or another two versions of our original problem. Conditioning on the first step allows us to express $P(D)$ in terms of itself.
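The conclusion that the population dies out with probability 1 can be probed by simulation. Since extinction can take arbitrarily long, the sketch below (our own, with a capped number of minutes) only estimates a lower bound on the extinction probability, which should already be close to 1.

```python
import random

def extinct_by(max_minutes=300, n_trials=2000, seed=0):
    """Fraction of simulated populations that die out within max_minutes.
    Each amoeba independently leaves 0, 1, or 2 offspring after a minute,
    with probability 1/3 each (die, stay the same, or split)."""
    rng = random.Random(seed)
    extinct = 0
    for _ in range(n_trials):
        population, minutes = 1, 0
        while population > 0 and minutes < max_minutes:
            population = sum(rng.randrange(3) for _ in range(population))
            minutes += 1
        if population == 0:
            extinct += 1
    # populations still alive at the cap are conservatively counted as surviving
    return extinct / n_trials

print(extinct_by())  # close to 1, consistent with P(D) = 1
```

Raising `max_minutes` pushes the estimate closer to 1, matching the exact answer $P(D) = 1$.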
Example Gambler's Ruin
Two gamblers, A and B, make a sequence of $1 bets. In each bet, gambler A has probability $p$ of winning, and gambler B has probability $q = 1 - p$ of winning. Gambler A starts with $i$ dollars and gambler B starts with $N - i$ dollars; the total wealth of $N$ dollars between the two remains constant, since every time A loses a dollar, the dollar goes to B, and vice versa.

We can visualize this game as a random walk on the integers between 0 and $N$, where $p$ is the probability of going to the right in a given step: imagine a person who starts at position $i$ and, at each time step, moves one step to the right with probability $p$ and one step to the left with probability $q$. The game ends when either A or B is ruined, i.e., when the random walk reaches 0 or $N$. What is the probability that A wins the game (walking away with all the money)?
Solution: Note that after the first step, it's exactly the same game, except that A's wealth is now either $i + 1$ or $i - 1$. Let $p_i$ be the probability that A wins the game, given that A starts with $i$ dollars. We will use first-step analysis to solve for the $p_i$. Let $W$ be the event that A wins the game. By LOTP, conditioning on the outcome of the first round, we have

$$p_i = P(W \mid \text{A wins round 1}) \cdot p + P(W \mid \text{A loses round 1}) \cdot q = p_{i+1}\, p + p_{i-1}\, q.$$

This must be true for all $i$ from 1 to $N - 1$, and we also have the boundary conditions $p_0 = 0$ and $p_N = 1$. Now we can solve this equation, called a difference equation, to obtain the $p_i$. This gives that the probability of A winning with a starting wealth of $i$ dollars is

$$p_i = \begin{cases} \dfrac{1 - (q/p)^i}{1 - (q/p)^N} & \text{if } p \neq \dfrac{1}{2}, \\[2mm] \dfrac{i}{N} & \text{if } p = \dfrac{1}{2}. \end{cases}$$
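The closed-form solution can be checked numerically against the difference equation and the boundary conditions. The sketch below uses our own helper name and example values of $N$ and $p$.

```python
def win_prob(i, N, p):
    """P(A wins) starting from i dollars out of N total, per the closed form."""
    if p == 0.5:
        return i / N
    r = (1 - p) / p  # r = q/p
    return (1 - r**i) / (1 - r**N)

# verify the first-step recurrence p_i = p*p_{i+1} + q*p_{i-1}
N, p = 10, 0.6
q = 1 - p
assert win_prob(0, N, p) == 0 and win_prob(N, N, p) == 1  # boundary conditions
for i in range(1, N):
    lhs = win_prob(i, N, p)
    rhs = p * win_prob(i + 1, N, p) + q * win_prob(i - 1, N, p)
    assert abs(lhs - rhs) < 1e-12

print(win_prob(5, 10, 0.6))  # A's per-bet edge compounds: well above 1/2
```

Notice how a modest per-bet edge ($p = 0.6$) translates into a large advantage over many bets, since $(q/p)^i$ shrinks geometrically.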
Example Simpson's Paradox
Two doctors, Dr. Hibbert and Dr. Nick, each perform two types of surgeries: heart surgery and Band-Aid removal. Each surgery can be either a success or a failure. The two doctors' respective records are given in the following tables, and shown graphically in the figure, where white dots represent successful surgeries and black dots represent failed surgeries.
| Dr. Hibbert | Heart | Band-Aid |
|---|---|---|
| Success | 70 | 10 |
| Failure | 20 | 0 |

| Dr. Nick | Heart | Band-Aid |
|---|---|---|
| Success | 2 | 81 |
| Failure | 8 | 9 |
Dr. Hibbert had a higher success rate than Dr. Nick in heart surgeries: 70 out of 90 versus 2 out of 10. Dr. Hibbert also had a higher success rate in Band-Aid removal: 10 out of 10 versus 81 out of 90. But if we aggregate across the two types of surgeries to compare overall surgery success rates, Dr. Hibbert was successful in 80 out of 100 surgeries while Dr. Nick was successful in 83 out of 100 surgeries: Dr. Nick's overall success rate is higher!
What's happening is that Dr. Hibbert, presumably due to his reputation as the superior doctor, is performing a greater number of heart surgeries, which are inherently riskier than Band-Aid removals. His overall success rate is lower not because of lesser skill on any particular type of surgery, but because a larger fraction of his surgeries are risky.
Let's use event notation to make this precise. For events $A$, $B$, and $C$, we say that we have a Simpson's paradox if

$$P(A \mid B, C) < P(A \mid B^c, C),$$
$$P(A \mid B, C^c) < P(A \mid B^c, C^c),$$

but

$$P(A \mid B) > P(A \mid B^c).$$
In this case, let $A$ be the event of a successful surgery, $B$ the event that Dr. Nick is the surgeon, and $C$ the event that the surgery is a heart surgery. The conditions for Simpson's paradox are fulfilled because the probability of a successful surgery is lower under Dr. Nick than under Dr. Hibbert whether we condition on heart surgery or on Band-Aid removal, but the overall probability of success is higher for Dr. Nick.
The law of total probability tells us mathematically why this can happen:

$$P(A \mid B) = P(A \mid B, C)P(C \mid B) + P(A \mid B, C^c)P(C^c \mid B),$$
$$P(A \mid B^c) = P(A \mid B^c, C)P(C \mid B^c) + P(A \mid B^c, C^c)P(C^c \mid B^c).$$

Although we have

$$P(A \mid B, C) < P(A \mid B^c, C)$$

and

$$P(A \mid B, C^c) < P(A \mid B^c, C^c),$$

the weights $P(C \mid B)$ and $P(C^c \mid B)$ can flip the overall balance. In our situation

$$P(C^c \mid B) > P(C^c \mid B^c)$$

since Dr. Nick is much more likely to be performing Band-Aid removals, and this difference is large enough to cause $P(A \mid B)$ to be greater than $P(A \mid B^c)$.
Aggregation across different types of surgeries presents a misleading picture of the doctors' abilities because we lose the information about which doctor tends to perform which type of surgery.
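The inequality reversal can be verified directly from the two tables. This is a quick arithmetic check; the variable names are ours.

```python
# counts from the tables: (successes, failures) per surgery type
hibbert = {"heart": (70, 20), "bandaid": (10, 0)}
nick    = {"heart": (2, 8),   "bandaid": (81, 9)}

def rate(successes, failures):
    """Success rate for a single surgery type."""
    return successes / (successes + failures)

def overall(doctor):
    """Aggregate success rate across both surgery types."""
    total_succ = sum(succ for succ, _ in doctor.values())
    total = sum(succ + fail for succ, fail in doctor.values())
    return total_succ / total

# conditional on each surgery type, Dr. Hibbert is better...
assert rate(*hibbert["heart"]) > rate(*nick["heart"])      # 70/90 > 2/10
assert rate(*hibbert["bandaid"]) > rate(*nick["bandaid"])  # 10/10 > 81/90
# ...but aggregated over both types, Dr. Nick looks better
assert overall(hibbert) < overall(nick)                    # 80/100 < 83/100

print(overall(hibbert), overall(nick))
```

The flip comes entirely from the weights: 90% of Dr. Hibbert's surgeries are risky heart surgeries, versus only 10% of Dr. Nick's.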