Joint Distributions and Conditional Expectation


Joint, marginal, and conditional distributions

So far we have been focusing on the distribution of one random variable at a time, but very often we care about the relationship between multiple r.v.s in the same experiment. To give just a few examples:

  • Surveys: When conducting a survey, we may ask multiple questions to each respondent in order to determine the relationship between, say, opinions on social issues and opinions on economic issues.
  • Medicine: To evaluate the effectiveness of a treatment, we may take multiple measurements per patient; an ensemble of blood pressure, heart rate, and cholesterol readings can be more informative than any of these measurements considered separately.
  • Genetics: To study the relationships between various genetic markers and a particular disease, if we only looked separately at distributions for each genetic marker, we could fail to learn about whether an interaction between markers is related to the disease.
  • Time series: To study how something evolves over time, we can often make a series of measurements over time, and then study the series jointly. There are many applications of such series, such as global temperatures, stock prices, or national unemployment rates. The series of measurements considered jointly can help us deduce trends for the purpose of forecasting future measurements.

This unit considers joint distributions, also called multivariate distributions, which describe how multiple r.v.s interact with each other. We introduce multivariate analogs of the CDF, PMF, and PDF in order to provide a complete specification of the relationship between multiple r.v.s. After this groundwork is in place, we'll study a couple of famous named multivariate distributions, generalizing the Binomial and Normal distributions to higher dimensions.

The three key concepts for this section are joint, marginal, and conditional distributions. Recall that the distribution of a single r.v. X provides complete information about the probability of X falling into any subset of the real line. Analogously, the joint distribution of two r.v.s X and Y provides complete information about the probability of the vector (X, Y) falling into any subset of the plane. The marginal distribution of X is the individual distribution of X, ignoring the value of Y, and the conditional distribution of X given Y = y is the updated distribution for X after observing Y = y. We'll look at these concepts in the discrete case first, then extend them to the continuous case.

Discrete

The most general description of the joint distribution of two r.v.s is the joint CDF, which applies to discrete and continuous r.v.s alike.

Definition: Joint CDF

The joint CDF of r.v.s X and Y is the function F_{X,Y} given by

F_{X,Y}(x,y) = P(X \leq x, Y \leq y).

The joint CDF of n r.v.s is defined analogously.

Unfortunately, the joint CDF of discrete r.v.s is not a well-behaved function; as in the univariate case, it consists of jumps and flat regions. For this reason, with discrete r.v.s we usually work with the joint PMF, which also determines the joint distribution and is much easier to visualize.

Definition: Joint PMF

The joint PMF of discrete r.v.s X and Y is the function p_{X,Y} given by

p_{X,Y}(x,y) = P(X=x, Y=y).

The joint PMF of n discrete r.v.s is defined analogously.

Just as univariate PMFs must be nonnegative and sum to 1, we require valid joint PMFs to be nonnegative and sum to 1, where the sum is taken over all possible values of X and Y:

\sum_x \sum_y P(X=x, Y=y) = 1.

The following figure shows a sketch of what the joint PMF of two discrete r.v.s could look like. The height of a vertical bar at (x, y) represents the probability P(X=x, Y=y). For the joint PMF to be valid, the total height of the vertical bars must be 1.

[Figure: joint PMF of two discrete r.v.s, drawn as vertical bars over the (x, y) plane]
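As a concrete (and entirely made-up) illustration, a joint PMF of two discrete r.v.s with finitely many values can be stored as a 2D array, with rows indexed by the values of X and columns by the values of Y. The sketch below just checks the validity conditions; the numbers are arbitrary choices, not anything from the text.

```python
import numpy as np

# Hypothetical joint PMF of X in {0, 1, 2} and Y in {0, 1}:
# rows are indexed by x, columns by y.
joint_pmf = np.array([
    [0.10, 0.20],   # P(X=0, Y=0), P(X=0, Y=1)
    [0.25, 0.15],   # P(X=1, Y=0), P(X=1, Y=1)
    [0.20, 0.10],   # P(X=2, Y=0), P(X=2, Y=1)
])

# Validity: every entry is nonnegative and all entries sum to 1.
assert (joint_pmf >= 0).all()
assert np.isclose(joint_pmf.sum(), 1.0)
```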

From the joint distribution of X and Y, we can get the distribution of X alone by summing over the possible values of Y. This gives us the familiar PMF of X that we have seen in previous chapters. In the context of joint distributions, we will call it the marginal or unconditional distribution of X, to make it clear that we are referring to the distribution of X alone, without regard for the value of Y.

Definition: Marginal PMF

For discrete r.v.s X and Y, the marginal PMF of X is

P(X=x) = \sum_y P(X=x, Y=y).

The marginal PMF of X is the PMF of X, viewing X individually rather than jointly with Y. The above equation follows from the axioms of probability (we are summing over disjoint cases). The operation of summing over the possible values of Y in order to convert the joint PMF into the marginal PMF of X is known as marginalizing out Y.

Similarly, the marginal PMF of Y is obtained by summing over all possible values of X. So given the joint PMF, we can marginalize out Y to get the PMF of X, or marginalize out X to get the PMF of Y. But if we only know the marginal PMFs of X and Y, there is no way to recover the joint PMF without further assumptions.
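In the array representation from the earlier sketch (same made-up joint_pmf values), marginalizing is just summing the array along one axis:

```python
import numpy as np

# Same hypothetical joint PMF as in the sketch above (rows = x, columns = y).
joint_pmf = np.array([[0.10, 0.20],
                      [0.25, 0.15],
                      [0.20, 0.10]])

# Marginalize out Y (sum across columns) to get P(X = x),
# and marginalize out X (sum across rows) to get P(Y = y).
pmf_x = joint_pmf.sum(axis=1)   # array([0.3, 0.4, 0.3])
pmf_y = joint_pmf.sum(axis=0)   # array([0.55, 0.45])
```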

Now suppose that we observe the value of X and want to update our distribution of Y to reflect this information. Instead of using the marginal PMF P(Y=y), which does not take into account any information about X, we should use a PMF that conditions on the event X=x, where x is the value we observed for X. This naturally leads us to consider conditional PMFs.

Definition: Conditional PMF

For discrete r.v.s X and Y, the conditional PMF of Y given X=x is

P(Y=y | X=x) = \frac{P(X=x, Y=y)}{P(X=x)}.

This is viewed as a function of y for fixed x.

The following figure illustrates the definition of conditional PMF. To condition on the event X=x, we first take the joint PMF and focus in on the vertical bars where X takes on the value x; in the figure, these are shown in bold. All of the other vertical bars are irrelevant because they are inconsistent with the knowledge that X=x occurred. Since the total height of the bold bars is the marginal probability P(X=x), we then renormalize the conditional PMF by dividing by P(X=x); this ensures that the conditional PMF will sum to 1. Therefore conditional PMFs are PMFs, just as conditional probabilities are probabilities. Notice that there is a different conditional PMF of Y for every possible value of X; the following figure highlights just one of these conditional PMFs.

[Figure: conditional PMF of Y given X = x, obtained by renormalizing one slice of the joint PMF]
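With the same made-up array, conditioning on X = x amounts to taking the slice of the joint PMF where X = x and renormalizing it by P(X = x):

```python
import numpy as np

joint_pmf = np.array([[0.10, 0.20],
                      [0.25, 0.15],
                      [0.20, 0.10]])   # same hypothetical joint PMF

x = 1   # condition on the event X = 1

# Slice out the bars consistent with X = x, then divide by P(X = x)
# so that the conditional PMF sums to 1.
p_x = joint_pmf[x, :].sum()                  # P(X = 1) = 0.40
cond_pmf_y_given_x = joint_pmf[x, :] / p_x   # array([0.625, 0.375])

assert np.isclose(cond_pmf_y_given_x.sum(), 1.0)
```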


Definition: Independence of Discrete r.v.s

Random variables X and Y are independent if for all x and y,

F_{X,Y}(x,y) = F_X(x) F_Y(y).

If X and Y are discrete, this is equivalent to the condition

P(X=x, Y=y) = P(X=x) P(Y=y)

for all x and y, and it is also equivalent to the condition

P(Y=y|X=x) = P(Y=y)

for all y and all x such that P(X=x) > 0.

Using the terminology from this chapter, the definition says that for independent r.v.s, the joint CDF factors into the product of the marginal CDFs, or that the joint PMF factors into the product of the marginal PMFs. Remember that in general, the marginal distributions do not determine the joint distribution: this is the entire reason why we wanted to study joint distributions in the first place! But in the special case of independence, the marginal distributions are all we need in order to specify the joint distribution; we can get the joint PMF by multiplying the marginal PMFs.

Another way of looking at independence is that all the conditional PMFs are the same as the marginal PMF. In other words, starting with the marginal PMF of Y, no updating is necessary when we condition on X=x, regardless of what x is. There is no event purely involving X that influences our distribution of Y, and vice versa.
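In the array picture, independence means the joint PMF equals the outer product of its marginals. Here is a sketch of that check, again with made-up numbers:

```python
import numpy as np

# An independent joint PMF, built as the outer product of two marginals.
pmf_x = np.array([0.3, 0.4, 0.3])
pmf_y = np.array([0.55, 0.45])
joint_indep = np.outer(pmf_x, pmf_y)   # P(X=x, Y=y) = P(X=x) P(Y=y)

# Independence check: the joint PMF equals the outer product of the
# marginals recovered from it by marginalization.
marg_x = joint_indep.sum(axis=1)
marg_y = joint_indep.sum(axis=0)
assert np.allclose(joint_indep, np.outer(marg_x, marg_y))

# The dependent joint PMF from the earlier sketches fails the same check.
joint_dep = np.array([[0.10, 0.20],
                      [0.25, 0.15],
                      [0.20, 0.10]])
assert not np.allclose(joint_dep,
                       np.outer(joint_dep.sum(axis=1), joint_dep.sum(axis=0)))
```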

Continuous

Once we have a handle on discrete joint distributions, it isn't much harder to consider continuous joint distributions. We simply make the now-familiar substitutions of integrals for sums and PDFs for PMFs, remembering that the probability of any individual point is now 0.

Formally, in order for X and Y to have a continuous joint distribution, we require that the joint CDF

F_{X,Y}(x,y) = P(X \leq x, Y \leq y)

be differentiable with respect to x and y. The partial derivative with respect to x and y is called the joint PDF. The joint PDF determines the joint distribution, as does the joint CDF.

Definition: Joint PDF

If X and Y are continuous with joint CDF F_{X,Y}, their *joint PDF* is the derivative of the joint CDF with respect to x and y:

f_{X,Y}(x,y) = \frac{\partial^2}{\partial x \partial y} F_{X,Y}(x,y).

We require valid joint PDFs to be nonnegative and integrate to 1:

f_{X,Y}(x,y) \geq 0, \textrm{ and } \int_{-\infty}^\infty \int_{-\infty}^\infty f_{X,Y}(x,y) dx dy = 1.

In the univariate case, the PDF was the function we integrated to get the probability of an interval. Similarly, the joint PDF of two r.v.s is the function we integrate to get the probability of a two-dimensional region.

The following figure shows a sketch of what a joint PDF of two r.v.s could look like. As usual with continuous r.v.s, we need to keep in mind that the height of the surface f_{X,Y}(x,y) at a single point does not represent a probability. The probability of any specific point in the plane is 0; furthermore, now that we've gone up a dimension, the probability of any line or curve in the plane is also 0. The only way we can get nonzero probability is by integrating over a region of positive area in the xy-plane.

When we integrate the joint PDF over an area A, what we are calculating is the volume under the surface of the joint PDF and above A. Thus, probability is represented by volume under the joint PDF. The total volume under a valid joint PDF is 1.
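As a numerical sketch of probability-as-volume, take the made-up joint PDF f(x, y) = x + y on the unit square (nonnegative and integrating to 1, so valid) and integrate it over a region with scipy. The function names here are illustrative, not from the text.

```python
from scipy.integrate import dblquad

# Hypothetical joint PDF: f(x, y) = x + y on the unit square, 0 elsewhere.
def joint_pdf(x, y):
    return x + y if (0 <= x <= 1 and 0 <= y <= 1) else 0.0

# Total volume under the joint PDF should be 1.
# Note: dblquad integrates func(y, x), with y as the inner variable.
total, _ = dblquad(lambda y, x: joint_pdf(x, y), 0, 1, lambda x: 0, lambda x: 1)
print(total)   # ≈ 1.0

# P(X <= 0.5, Y <= 0.5): the volume above the square [0, 0.5] x [0, 0.5].
prob, _ = dblquad(lambda y, x: joint_pdf(x, y), 0, 0.5, lambda x: 0, lambda x: 0.5)
print(prob)    # ≈ 0.125, matching the integral done by hand
```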

In the discrete case, we get the marginal PMF of X by summing over all possible values of Y in the joint PMF. In the continuous case, we get the marginal PDF of X by integrating over all possible values of Y in the joint PDF.

Definition: Marginal PDF

For continuous r.v.s X and Y with joint PDF f_{X,Y}, the marginal PDF of X is

f_X(x) = \int_{-\infty}^\infty f_{X,Y}(x,y) dy.

This is the PDF of X, viewing X individually rather than jointly with Y.
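For the same made-up joint PDF f(x, y) = x + y on the unit square, the marginal PDF of X can be obtained by integrating out y numerically; by hand, f_X(x) = x + 1/2 for 0 ≤ x ≤ 1, which the sketch below reproduces.

```python
from scipy.integrate import quad

# Hypothetical joint PDF from the previous sketch.
def joint_pdf(x, y):
    return x + y if (0 <= x <= 1 and 0 <= y <= 1) else 0.0

# Marginal PDF of X at a point x: integrate the joint PDF over all y.
def marginal_pdf_x(x):
    value, _ = quad(lambda y: joint_pdf(x, y), 0, 1)
    return value

print(marginal_pdf_x(0.3))   # ≈ 0.8, matching f_X(x) = x + 1/2
```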

To simplify notation, we have mainly been looking at the joint distribution of two r.v.s rather than n r.v.s, but marginalization works analogously with any number of variables. For example, if we have the joint PDF of X, Y, Z, W but want the joint PDF of X, W, we just have to integrate over all possible values of Y and Z:

f_{X,W}(x,w) = \int_{-\infty}^\infty \int_{-\infty}^\infty f_{X,Y,Z,W}(x,y,z,w) dy dz.

Conceptually this is very easy---just integrate over the unwanted variables to get the joint PDF of the wanted variables---but computing the integral may or may not be difficult. Returning to the case of the joint distribution of two r.v.s X and Y, let's consider how to update our distribution for Y after observing the value of X, using the conditional PDF.

Definition: Conditional PDF

For continuous r.v.s X and Y with joint PDF f_{X,Y}, the conditional PDF of Y given X=x is

f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)}.

This is considered as a function of y for fixed x.


[Figure: conditional PDF of Y given X = x, obtained by renormalizing one slice of the joint PDF]

Note that we can recover the joint PDF f_{X,Y} if we have the conditional PDF f_{Y|X} and the corresponding marginal f_X:

f_{X,Y}(x,y) = f_{Y|X}(y|x) f_X(x).

Similarly, we can recover the joint PDF if we have f_{X|Y} and f_Y:

f_{X,Y}(x,y) = f_{X|Y}(x|y) f_Y(y).

This allows us to develop continuous analogs of Bayes' rule and LOTP: both formulas still hold in the continuous case, with probabilities replaced by probability density functions.

Theorem: Continuous Form of Bayes' Rule and LOTP

For continuous r.v.s X and Y,

f_{Y|X}(y|x) = \frac{f_{X|Y}(x|y)f_Y(y)}{f_X(x)},
f_X(x) = \int_{-\infty}^\infty f_{X|Y}(x|y)f_Y(y) dy.

Proof: By the definition of conditional PDFs, f_{X,Y}(x,y) = f_{Y|X}(y|x) f_X(x) and f_{X,Y}(x,y) = f_{X|Y}(x|y) f_Y(y). Equating the two expressions and dividing by f_X(x) gives Bayes' rule. For LOTP, integrate the joint PDF over all values of y: f_X(x) = \int_{-\infty}^\infty f_{X,Y}(x,y) dy = \int_{-\infty}^\infty f_{X|Y}(x|y) f_Y(y) dy.
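As a numerical sanity check of continuous LOTP and Bayes' rule, here is a sketch using a made-up model (not one from the text): Y ~ Unif(0, 1) and X | Y = y ~ Expo(y + 1). We compute f_X by LOTP and then verify that the Bayes-rule conditional PDF of Y given X = x integrates to 1.

```python
import numpy as np
from scipy.integrate import quad

def f_y(y):
    return 1.0 if 0 <= y <= 1 else 0.0          # Y ~ Unif(0, 1)

def f_x_given_y(x, y):
    rate = y + 1                                 # X | Y = y ~ Expo(y + 1)
    return rate * np.exp(-rate * x) if x >= 0 else 0.0

# Continuous LOTP: f_X(x) = integral of f_{X|Y}(x|y) f_Y(y) over y.
def f_x(x):
    value, _ = quad(lambda y: f_x_given_y(x, y) * f_y(y), 0, 1)
    return value

# Bayes' rule: f_{Y|X}(y|x) = f_{X|Y}(x|y) f_Y(y) / f_X(x).
# As a check, this conditional PDF should integrate to 1 over y.
x0 = 0.7
fx0 = f_x(x0)
total, _ = quad(lambda y: f_x_given_y(x0, y) * f_y(y) / fx0, 0, 1)
print(total)   # ≈ 1.0
```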

Finally, let's discuss the definition of independence for continuous r.v.s; then we'll turn to concrete examples. As in the discrete case, we can view independence of continuous r.v.s in two ways. One is that the joint CDF factors into the product of the marginal CDFs, or the joint PDF factors into the product of the marginal PDFs. The other is that the conditional PDF of Y given X=x is the same as the marginal PDF of Y, so conditioning on X provides no information about Y.

Definition: Independence of Continuous r.v.s

Random variables X and Y are independent if for all x and y,

F_{X,Y}(x,y) = F_X(x) F_Y(y).

If X and Y are continuous with joint PDF f_{X,Y}, this is equivalent to the condition

f_{X,Y}(x,y) = f_X(x) f_Y(y)

for all x and y, and it is also equivalent to the condition

f_{Y|X}(y|x) = f_Y(y)

for all y and all x such that f_X(x) > 0.

Example: Comparing Exponentials of Different Rates

Let T_1 \sim \textrm{Expo}(\lambda_1) and T_2 \sim \textrm{Expo}(\lambda_2) be independent. Find P(T_1 < T_2). For example, T_1 could be the lifetime of a refrigerator and T_2 could be the lifetime of a stove (if we are willing to assume Exponential distributions for these), and then P(T_1 < T_2) is the probability that the refrigerator fails before the stove. We know from Chapter 5 that \min(T_1, T_2) \sim \textrm{Expo}(\lambda_1 + \lambda_2), which tells us about when the first appliance failure will occur, but we also may want to know about which appliance will fail first.

Solution:

By independence, the joint PDF of (T_1, T_2) is the product of the marginal PDFs, so we integrate it over the region where t_1 < t_2:

P(T_1 < T_2) = \int_0^\infty \int_0^{t_2} \lambda_1 e^{-\lambda_1 t_1} \lambda_2 e^{-\lambda_2 t_2} dt_1 dt_2 = \int_0^\infty (1 - e^{-\lambda_1 t_2}) \lambda_2 e^{-\lambda_2 t_2} dt_2 = 1 - \frac{\lambda_2}{\lambda_1 + \lambda_2} = \frac{\lambda_1}{\lambda_1 + \lambda_2}.

So the refrigerator fails before the stove with probability \lambda_1/(\lambda_1 + \lambda_2); the appliance with the larger rate (and hence the shorter expected lifetime) is the one more likely to fail first.
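A quick Monte Carlo check of this answer (the rates and sample size below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
lam1, lam2 = 2.0, 3.0          # example rates
n = 10**6

# NumPy parametrizes the Exponential by its mean (scale = 1 / rate).
t1 = rng.exponential(scale=1 / lam1, size=n)
t2 = rng.exponential(scale=1 / lam2, size=n)

print((t1 < t2).mean())        # simulated P(T1 < T2), ≈ 0.4
print(lam1 / (lam1 + lam2))    # exact answer: 0.4
```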