Joint, marginal, and conditional distributions
So far we have been focusing on the distribution of one random variable at a time, but very often we care about the relationship between multiple r.v.s in the same experiment. To give just a few examples:
- Surveys: When conducting a survey, we may ask multiple questions to each respondent in order to determine the relationship between, say, opinions on social issues and opinions on economic issues.
- Medicine: To evaluate the effectiveness of a treatment, we may take multiple measurements per patient; an ensemble of blood pressure, heart rate, and cholesterol readings can be more informative than any of these measurements considered separately.
- Genetics: To study the relationships between various genetic markers and a particular disease, if we only looked separately at distributions for each genetic marker, we could fail to learn about whether an interaction between markers is related to the disease.
- Time series: To study how something evolves over time, we can often make a series of measurements over time, and then study the series jointly. Such series arise in many settings: global temperatures, stock prices, and national unemployment rates, for example. The series of measurements considered jointly can help us deduce trends for the purpose of forecasting future measurements.
This unit considers joint distributions, also called multivariate distributions, which describe how multiple r.v.s interact with each other. We introduce multivariate analogs of the CDF, PMF, and PDF in order to provide a complete specification of the relationship between multiple r.v.s. After this groundwork is in place, we'll study a couple of famous named multivariate distributions, generalizing the Binomial and Normal distributions to higher dimensions.
The three key concepts for this section are joint, marginal, and conditional distributions. Recall that the distribution of a single r.v. $X$ provides complete information about the probability of $X$ falling into any subset of the real line. Analogously, the joint distribution of two r.v.s $X$ and $Y$ provides complete information about the probability of the vector $(X, Y)$ falling into any subset of the plane. The marginal distribution of $X$ is the individual distribution of $X$, ignoring the value of $Y$, and the conditional distribution of $X$ given $Y = y$ is the updated distribution for $X$ after observing $Y = y$. We'll look at these concepts in the discrete case first, then extend them to the continuous case.
Discrete
The most general description of the joint distribution of two r.v.s is the joint CDF, which applies to discrete and continuous r.v.s alike.
Definition: Joint CDF
The joint CDF of r.v.s $X$ and $Y$ is the function $F_{X,Y}$ given by
$$F_{X,Y}(x, y) = P(X \le x, Y \le y).$$
The joint CDF of $n$ r.v.s is defined analogously.
Unfortunately, the joint CDF of discrete r.v.s is not a well-behaved function; as in the univariate case, it consists of jumps and flat regions. For this reason, with discrete r.v.s we usually work with the joint PMF, which also determines the joint distribution and is much easier to visualize.
Definition: Joint PMF
The joint PMF of discrete r.v.s $X$ and $Y$ is the function $p_{X,Y}$ given by
$$p_{X,Y}(x, y) = P(X = x, Y = y).$$
The joint PMF of $n$ discrete r.v.s is defined analogously.
Just as univariate PMFs must be nonnegative and sum to 1, we require valid joint PMFs to be nonnegative and sum to 1, where the sum is taken over all possible values of $X$ and $Y$:
$$\sum_x \sum_y P(X = x, Y = y) = 1.$$
The following figure shows a sketch of what the joint PMF of two discrete r.v.s could look like. The height of a vertical bar at $(x, y)$ represents the probability $P(X = x, Y = y)$. For the joint PMF to be valid, the total height of the vertical bars must be 1.
From the joint distribution of $X$ and $Y$, we can get the distribution of $X$ alone by summing over the possible values of $Y$. This gives us the familiar PMF of $X$ that we have seen in previous chapters. In the context of joint distributions, we will call it the marginal or unconditional distribution of $X$, to make it clear that we are referring to the distribution of $X$ alone, without regard for the value of $Y$.
Definition: Marginal PMF
For discrete r.v.s $X$ and $Y$, the marginal PMF of $X$ is
$$P(X = x) = \sum_y P(X = x, Y = y).$$
The marginal PMF of $X$ is the PMF of $X$, viewing $X$ individually rather than jointly with $Y$. The above equation follows from the axioms of probability (we are summing over disjoint cases). The operation of summing over the possible values of $Y$ in order to convert the joint PMF into the marginal PMF of $X$ is known as marginalizing out $Y$.
Similarly, the marginal PMF of $Y$ is obtained by summing over all possible values of $X$. So given the joint PMF, we can marginalize out $Y$ to get the PMF of $X$, or marginalize out $X$ to get the PMF of $Y$. But if we only know the marginal PMFs of $X$ and $Y$, there is no way to recover the joint PMF without further assumptions.
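To make marginalization concrete, here is a minimal sketch in Python; the joint PMF entries below are made up purely for illustration. We store the joint PMF of $X$ and $Y$ as a 2-D array and sum along one axis to marginalize out the corresponding variable.

```python
import numpy as np

# Hypothetical joint PMF of X (rows) and Y (columns); the entries are made up
# for illustration, but they are nonnegative and sum to 1, as a valid joint PMF must.
joint_pmf = np.array([
    [0.10, 0.05, 0.05],   # P(X=0, Y=0), P(X=0, Y=1), P(X=0, Y=2)
    [0.20, 0.25, 0.05],   # P(X=1, Y=0), P(X=1, Y=1), P(X=1, Y=2)
    [0.05, 0.10, 0.15],   # P(X=2, Y=0), P(X=2, Y=1), P(X=2, Y=2)
])

# Marginalize out Y (sum over columns) to get the marginal PMF of X,
# and marginalize out X (sum over rows) to get the marginal PMF of Y.
pmf_X = joint_pmf.sum(axis=1)   # P(X=x) = sum over y of P(X=x, Y=y)
pmf_Y = joint_pmf.sum(axis=0)   # P(Y=y) = sum over x of P(X=x, Y=y)

print(pmf_X)            # approximately [0.2, 0.5, 0.3]
print(pmf_Y)            # approximately [0.35, 0.4, 0.25]
print(joint_pmf.sum())  # approximately 1.0 -- the joint PMF is valid
```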
Now suppose that we observe the value of $X$ and want to update our distribution of $Y$ to reflect this information. Instead of using the marginal PMF $P(Y = y)$, which does not take into account any information about $X$, we should use a PMF that conditions on the event $X = x$, where $x$ is the value we observed for $X$. This naturally leads us to consider conditional PMFs.
Definition: Conditional PMF
For discrete r.v.s $X$ and $Y$, the conditional PMF of $Y$ given $X = x$ is
$$P(Y = y \mid X = x) = \frac{P(X = x, Y = y)}{P(X = x)}.$$
This is viewed as a function of $y$ for fixed $x$.
The following figure illustrates the definition of conditional PMF. To condition on the event $X = x$, we first take the joint PMF and focus in on the vertical bars where $X$ takes on the value $x$; in the figure, these are shown in bold. All of the other vertical bars are irrelevant because they are inconsistent with the knowledge that $X = x$ occurred. Since the total height of the bold bars is the marginal probability $P(X = x)$, we then renormalize the conditional PMF by dividing by $P(X = x)$; this ensures that the conditional PMF will sum to $1$. Therefore conditional PMFs are PMFs, just as conditional probabilities are probabilities. Notice that there is a different conditional PMF of $Y$ for every possible value of $x$; the figure highlights just one of these conditional PMFs.
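A minimal sketch of this renormalization, reusing the same made-up joint PMF as above: the conditional PMF of $Y$ given $X = 1$ is the $X = 1$ row of the joint PMF divided by the marginal probability of that row.

```python
import numpy as np

# Same hypothetical joint PMF of X (rows) and Y (columns) as in the sketch above.
joint_pmf = np.array([
    [0.10, 0.05, 0.05],
    [0.20, 0.25, 0.05],
    [0.05, 0.10, 0.15],
])

# Conditioning on X = 1: take the X = 1 row and renormalize by P(X = 1),
# which is the sum of that row (the marginal probability of the observed value).
x = 1
p_X_equals_x = joint_pmf[x, :].sum()
cond_pmf_Y_given_x = joint_pmf[x, :] / p_X_equals_x

print(cond_pmf_Y_given_x)        # approximately [0.4, 0.5, 0.1]
print(cond_pmf_Y_given_x.sum())  # approximately 1.0 -- conditional PMFs are PMFs
```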
Definition: Independence of Discrete r.v.s
Random variables $X$ and $Y$ are independent if for all $x$ and $y$,
$$F_{X,Y}(x, y) = F_X(x) F_Y(y).$$
If $X$ and $Y$ are discrete, this is equivalent to the condition
$$P(X = x, Y = y) = P(X = x) P(Y = y)$$
for all $x$ and $y$, and it is also equivalent to the condition
$$P(Y = y \mid X = x) = P(Y = y)$$
for all $y$ and all $x$ such that $P(X = x) > 0$.
Using the terminology from this chapter, the definition says that for independent r.v.s, the joint CDF factors into the product of the marginal CDFs, or that the joint PMF factors into the product of the marginal PMFs. Remember that in general, the marginal distributions do not determine the joint distribution: this is the entire reason why we wanted to study joint distributions in the first place! But in the special case of independence, the marginal distributions are all we need in order to specify the joint distribution; we can get the joint PMF by multiplying the marginal PMFs.
Another way of looking at independence is that all the conditional PMFs are the same as the marginal PMF. In other words, starting with the marginal PMF of $Y$, no updating is necessary when we condition on $X = x$, regardless of what $x$ is. There is no event purely involving $X$ that influences our distribution of $Y$, and vice versa.
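As a quick illustration with made-up marginal PMFs: under independence the joint PMF is the outer product of the marginals, and every conditional PMF then coincides with the marginal.

```python
import numpy as np

# Hypothetical marginal PMFs for independent discrete r.v.s X and Y.
pmf_X = np.array([0.2, 0.5, 0.3])
pmf_Y = np.array([0.6, 0.4])

# Under independence the joint PMF is the product of the marginals:
# P(X=x, Y=y) = P(X=x) * P(Y=y), i.e. an outer product of the two vectors.
joint_pmf = np.outer(pmf_X, pmf_Y)

# Every conditional PMF of Y given X = x equals the marginal PMF of Y.
for x in range(len(pmf_X)):
    cond = joint_pmf[x, :] / joint_pmf[x, :].sum()
    assert np.allclose(cond, pmf_Y)

print(joint_pmf.sum())  # approximately 1.0 -- still a valid joint PMF
```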
Continuous
Once we have a handle on discrete joint distributions, it isn't much harder to consider continuous joint distributions. We simply make the now-familiar substitutions of integrals for sums and PDFs for PMFs, remembering that the probability of any individual point is now 0.
Formally, in order for $X$ and $Y$ to have a continuous joint distribution, we require that the joint CDF
$$F_{X,Y}(x, y) = P(X \le x, Y \le y)$$
be differentiable with respect to $x$ and $y$. The partial derivative with respect to $x$ and $y$ is called the joint PDF. The joint PDF determines the joint distribution, as does the joint CDF.
Definition: Joint PDF
If $X$ and $Y$ are continuous with joint CDF $F_{X,Y}$, their *joint PDF* is the derivative of the joint CDF with respect to $x$ and $y$:
$$f_{X,Y}(x, y) = \frac{\partial^2}{\partial x \, \partial y} F_{X,Y}(x, y).$$
We require valid joint PDFs to be nonnegative and integrate to 1:
$$f_{X,Y}(x, y) \ge 0 \quad \text{and} \quad \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dx \, dy = 1.$$
In the univariate case, the PDF was the function we integrated to get the probability of an interval. Similarly, the joint PDF of two r.v.s is the function we integrate to get the probability of a two-dimensional region.
The following figure shows a sketch of what a joint PDF of two r.v.s could look like. As usual with continuous r.v.s, we need to keep in mind that the height of the surface at a single point does not represent a probability. The probability of any specific point in the plane is 0; furthermore, now that we've gone up a dimension, the probability of any line or curve in the plane is also 0. The only way we can get nonzero probability is by integrating over a region of positive area in the $xy$-plane.
When we integrate the joint PDF over an area $A$, what we are calculating is the volume under the surface of the joint PDF and above $A$. Thus, probability is represented by volume under the joint PDF. The total volume under a valid joint PDF is 1.
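As a numerical sketch, take the hypothetical joint PDF $f(x, y) = x + y$ on the unit square (and 0 elsewhere), which is nonnegative and integrates to 1. The probability of a region is the volume above it; for the square $0 \le x \le 1/2$, $0 \le y \le 1/2$ the exact answer is $1/8$, and numerical integration agrees.

```python
from scipy.integrate import dblquad

# A hypothetical joint PDF: f(x, y) = x + y on the unit square [0,1] x [0,1],
# and 0 elsewhere. It is nonnegative and integrates to 1, so it is valid.
def joint_pdf(x, y):
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

# dblquad integrates func(y, x) with y as the inner variable, so we wrap the
# PDF accordingly. The total volume under the surface should be 1.
total, _ = dblquad(lambda y, x: joint_pdf(x, y), 0, 1, 0, 1)

# P(X <= 1/2, Y <= 1/2) is the volume above that part of the unit square;
# the exact value for this PDF is 1/8.
prob, _ = dblquad(lambda y, x: joint_pdf(x, y), 0, 0.5, 0, 0.5)

print(total)  # approximately 1.0
print(prob)   # approximately 0.125
```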
In the discrete case, we get the marginal PMF of $X$ by summing over all possible values of $Y$ in the joint PMF. In the continuous case, we get the marginal PDF of $X$ by integrating over all possible values of $Y$ in the joint PDF.
Definition: Marginal PDF
For continuous r.v.s $X$ and $Y$ with joint PDF $f_{X,Y}$, the marginal PDF of $X$ is
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dy.$$
This is the PDF of $X$, viewing $X$ individually rather than jointly with $Y$.
To simplify notation, we have mainly been looking at the joint distribution of two r.v.s rather than $n$ r.v.s, but marginalization works analogously with any number of variables. For example, if we have the joint PDF of $X, Y, Z, W$ but want the joint PDF of $X, W$, we just have to integrate over all possible values of $Y$ and $Z$:
$$f_{X,W}(x, w) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y,Z,W}(x, y, z, w) \, dy \, dz.$$
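Continuing with the same hypothetical joint PDF $f(x, y) = x + y$ on the unit square from the earlier sketch, the marginal PDF of $X$ works out to $f_X(x) = \int_0^1 (x + y) \, dy = x + 1/2$ for $0 \le x \le 1$; a short numerical check of this marginalization:

```python
from scipy.integrate import quad

# Hypothetical joint PDF f(x, y) = x + y on the unit square, 0 elsewhere.
def joint_pdf(x, y):
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

# Marginal PDF of X at a point x: integrate the joint PDF over all values of y.
def marginal_pdf_X(x):
    value, _ = quad(lambda y: joint_pdf(x, y), 0, 1)
    return value

# For this joint PDF the marginal works out to f_X(x) = x + 1/2 on [0, 1].
for x in [0.1, 0.5, 0.9]:
    print(x, marginal_pdf_X(x))  # approximately x + 0.5 in each case
```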
Conceptually this is very easy---just integrate over the unwanted variables to get the joint PDF of the wanted variables---but computing the integral may or may not be difficult. Returning to the case of the joint distribution of two r.v.s $X$ and $Y$, let's consider how to update our distribution for $Y$ after observing the value of $X$, using the conditional PDF.
Definition: Conditional PDF
For continuous r.v.s $X$ and $Y$ with joint PDF $f_{X,Y}$, the conditional PDF of $Y$ given $X = x$ is
$$f_{Y \mid X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)},$$
for $f_X(x) > 0$.
This is considered as a function of $y$ for fixed $x$.
Note that we can recover the joint PDF $f_{X,Y}$ if we have the conditional PDF $f_{Y \mid X}$ and the corresponding marginal $f_X$:
$$f_{X,Y}(x, y) = f_{Y \mid X}(y \mid x) f_X(x).$$
Similarly, we can recover the joint PDF if we have $f_{X \mid Y}$ and $f_Y$:
$$f_{X,Y}(x, y) = f_{X \mid Y}(x \mid y) f_Y(y).$$
This allows us to develop continuous analogs of Bayes' rule and LOTP. These formulas still hold in the continuous case, replacing probabilities with probability density functions.
Theorem: Continuous Form of Bayes' Rule and LOTP
For continuous r.v.s $X$ and $Y$,
$$f_{Y \mid X}(y \mid x) = \frac{f_{X \mid Y}(x \mid y) f_Y(y)}{f_X(x)} \quad \text{(Bayes' rule)},$$
$$f_X(x) = \int_{-\infty}^{\infty} f_{X \mid Y}(x \mid y) f_Y(y) \, dy \quad \text{(LOTP)}.$$
Proof: By the definition of conditional PDFs, the joint PDF can be written in two ways:
$$f_{Y \mid X}(y \mid x) f_X(x) = f_{X,Y}(x, y) = f_{X \mid Y}(x \mid y) f_Y(y).$$
Dividing both sides by $f_X(x)$ gives the continuous form of Bayes' rule. For the continuous form of LOTP, marginalize out $Y$ from the joint PDF:
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dy = \int_{-\infty}^{\infty} f_{X \mid Y}(x \mid y) f_Y(y) \, dy.$$
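Here is a small numerical sketch of these two formulas under an assumed model chosen only for illustration: $Y \sim \mathcal{N}(0, 1)$ and, given $Y = y$, $X \sim \mathcal{N}(y, 1)$. LOTP gives the marginal density of $X$ by integration, Bayes' rule then gives the conditional density of $Y$ given $X = x$, and for this particular model the result is known to be $\mathcal{N}(x/2, 1/2)$, which serves as a sanity check.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Assumed model for illustration: Y ~ N(0, 1) and X | Y = y ~ N(y, 1).
f_Y = lambda y: stats.norm.pdf(y, loc=0, scale=1)             # marginal PDF of Y
f_X_given_Y = lambda x, y: stats.norm.pdf(x, loc=y, scale=1)  # conditional PDF of X given Y = y

x_obs = 1.3  # an observed value of X

# Continuous LOTP: f_X(x) is the integral of f_{X|Y}(x|y) f_Y(y) over all y.
f_X_at_x, _ = quad(lambda y: f_X_given_Y(x_obs, y) * f_Y(y), -np.inf, np.inf)

# Continuous Bayes' rule: f_{Y|X}(y|x) = f_{X|Y}(x|y) f_Y(y) / f_X(x), at x = x_obs.
def f_Y_given_X(y):
    return f_X_given_Y(x_obs, y) * f_Y(y) / f_X_at_x

# Sanity check: for this model the conditional distribution of Y given X = x
# is known to be N(x/2, 1/2), so the two curves should agree.
ys = np.linspace(-3, 3, 7)
exact = stats.norm.pdf(ys, loc=x_obs / 2, scale=np.sqrt(0.5))
numeric = np.array([f_Y_given_X(y) for y in ys])
print(np.max(np.abs(exact - numeric)))  # close to 0
```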
Finally, let's discuss the definition of independence for continuous r.v.s; then we'll turn to concrete examples. As in the discrete case, we can view independence of continuous r.v.s in two ways. One is that the joint CDF factors into the product of the marginal CDFs (equivalently, the joint PDF factors into the product of the marginal PDFs). The other is that the conditional PDF of $Y$ given $X = x$ is the same as the marginal PDF of $Y$, so conditioning on $X = x$ provides no information about $Y$.
Definition: Independence of Continuous r.v.s
Random variables $X$ and $Y$ are independent if for all $x$ and $y$,
$$F_{X,Y}(x, y) = F_X(x) F_Y(y).$$
If $X$ and $Y$ are continuous with joint PDF $f_{X,Y}$, this is equivalent to the condition
$$f_{X,Y}(x, y) = f_X(x) f_Y(y)$$
for all $x$ and $y$, and it is also equivalent to the condition
$$f_{Y \mid X}(y \mid x) = f_Y(y)$$
for all $y$ and all $x$ such that $f_X(x) > 0$.
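For a quick numerical check of the PDF factorization criterion, consider the joint PDF $f_{X,Y}(x, y) = \lambda_1 e^{-\lambda_1 x} \lambda_2 e^{-\lambda_2 y}$ for $x, y > 0$ (the joint PDF of two independent Exponentials, with rates chosen arbitrarily below, which also sets up the example that follows). Computing the marginals by numerical integration and comparing their product to the joint PDF at a few points:

```python
import numpy as np
from scipy.integrate import quad

lam1, lam2 = 0.5, 1.5  # assumed rates, chosen only for illustration

# Joint PDF of independent X ~ Expo(lam1) and Y ~ Expo(lam2), 0 outside x, y > 0.
def joint_pdf(x, y):
    return lam1 * np.exp(-lam1 * x) * lam2 * np.exp(-lam2 * y) if x > 0 and y > 0 else 0.0

# Marginal PDFs obtained by integrating out the other variable.
def f_X(x):
    return quad(lambda y: joint_pdf(x, y), 0, np.inf)[0]

def f_Y(y):
    return quad(lambda x: joint_pdf(x, y), 0, np.inf)[0]

# Independence criterion: the joint PDF equals the product of the marginals.
for x, y in [(0.2, 0.7), (1.0, 1.0), (2.5, 0.1)]:
    assert np.isclose(joint_pdf(x, y), f_X(x) * f_Y(y))
print("joint PDF factors into the product of the marginals at the test points")
```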
Example: Comparing Exponentials of Different Rates
Let $X \sim \text{Expo}(\lambda_1)$ and $Y \sim \text{Expo}(\lambda_2)$ be independent. Find $P(X < Y)$. For example, $X$ could be the lifetime of a refrigerator and $Y$ could be the lifetime of a stove (if we are willing to assume Exponential distributions for these), and then $P(X < Y)$ is the probability that the refrigerator fails before the stove. We know from Chapter 5 that $\min(X, Y) \sim \text{Expo}(\lambda_1 + \lambda_2)$, which tells us about when the first appliance failure will occur, but we also may want to know about which appliance will fail first.
Solution: Since $X$ and $Y$ are independent, the joint PDF is the product of the marginal PDFs, $f_{X,Y}(x, y) = \lambda_1 e^{-\lambda_1 x} \lambda_2 e^{-\lambda_2 y}$ for $x, y > 0$. Integrating the joint PDF over the region where $x < y$,
$$P(X < Y) = \int_0^{\infty} \int_x^{\infty} \lambda_1 e^{-\lambda_1 x} \lambda_2 e^{-\lambda_2 y} \, dy \, dx = \int_0^{\infty} \lambda_1 e^{-\lambda_1 x} e^{-\lambda_2 x} \, dx = \frac{\lambda_1}{\lambda_1 + \lambda_2}.$$
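A quick Monte Carlo sketch (with arbitrary rates chosen for illustration) agrees with this answer:

```python
import numpy as np

rng = np.random.default_rng(0)
lam1, lam2 = 0.5, 1.5   # assumed rates for the refrigerator and the stove
n = 10**6

# Simulate independent X ~ Expo(lam1) and Y ~ Expo(lam2); numpy's exponential
# is parameterized by the mean 1/lambda, not by the rate lambda.
x = rng.exponential(scale=1 / lam1, size=n)
y = rng.exponential(scale=1 / lam2, size=n)

print(np.mean(x < y))        # simulated P(X < Y), roughly 0.25 for these rates
print(lam1 / (lam1 + lam2))  # exact answer lam1 / (lam1 + lam2) = 0.25
```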