Discrete Random Variables 1


Random variables and distributions are among the most useful concepts in all of probability and statistics. This unit introduces discrete random variables and distributions, and the next unit introduces continuous random variables and distributions.

Random Variables

Random variables are an incredibly useful concept that simplifies notation and expands our ability to quantify uncertainty and summarize the results of experiments. Random variables are essential throughout statistics, so it is crucial to think through what they mean, both intuitively and mathematically.

Sometimes a definition of "random variable" (r.v.) is given that is a barely paraphrased version of "a random variable is a variable that takes on random values", but this fails to say where the randomness comes from. To make the notion of a random variable precise, we define it as a function mapping the sample space to the real line.

[Figure: a random variable is a function mapping each outcome in the sample space to a real number.]

Definition: Random Variable

Given an experiment with sample space $S$, a random variable (r.v.) is a function from the sample space $S$ to the real numbers $\mathbb{R}$. It is common, but not required, to denote random variables by capital letters.

Thus, a random variable $X$ assigns a numerical value $X(s)$ to each possible outcome $s$ of the experiment. The randomness comes from the fact that we have a random experiment (with probabilities described by the probability function $P$); the mapping itself is deterministic.

This definition is abstract but fundamental; one of the most important skills to develop when studying probability and statistics is the ability to go back and forth between abstract ideas and concrete examples. Relatedly, it is important to work on recognizing the essential pattern or structure of a problem and how it connects to problems you have studied previously. We will often discuss stories that involve tossing coins or drawing balls from urns because they are simple, convenient scenarios to work with, but many other problems are isomorphic: they have the same essential structure, but in a different guise.

To start, let's consider a coin-tossing example. The structure of the problem is that we have a sequence of trials where there are two possible outcomes for each trial. Here we think of the possible outcomes as $H$ (Heads) and $T$ (Tails), but we could just as well think of them as "success" and "failure" or as 1 and 0, for example.

Example: Coin Tosses

Consider an experiment where we toss a fair coin twice. The sample space consists of four possible outcomes: $S=\{HH, HT, TH, TT\}$. Here are some random variables on this space (for practice, you can think up some of your own). Each r.v. is a numerical summary of some aspect of the experiment.

  • Let $X$ be the number of Heads.
  • Let $Y$ be the number of Tails.
  • Let $I$ be 1 if the first toss lands Heads and 0 otherwise; $I$ is called the indicator of the first toss landing Heads.

We can also encode the sample space as $\{(1,1),(1,0),(0,1),(0,0)\}$, where 1 is the code for Heads and 0 is the code for Tails. Then we can give explicit formulas for $X, Y, I$:

$$X(s_1,s_2) = s_1+s_2, \quad Y(s_1,s_2) = 2-s_1-s_2, \quad I(s_1,s_2) = s_1,$$

where for simplicity we write $X(s_1, s_2)$ to mean $X((s_1, s_2))$, etc. For most r.v.s we will consider, it is tedious or infeasible to write down an explicit formula in this way. Fortunately, it is usually unnecessary to do so. As before, for a sample space with a finite number of outcomes we can visualize the outcomes as pebbles, with the mass of a pebble corresponding to its probability, such that the total mass of the pebbles is 1. A random variable simply labels each pebble with a number. The following figure shows two random variables defined on the same sample space: the pebbles or outcomes are the same, but the real numbers assigned to the outcomes are different.
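To make the function-on-a-sample-space view concrete, here is a minimal sketch (not from the original text) that encodes the two-coin sample space and the r.v.s $X$, $Y$, $I$ as ordinary deterministic Python functions:

```python
# Encode the sample space of two coin tosses: 1 = Heads, 0 = Tails.
sample_space = [(1, 1), (1, 0), (0, 1), (0, 0)]

# Each random variable is just a deterministic function of the outcome.
def X(s):
    """Number of Heads."""
    s1, s2 = s
    return s1 + s2

def Y(s):
    """Number of Tails."""
    s1, s2 = s
    return 2 - s1 - s2

def I(s):
    """Indicator that the first toss lands Heads."""
    s1, s2 = s
    return s1

# Each r.v. labels every pebble (outcome) with a number.
for s in sample_space:
    print(s, X(s), Y(s), I(s))
```

The randomness lives entirely in which outcome $s$ occurs; the functions themselves involve no randomness at all.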

[Figure: two random variables defined on the same sample space; the pebbles are the same, but the numbers assigned to them differ.]

Before we perform the experiment, the outcome $s$ has not yet been realized, so we don't know the value of $X$, though we could calculate the probability that $X$ will take on a given value or range of values. After we perform the experiment and the outcome $s$ has been realized, the random variable crystallizes into the numerical value $X(s)$. In this way, random variables provide numerical summaries of the experiment in question.

Discrete Random Variables and Probability Mass Functions

Definition: Discrete Random Variable

A random variable $X$ is said to be discrete if there is a finite list of values $a_1, a_2, \dots, a_n$ or an infinite list of values $a_1, a_2, \dots$ such that $P(X=a_j \textrm{ for some } j) = 1$. If $X$ is a discrete r.v., then the finite or countably infinite set of values $x$ such that $P(X = x) > 0$ is called the support of $X$.

Most commonly in applications, the support of a discrete r.v. is a set of integers. In contrast, a continuous r.v. can take on any real value in an interval (possibly even the entire real line); such r.v.s are defined more precisely in Unit 4. It is also possible to have an r.v. that is a hybrid of discrete and continuous, such as by flipping a coin and then generating a discrete r.v. if the coin lands Heads and generating a continuous r.v. if the coin lands Tails. For example, imagine that a customer in a store flips a coin to decide whether to make a purchase. If the coin lands Heads, the customer doesn't buy anything; if Tails, the customer spends some random positive real amount of money. But the starting point for understanding such r.v.s is to understand discrete and continuous r.v.s.
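The hybrid customer example can be simulated directly. A quick sketch, with the caveat that the distribution of the positive purchase amount is our own illustrative assumption (the text only says it is some random positive real amount; here we use an exponential with mean 20 dollars):

```python
import random

def purchase_amount(rng=random):
    """Hybrid r.v.: 0 with probability 1/2 (coin lands Heads, no purchase),
    otherwise a random positive amount of money. The exponential
    distribution with mean 20 is an illustrative assumption."""
    if rng.random() < 0.5:          # coin lands Heads: buy nothing
        return 0.0
    return rng.expovariate(1 / 20)  # coin lands Tails: positive spend

samples = [purchase_amount() for _ in range(100_000)]
frac_zero = sum(s == 0.0 for s in samples) / len(samples)
print(f"fraction spending nothing: {frac_zero:.3f}")  # close to 0.5
```

The resulting r.v. has a point mass at 0 (a discrete feature) but takes a continuum of positive values otherwise (a continuous feature), which is exactly what makes it a hybrid.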

Given a random variable, we would like to be able to describe its behavior using the language of probability. For example, we might want to answer questions about the probability that the r.v. will fall into a given range: if $L$ is the lifetime earnings of a randomly chosen U.S. college graduate, what is the probability that $L$ exceeds a million dollars? If $M$ is the number of major earthquakes in California in the next five years, what is the probability that $M$ equals 0? The distribution of a random variable provides the answers to these questions; it specifies the probabilities of all events associated with the r.v., such as the probability of it equaling 3 and the probability of it being at least 110. We will see that there are several equivalent ways to express the distribution of an r.v. For a discrete r.v., the most natural way to do so is with a probability mass function, which we now define.

Definition: Probability Mass Function

The probability mass function (PMF) of a discrete r.v. $X$ is the function $p_X$ given by $p_X(x) = P(X=x)$. Note that this is positive if $x$ is in the support of $X$, and 0 otherwise. Here $X = x$ denotes an event, consisting of all outcomes $s$ to which $X$ assigns the number $x$.

Let's look at a few examples of PMFs.

Example: Coin Tosses Continued

In this example we'll find the PMFs of all the random variables in Example Coin Tosses, the example with two fair coin tosses. Here are the r.v.s we defined, along with their PMFs:

The PMF of $X$ is $p_X(0) = 1/4$, $p_X(1) = 1/2$, $p_X(2) = 1/4$. Since $Y = 2 - X$, the PMF of $Y$ is the same: $p_Y(0) = 1/4$, $p_Y(1) = 1/2$, $p_Y(2) = 1/4$. The PMF of $I$ is $p_I(0) = p_I(1) = 1/2$. In each case, the PMF is 0 at all other values.

[Figure: plots of the PMFs of $X$, $Y$, and $I$.]

The PMFs of $X$, $Y$, and $I$ are plotted in the figure above. Vertical bars are drawn to make it easier to compare the heights of different points.
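These PMF values can also be recovered mechanically by summing the masses of the pebbles that each r.v. labels with a given value. A sketch, reusing the encoding 1 = Heads, 0 = Tails:

```python
from collections import defaultdict
from fractions import Fraction

# Four equally likely outcomes of two fair coin tosses (1 = Heads).
outcomes = [(1, 1), (1, 0), (0, 1), (0, 0)]
prob = {s: Fraction(1, 4) for s in outcomes}

def pmf(rv):
    """PMF of a random variable: add up the mass of the outcomes
    that the r.v. maps to each value."""
    p = defaultdict(Fraction)
    for s in outcomes:
        p[rv(s)] += prob[s]
    return dict(p)

X = lambda s: s[0] + s[1]      # number of Heads
Y = lambda s: 2 - s[0] - s[1]  # number of Tails
I = lambda s: s[0]             # indicator of first toss landing Heads

print(pmf(X))  # X takes 0, 1, 2 with probabilities 1/4, 1/2, 1/4
print(pmf(I))  # I takes 0, 1 with probabilities 1/2, 1/2
```

Using exact fractions rather than floats makes it easy to check that each PMF sums to exactly 1.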

We will now state the properties of a valid PMF.

Theorem: Valid PMFs

Let $X$ be a discrete r.v. with support $x_1, x_2, \dots$ (assume these values are distinct and, for notational simplicity, that the support is countably infinite; the analogous results hold if the support is finite).

The PMF $p_X$ of $X$ must satisfy the following two criteria:

  • Nonnegative: $p_X(x) > 0$ if $x = x_j$ for some $j$, and $p_X(x) = 0$ otherwise;
  • Sums to 1: $\sum_{j=1}^\infty p_X(x_j) = 1$.

Proof: The first criterion is true since probability is nonnegative. The second is true since $X$ must take on some value, and the events $\{X=x_j\}$ are disjoint, so

$$\sum_{j=1}^\infty P(X=x_j) = P \left(\bigcup_{j=1}^\infty \{X=x_j\} \right) = P(X=x_1 \textrm{ or } X=x_2 \textrm{ or } \dots) = 1.$$

Conversely, if distinct values $x_1, x_2, \dots$ are specified and we have a function satisfying the two criteria above, then this function is the PMF of some r.v.

The PMF is one way of expressing the distribution of a discrete r.v. This is because once we know the PMF of $X$, we can calculate the probability that $X$ will fall into a given subset of the real numbers by summing over the appropriate values of $x$. Given a discrete r.v. $X$ and a set $B$ of real numbers, if we know the PMF of $X$ we can find $P(X \in B)$, the probability that $X$ is in $B$, by summing up the heights of the vertical bars at points in $B$ in the plot of the PMF of $X$. Knowing the PMF of a discrete r.v. determines its distribution.
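The "sum the heights of the bars at points in $B$" recipe translates directly into code. A sketch using the PMF of $X$, the number of Heads in two fair coin tosses:

```python
from fractions import Fraction

# PMF of X, the number of Heads in two fair coin tosses.
pmf_X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def prob_in(pmf, B):
    """P(X in B): sum the PMF over the support values lying in B."""
    return sum(p for x, p in pmf.items() if x in B)

print(prob_in(pmf_X, {1, 2}))     # P(X >= 1) = 1/2 + 1/4 = 3/4
print(prob_in(pmf_X, {0, 1, 2}))  # the whole support: must be 1
```

Values of $B$ outside the support contribute nothing, since the PMF is 0 there.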

Example: Poisson Distribution

An r.v. $X$ has the Poisson distribution with parameter $\lambda$, where $\lambda > 0$, if the PMF of $X$ is

$$P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k=0, 1, 2, \dots.$$

We write this as $X \sim \textrm{Pois}(\lambda)$. The Poisson is one of the most widely used distributions in all of statistics, and is a very common choice of model (or building block for more complicated models) for data that counts the number of occurrences of some kind. The Poisson is discussed in much more detail in Unit 5. The Poisson also arises through the Poisson process, a model that is used in a wide variety of problems in which events occur at random points in time. Poisson processes are introduced in Unit 4.
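The Poisson PMF above is straightforward to compute directly. A quick sketch that evaluates it for a hypothetical choice $\lambda = 3$ and checks that the mass sums to approximately 1 when the infinite sum is truncated at a large $k$:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Pois(lam), straight from the formula."""
    return exp(-lam) * lam**k / factorial(k)

lam = 3.0  # illustrative parameter choice
probs = [poisson_pmf(k, lam) for k in range(50)]
print(f"P(X=0) = {probs[0]:.4f}")
print(f"sum of first 50 terms = {sum(probs):.6f}")  # very close to 1
```

The truncation is harmless here because the terms decay faster than geometrically once $k$ exceeds $\lambda$, so the neglected tail is negligible.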