Discrete Random Variables 1


Random variables and distributions are among the most useful concepts in all of probability and statistics. This unit introduces discrete random variables and distributions, and the next unit introduces continuous random variables and distributions.

Random Variables

Random variables are an incredibly useful concept that simplifies notation and expands our ability to quantify uncertainty and summarize the results of experiments. Random variables are essential throughout statistics, so it is crucial to think through what they mean, both intuitively and mathematically.

Sometimes a definition of "random variable" (r.v.) is given that is a barely paraphrased version of "a random variable is a variable that takes on random values", but this fails to say where the randomness comes from. To make the notion of a random variable precise, we define it as a function mapping the sample space to the real line.

[Figure: a random variable is a function mapping each outcome in the sample space to a real number.]

Definition: Random Variable

Given an experiment with sample space $S$, a random variable (r.v.) is a function from the sample space $S$ to the real numbers $\mathbb{R}$. It is common, but not required, to denote random variables by capital letters.

Thus, a random variable $X$ assigns a numerical value $X(s)$ to each possible outcome $s$ of the experiment. The randomness comes from the fact that we have a random experiment (with probabilities described by the probability function $P$); the mapping itself is deterministic.

This definition is abstract but fundamental; one of the most important skills to develop when studying probability and statistics is the ability to go back and forth between abstract ideas and concrete examples. Relatedly, it is important to work on recognizing the essential pattern or structure of a problem and how it connects to problems you have studied previously. We will often discuss stories that involve tossing coins or drawing balls from urns because they are simple, convenient scenarios to work with, but many other problems are isomorphic: they have the same essential structure, but in a different guise.

To start, let's consider a coin-tossing example. The structure of the problem is that we have a sequence of trials where there are two possible outcomes for each trial. Here we think of the possible outcomes as $H$ (Heads) and $T$ (Tails), but we could just as well think of them as "success" and "failure" or as 1 and 0, for example.

Example: Coin Tosses

Consider an experiment where we toss a fair coin twice. The sample space consists of four possible outcomes: $S=\{HH, HT, TH, TT\}$. Here are some random variables on this space (for practice, you can think up some of your own). Each r.v. is a numerical summary of some aspect of the experiment.

  • Let $X$ be the number of Heads.
  • Let $Y$ be the number of Tails.
  • Let $I$ be 1 if the first toss lands Heads and 0 otherwise; $I$ is called the indicator of the first toss landing Heads.

We can also encode the sample space as $\{(1,1),(1,0),(0,1),(0,0)\}$, where 1 is the code for Heads and 0 is the code for Tails. Then we can give explicit formulas for $X, Y, I$:

$$X(s_1,s_2) = s_1+s_2, \quad Y(s_1,s_2) = 2-s_1-s_2, \quad I(s_1,s_2) = s_1,$$

where for simplicity we write $X(s_1, s_2)$ to mean $X((s_1, s_2))$, etc. For most r.v.s we will consider, it is tedious or infeasible to write down an explicit formula in this way. Fortunately, it is usually unnecessary to do so. As before, for a sample space with a finite number of outcomes we can visualize the outcomes as pebbles, with the mass of a pebble corresponding to its probability, such that the total mass of the pebbles is 1. A random variable simply labels each pebble with a number. The following figure shows two random variables defined on the same sample space: the pebbles or outcomes are the same, but the real numbers assigned to the outcomes are different.
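To make the function-on-a-sample-space view concrete, here is a minimal sketch (not from the original text) that encodes the two-coin sample space and the r.v.s $X$, $Y$, $I$ as ordinary deterministic Python functions:

```python
# Encode the sample space of two coin tosses: 1 = Heads, 0 = Tails.
sample_space = [(1, 1), (1, 0), (0, 1), (0, 0)]

# Each random variable is just a deterministic function of the outcome.
def X(s):
    """Number of Heads."""
    s1, s2 = s
    return s1 + s2

def Y(s):
    """Number of Tails."""
    s1, s2 = s
    return 2 - s1 - s2

def I(s):
    """Indicator that the first toss lands Heads."""
    s1, s2 = s
    return s1

# Each r.v. labels every pebble (outcome) with a number.
for s in sample_space:
    print(s, X(s), Y(s), I(s))
```

The randomness lives entirely in which outcome $s$ occurs; the functions themselves involve no randomness at all.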

[Figure: two random variables defined on the same sample space; the pebbles are the same, but the numbers assigned to them differ.]

Before we perform the experiment, the outcome $s$ has not yet been realized, so we don't know the value of $X$, though we could calculate the probability that $X$ will take on a given value or range of values. After we perform the experiment and the outcome $s$ has been realized, the random variable crystallizes into the numerical value $X(s)$. In this way, random variables provide numerical summaries of the experiment in question.

Discrete Random Variables and Probability Mass Functions

Definition: Discrete Random Variable

A random variable $X$ is said to be discrete if there is a finite list of values $a_1, a_2, \dots, a_n$ or an infinite list of values $a_1, a_2, \dots$ such that $P(X=a_j \textrm{ for some } j) = 1$. If $X$ is a discrete r.v., then the finite or countably infinite set of values $x$ such that $P(X = x) > 0$ is called the support of $X$.

Most commonly in applications, the support of a discrete r.v. is a set of integers. In contrast, a continuous r.v. can take on any real value in an interval (possibly even the entire real line); such r.v.s are defined more precisely in Unit 4. It is also possible to have an r.v. that is a hybrid of discrete and continuous, such as by flipping a coin and then generating a discrete r.v. if the coin lands Heads and generating a continuous r.v. if the coin lands Tails. For example, imagine that a customer in a store flips a coin to decide whether to make a purchase. If the coin lands Heads, the customer doesn't buy anything; if Tails, the customer spends some random positive real amount of money. But the starting point for understanding such r.v.s is to understand discrete and continuous r.v.s.
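The hybrid customer example can be simulated directly. A quick sketch, with the caveat that the distribution of the positive purchase amount is our own illustrative assumption (the text only says it is some random positive real amount; here we use an exponential with mean 20 dollars):

```python
import random

def purchase_amount(rng=random):
    """Hybrid r.v.: 0 with probability 1/2 (coin lands Heads, no purchase),
    otherwise a random positive amount of money. The exponential
    distribution with mean 20 is an illustrative assumption."""
    if rng.random() < 0.5:          # coin lands Heads: buy nothing
        return 0.0
    return rng.expovariate(1 / 20)  # coin lands Tails: positive spend

samples = [purchase_amount() for _ in range(100_000)]
frac_zero = sum(s == 0.0 for s in samples) / len(samples)
print(f"fraction spending nothing: {frac_zero:.3f}")  # close to 0.5
```

The resulting r.v. has a point mass at 0 (a discrete feature) but takes a continuum of positive values otherwise (a continuous feature), which is exactly what makes it a hybrid.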

Given a random variable, we would like to be able to describe its behavior using the language of probability. For example, we might want to answer questions about the probability that the r.v. will fall into a given range: if $L$ is the lifetime earnings of a randomly chosen U.S. college graduate, what is the probability that $L$ exceeds a million dollars? If $M$ is the number of major earthquakes in California in the next five years, what is the probability that $M$ equals 0? The distribution of a random variable provides the answers to these questions; it specifies the probabilities of all events associated with the r.v., such as the probability of it equaling 3 and the probability of it being at least 110. We will see that there are several equivalent ways to express the distribution of an r.v. For a discrete r.v., the most natural way to do so is with a probability mass function, which we now define.

Definition: Probability Mass Function

The probability mass function (PMF) of a discrete r.v. $X$ is the function $p_X$ given by $p_X(x) = P(X=x)$. Note that this is positive if $x$ is in the support of $X$, and 0 otherwise. Here $X = x$ denotes an event, consisting of all outcomes $s$ to which $X$ assigns the number $x$.

Let's look at a few examples of PMFs.

Example: Coin Tosses Continued

In this example we'll find the PMFs of all the random variables in Example Coin Tosses, the example with two fair coin tosses. Here are the r.v.s we defined, along with their PMFs:

The PMF of $X$ is $p_X(0) = 1/4$, $p_X(1) = 1/2$, $p_X(2) = 1/4$. Since $Y = 2 - X$, the PMF of $Y$ is the same: $p_Y(0) = 1/4$, $p_Y(1) = 1/2$, $p_Y(2) = 1/4$. The PMF of $I$ is $p_I(0) = p_I(1) = 1/2$. In each case, the PMF is 0 at all other values.

[Figure: plots of the PMFs of $X$, $Y$, and $I$.]

The PMFs of $X$, $Y$, and $I$ are plotted in the figure above. Vertical bars are drawn to make it easier to compare the heights of different points.
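These PMF values can also be recovered mechanically by summing the masses of the pebbles that each r.v. labels with a given value. A sketch, reusing the encoding 1 = Heads, 0 = Tails:

```python
from collections import defaultdict
from fractions import Fraction

# Four equally likely outcomes of two fair coin tosses (1 = Heads).
outcomes = [(1, 1), (1, 0), (0, 1), (0, 0)]
prob = {s: Fraction(1, 4) for s in outcomes}

def pmf(rv):
    """PMF of a random variable: add up the mass of the outcomes
    that the r.v. maps to each value."""
    p = defaultdict(Fraction)
    for s in outcomes:
        p[rv(s)] += prob[s]
    return dict(p)

X = lambda s: s[0] + s[1]      # number of Heads
Y = lambda s: 2 - s[0] - s[1]  # number of Tails
I = lambda s: s[0]             # indicator of first toss landing Heads

print(pmf(X))  # X takes 0, 1, 2 with probabilities 1/4, 1/2, 1/4
print(pmf(I))  # I takes 0, 1 with probabilities 1/2, 1/2
```

Using exact fractions rather than floats makes it easy to check that each PMF sums to exactly 1.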

We will now state the properties of a valid PMF.

Theorem: Valid PMFs

Let $X$ be a discrete r.v. with support $x_1, x_2, \dots$ (assume these values are distinct and, for notational simplicity, that the support is countably infinite; the analogous results hold if the support is finite).

The PMF $p_X$ of $X$ must satisfy the following two criteria:

  • Nonnegative: $p_X(x) > 0$ if $x = x_j$ for some $j$, and $p_X(x) = 0$ otherwise;
  • Sums to 1: $\sum_{j=1}^\infty p_X(x_j) = 1$.

Proof: The first criterion is true since probability is nonnegative. The second is true since $X$ must take on some value, and the events $\{X=x_j\}$ are disjoint, so

$$\sum_{j=1}^\infty P(X=x_j) = P \left(\bigcup_{j=1}^\infty \{X=x_j\} \right) = P(X=x_1 \textrm{ or } X=x_2 \textrm{ or } \dots) = 1.$$

Conversely, if distinct values $x_1, x_2, \dots$ are specified and we have a function satisfying the two criteria above, then this function is the PMF of some r.v.

The PMF is one way of expressing the distribution of a discrete r.v. This is because once we know the PMF of $X$, we can calculate the probability that $X$ will fall into a given subset of the real numbers by summing over the appropriate values of $x$. Given a discrete r.v. $X$ and a set $B$ of real numbers, if we know the PMF of $X$ we can find $P(X \in B)$, the probability that $X$ is in $B$, by summing up the heights of the vertical bars at points in $B$ in the plot of the PMF of $X$. Knowing the PMF of a discrete r.v. determines its distribution.
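The "sum the heights of the bars at points in $B$" recipe translates directly into code. A sketch using the PMF of $X$, the number of Heads in two fair coin tosses:

```python
from fractions import Fraction

# PMF of X, the number of Heads in two fair coin tosses.
pmf_X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def prob_in(pmf, B):
    """P(X in B): sum the PMF over the support values lying in B."""
    return sum(p for x, p in pmf.items() if x in B)

print(prob_in(pmf_X, {1, 2}))     # P(X >= 1) = 1/2 + 1/4 = 3/4
print(prob_in(pmf_X, {0, 1, 2}))  # the whole support: must be 1
```

Values of $B$ outside the support contribute nothing, since the PMF is 0 there.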

Example: Poisson Distribution

An r.v. $X$ has the Poisson distribution with parameter $\lambda$, where $\lambda > 0$, if the PMF of $X$ is

$$P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k=0, 1, 2, \dots.$$

We write this as $X \sim \textrm{Pois}(\lambda)$. The Poisson is one of the most widely used distributions in all of statistics, and is a very common choice of model (or building block for more complicated models) for data that counts the number of occurrences of some kind. The Poisson is discussed in much more detail in Unit 5. The Poisson also arises through the Poisson process, a model that is used in a wide variety of problems in which events occur at random points in time. Poisson processes are introduced in Unit 4.
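The Poisson PMF above is straightforward to compute directly. A quick sketch that evaluates it for a hypothetical choice $\lambda = 3$ and checks that the mass sums to approximately 1 when the infinite sum is truncated at a large $k$:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Pois(lam), straight from the formula."""
    return exp(-lam) * lam**k / factorial(k)

lam = 3.0  # illustrative parameter choice
probs = [poisson_pmf(k, lam) for k in range(50)]
print(f"P(X=0) = {probs[0]:.4f}")
print(f"sum of first 50 terms = {sum(probs):.6f}")  # very close to 1
```

The truncation is harmless here because the terms decay faster than geometrically once $k$ exceeds $\lambda$, so the neglected tail is negligible.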