Discrete Uniform
A very simple story, closely connected to the naive definition of probability, describes picking a random number from some finite set of possibilities.
Story: Discrete Uniform Distribution
Let $C$ be a finite, nonempty set of numbers. Choose one of these numbers uniformly at random (i.e., all values in $C$ are equally likely). Call the chosen number $X$. Then $X$ is said to have the Discrete Uniform distribution with parameter $C$; we denote this by $X \sim \mathrm{DUnif}(C)$. The PMF of $X \sim \mathrm{DUnif}(C)$ is
$$P(X = x) = \frac{1}{|C|}$$
for $x \in C$ (and $0$ otherwise), since a PMF must sum to $1$. As with questions based on the naive definition of probability, questions based on a Discrete Uniform distribution reduce to counting problems. Specifically, for $X \sim \mathrm{DUnif}(C)$ and any $A \subseteq C$, we have
$$P(X \in A) = \frac{|A|}{|C|}.$$
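A minimal Python sketch of these counting formulas (the particular set `C` and event `A` below are our own illustrative choices, not from the text):

```python
from fractions import Fraction

def dunif_pmf(C, x):
    """PMF of DUnif(C): 1/|C| for x in C, and 0 otherwise."""
    return Fraction(1, len(C)) if x in C else Fraction(0)

def dunif_prob(C, A):
    """P(X in A) for X ~ DUnif(C), by counting: |A intersect C| / |C|."""
    return Fraction(len(A & C), len(C))

C = set(range(1, 101))          # illustrative choice: numbers 1 through 100
A = {x for x in C if x >= 80}   # event: value at least 80
print(dunif_pmf(C, 7))          # 1/100
print(dunif_prob(C, A))         # 21/100
```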
Example: Random Slips of Paper
There are 100 slips of paper in a hat, each of which has one of the numbers $1, 2, \ldots, 100$ written on it, with no number appearing more than once. Five of the slips are drawn, one at a time. First consider random sampling with replacement (with equal probabilities).
(a) What is the distribution of how many of the drawn slips have a value of at least 80 written on them?
(b) What is the distribution of the value of the $j$th draw (for $1 \le j \le 5$)?
(c) What is the probability that the number 100 is drawn at least once?
Now consider random sampling without replacement (with all sets of five slips equally likely to be chosen).
(d) What is the distribution of how many of the drawn slips have a value of at least 80 written on them?
(e) What is the distribution of the value of the $j$th draw (for $1 \le j \le 5$)?
(f) What is the probability that the number 100 is drawn at least once?
Solution:
(a) By the story of the Binomial, the distribution is $\mathrm{Bin}(5, 0.21)$, since the draws are independent and each has probability $21/100$ of showing a value of at least 80 (the values $80, 81, \ldots, 100$).
(b) Let $X_j$ be the value of the $j$th draw. By symmetry, $X_j \sim \mathrm{DUnif}(1, 2, \ldots, 100)$.
(c) Taking complements,
$$P(X_j = 100 \text{ for at least one } j) = 1 - P(X_1 \neq 100, \ldots, X_5 \neq 100).$$
By the naive definition of probability, this is
$$1 - \left(\frac{99}{100}\right)^5 \approx 0.049.$$
(d) By the story of the Hypergeometric, the distribution is $\mathrm{HGeom}(21, 79, 5)$: there are 21 slips with a value of at least 80 and 79 without, and 5 slips are drawn.
(e) Let $Y_j$ be the value of the $j$th draw. By symmetry, $Y_j \sim \mathrm{DUnif}(1, 2, \ldots, 100)$. Here learning any $Y_i$ gives information about the other values (so $Y_1, \ldots, Y_5$ are not independent), but symmetry still holds since, unconditionally, the $j$th slip drawn is equally likely to be any of the slips.
(f) The events $Y_1 = 100, \ldots, Y_5 = 100$ are disjoint since we are now sampling without replacement, so
$$P(Y_j = 100 \text{ for some } j) = P(Y_1 = 100) + \cdots + P(Y_5 = 100) = \frac{5}{100} = 0.05.$$
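These answers are easy to check by simulation. Here is a short Python sketch (the choice of $10^5$ trials is arbitrary) estimating the probabilities from (c) and (f):

```python
import random

def hits_100(with_replacement):
    """Draw 5 slips from 1..100 and report whether 100 appears at least once."""
    if with_replacement:
        draws = [random.randint(1, 100) for _ in range(5)]
    else:
        draws = random.sample(range(1, 101), 5)
    return 100 in draws

n = 10**5
print(sum(hits_100(True) for _ in range(n)) / n)   # about 1 - (99/100)^5 = 0.049
print(sum(hits_100(False) for _ in range(n)) / n)  # about 5/100 = 0.05
```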
Cumulative Distribution Functions
Another function that describes the distribution of an r.v. is the cumulative distribution function (CDF). Unlike the PMF, which only discrete r.v.s possess, the CDF is defined for all r.v.s.
Definition
The cumulative distribution function (CDF) of an r.v. $X$ is the function $F_X$ given by $F_X(x) = P(X \le x)$. When there is no risk of ambiguity, we sometimes drop the subscript and just write $F$ (or some other letter) for a CDF.
The next example demonstrates that for discrete r.v.s, we can freely convert between CDF and PMF.
Example
Let $X \sim \mathrm{Bin}(4, 1/2)$. The following figure shows the PMF and CDF of $X$.
- From PMF to CDF: To find $P(X \le 1.5)$, which is the CDF evaluated at 1.5, we sum the PMF over all values of the support that are less than or equal to 1.5:
$$P(X \le 1.5) = P(X = 0) + P(X = 1) = \left(\frac{1}{2}\right)^4 + 4\left(\frac{1}{2}\right)^4 = \frac{5}{16}.$$
Similarly, the value of the CDF at an arbitrary point $x$ is the sum of the heights of the vertical bars of the PMF at values less than or equal to $x$.
- From CDF to PMF: The CDF of a discrete r.v. consists of jumps and flat regions. The height of a jump in the CDF at $x$ is equal to the value of the PMF at $x$. For example, in the above figure, the height of the jump in the CDF at 2 is the same as the height of the corresponding vertical bar in the PMF; this is indicated in the figure with curly braces. The flat regions of the CDF correspond to values outside the support of $X$, so the PMF is equal to 0 in those regions.
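Here is a minimal Python sketch of both conversions for $X \sim \mathrm{Bin}(4, 1/2)$ (the helper names are our own):

```python
from fractions import Fraction
from math import comb

# PMF of Bin(4, 1/2) on its support {0, 1, 2, 3, 4}
pmf = {k: Fraction(comb(4, k), 2**4) for k in range(5)}

def cdf(x):
    """From PMF to CDF: sum the PMF over support values <= x."""
    return sum(p for k, p in pmf.items() if k <= x)

print(cdf(1.5))  # 5/16, matching P(X <= 1.5) above

# From CDF to PMF: the jump of the CDF at k recovers P(X = k).
pmf_from_cdf = {k: cdf(k) - cdf(k - 1) for k in range(5)}
print(pmf_from_cdf == pmf)  # True
```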
Valid CDFs satisfy the following criteria.
Theorem: Valid CDFs
Any CDF has the following properties.
- Increasing: If $x_1 \le x_2$, then $F(x_1) \le F(x_2)$.
- Right-continuous: As in the above figure, the CDF is continuous except possibly for having some jumps. Wherever there is a jump, the CDF is continuous from the right. That is, for any $a$, we have $F(a) = \lim_{x \to a^+} F(x)$.
- Convergence to 0 and 1 in the limits: $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.
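As a quick numerical sanity check (not a proof), the sketch below spot-checks all three properties for the $\mathrm{Bin}(4, 1/2)$ CDF from the previous example; the test grid is an arbitrary choice:

```python
from fractions import Fraction
from math import comb

# CDF of Bin(4, 1/2), built from its PMF as in the previous sketch
pmf = {k: Fraction(comb(4, k), 2**4) for k in range(5)}
def cdf(x):
    return sum(p for k, p in pmf.items() if k <= x)

# Increasing: F(x1) <= F(x2) whenever x1 <= x2
grid = [x / 10 for x in range(-100, 101)]
assert all(cdf(a) <= cdf(b) for a, b in zip(grid, grid[1:]))

# Right-continuous: F(a) agrees with F just to the right of a, even at jumps
assert all(cdf(a) == cdf(a + 1e-9) for a in range(5))

# Convergence to 0 and 1 in the limits
assert cdf(-10**6) == 0 and cdf(10**6) == 1
print("all three properties hold on the test grid")
```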
The converse is true too: it turns out that given any function $F$ meeting these criteria, we can construct a random variable whose CDF is $F$. To recap, we have now seen three equivalent ways of expressing the distribution of a random variable. Two of these are the PMF and the CDF: we know these two functions contain the same information, since we can always figure out the CDF from the PMF and vice versa. Generally the PMF is easier to work with for discrete r.v.s, since evaluating the CDF requires a summation. A third way to describe a distribution is with a story that explains (in a precise way) how the distribution can arise. We used the stories of the Binomial and Hypergeometric distributions to derive the corresponding PMFs. Thus the story and the PMF also contain the same information, though we can often achieve more intuitive proofs with the story than with PMF calculations.