
Lecture 6: Random variables
Having performed an experiment, be it tossing a coin, drawing a card from a deck, or whatever, it is frequently convenient to quantify the outcome, that is, to associate a number to the outcome. One might want to count the number of heads obtained, or the number of queens. In other words, one wants to assign a number (sometimes a complex number or a vector) to each outcome. All this means is that one wants a function whose domain is the sample space and whose range is a convenient set, like the real numbers, the complex numbers, or perhaps $R^n$.

For now, let us consider functions which take values in the real numbers. (The other cases are so similar that we will take for granted that their behavior is similar.) The point of using functions is that they can be used to define events. In the coin tossing example, our function might count the number of heads. Call this function R. We can look at the set

\begin{displaymath}
\{R = 6\} \equiv \{\omega\in \Omega: R(\omega) = 6\}.\end{displaymath}

If we have chosen the set of events to contain all subsets of $\Omega$, then this set is an event, and we can ask for the probability of $\{R=6\}$. Usually we get really lazy with the notation, and ask for the probability of $R=6$. The important thing is that there is a relation between R and the sigma algebra. The precise relation is that if the model is $(\Omega,{\cal F},\Pr)$ and $R:\Omega\rightarrow(-\infty,\infty)$, then for every interval ${\cal I}$,
 \begin{displaymath}
\{R \in {\cal I}\} := \{\omega\in\Omega:R(\omega) \in {\cal I}\}\in {\cal F}\end{displaymath} (1)
so that it makes sense to speak of

\begin{displaymath}
\Pr(R\in {\cal I}) := \Pr(\{R \in {\cal I}\}) =
\Pr(\{\omega\in\Omega:R(\omega) \in {\cal I}\}).\end{displaymath}

Functions which satisfy (1) are called (real valued) random variables.

Whether or not a function is a random variable depends on the sigma algebra! You should be aware of this point. Here are two examples to drive home the point. The only difference is in the sigma algebra.

Example 1:
$\Omega = \{1,2,3\}$, $\cal F$ is the set of all subsets of $\Omega$, $\Pr(A) = $ the number of elements of A divided by 3. R(x) = x.
Example 2:
$\Omega = \{1,2,3\}$, ${\cal F} =
\{\Omega,\emptyset,\{1,2\},\{3\}\}$, $\Pr(A) = $ the number of elements of A divided by 3. R(x) = x.
In example 2 R is not a random variable since $\{R = 2\} = \{2\}$ is not in $\cal F$. In example 1 R is a random variable since for every interval ${\cal I}$ the set $\{R \in {\cal I}\}$ is a subset of $\Omega$, and all subsets of $\Omega$ are in $\cal F$.
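
The two examples can be checked mechanically. The following is a minimal sketch, not part of the original notes: the helper names `preimage` and `is_random_variable` and the extra function `coarse` are illustrative assumptions. It relies on the fact that on a finite sample space with a genuine sigma algebra, condition (1) holds exactly when every set $\{R = v\}$ is an event.

```python
from itertools import chain, combinations

def preimage(R, omega, values):
    """Return the set {w in omega : R(w) in values} as a frozenset."""
    return frozenset(w for w in omega if R(w) in values)

def is_random_variable(R, omega, events):
    """Check that {R = v} is an event for every value v that R takes.

    On a finite sample space this is equivalent to condition (1), provided
    `events` really is a sigma algebra (closed under unions and complements).
    """
    values = {R(w) for w in omega}
    return all(preimage(R, omega, {v}) in events for v in values)

omega = {1, 2, 3}

def identity(x):
    # R(x) = x, as in Examples 1 and 2
    return x

def coarse(x):
    # Hypothetical extra example: a function that is constant on {1, 2}
    return 0 if x in (1, 2) else 1

# Example 1: all subsets of omega
power_set = {
    frozenset(s)
    for s in chain.from_iterable(
        combinations(sorted(omega), k) for k in range(len(omega) + 1)
    )
}
# Example 2: the four-element sigma algebra
small_algebra = {frozenset(omega), frozenset(), frozenset({1, 2}), frozenset({3})}

print(is_random_variable(identity, omega, power_set))      # True  (Example 1)
print(is_random_variable(identity, omega, small_algebra))  # False (Example 2)
print(is_random_variable(coarse, omega, small_algebra))    # True
```

The last line illustrates that the smaller sigma algebra still admits random variables, namely those that cannot tell the outcomes 1 and 2 apart.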

Unless otherwise mentioned, you may assume that all functions we discuss are random variables!!

Random variables are classified into subtypes. Some important ones to mention at the outset are:

Indicator random variables:
These are random variables which take on only the value 0 or 1, so they ``indicate'' whether or not something has happened. For example if A is an event, we can define a random variable, $I_A$, called the indicator of A by the rule

\begin{displaymath}
I_A(\omega) = \left\{
\begin{array}{lll}
1 &{\rm if}& \omega \in A\\
0 &{\rm if}& \omega \in A^c\end{array}\right.\end{displaymath}

Simple random variables:
These are random variables whose range contains only a finite number of elements. For example, if we model the result of tossing a coin 10 times, and R counts the number of heads, then R is a simple random variable.

Lattice random variables:
A lattice (in the real numbers) is a set of the form

\begin{displaymath}
\{x\in (-\infty,\infty): x = a + kh,\;k\;{\rm is\;any\;integer}\}.\end{displaymath}

Here $a$ and $h \neq 0$ are fixed real numbers. For example, the integers are a lattice, as are all the integer multiples of 1/3. The set of reciprocals of all nonzero integers is not a lattice. A lattice random variable is a random variable whose range lies in a lattice. The lattice random variables we usually will be concerned with are random variables whose range is a subset of the integers.

Note: Not all simple random variables are lattice random variables. Consider, for example, a random variable R whose range is $\{0, 1, \sqrt{2}\}$.

Discrete random variables:
A discrete random variable is one whose range is a set which is countable. The name is unfortunate, for it does not imply that the range is a discrete set. Indicator random variables, simple random variables and lattice random variables are all discrete random variables, but so is a random variable whose range is all of the rational numbers.
There is one important link between general discrete random variables, indicator random variables, and partitions of the sample space: Suppose that R is a discrete random variable whose range is the (countable) set Y. Let

\begin{displaymath}
B_y = \{\omega\in \Omega: R(\omega) = y\}\end{displaymath}

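The fact that R is a function whose domain is $\Omega$ guarantees that the sets $B_y$, $y\in Y$, form a partition of $\Omega$: they are pairwise disjoint and their union is $\Omega$. The fact that R is a random variable means each of the sets $B_y$ is in the sigma algebra.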

Now, if we let $I_y$ be the indicator of the event $B_y$ we have

\begin{displaymath}
R = \sum_{y\in Y} yI_y\end{displaymath}
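
As a small illustration of this decomposition, here is a sketch under assumptions that are not in the notes: the sample space is the 8 outcomes of three coin tosses, R counts heads, and the names `indicator`, `is_partition` and `decomposition_holds` are made up for the example.

```python
from itertools import product

# Sample space: all sequences of three coin tosses (an illustrative choice)
omega = list(product("HT", repeat=3))

def R(w):
    # Number of heads in the outcome w
    return w.count("H")

# The range Y of R and the level sets B_y = {w : R(w) = y}
Y = sorted({R(w) for w in omega})
B = {y: {w for w in omega if R(w) == y} for y in Y}

def indicator(A):
    """Return the indicator function of the set A."""
    return lambda w: 1 if w in A else 0

I = {y: indicator(B[y]) for y in Y}

# The B_y are pairwise disjoint and cover omega, i.e. they partition it
is_partition = (
    sum(len(B[y]) for y in Y) == len(omega)
    and set().union(*B.values()) == set(omega)
)

# R(w) equals the sum over y of y * I_y(w) at every outcome w
decomposition_holds = all(R(w) == sum(y * I[y](w) for y in Y) for w in omega)

print(Y)                    # [0, 1, 2, 3]
print(is_partition)         # True
print(decomposition_holds)  # True
```

At each outcome exactly one indicator equals 1, so the sum picks out the single value $y = R(\omega)$.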

Probability Mass Functions
To each random variable R we may assign a function, $p_R$, from the real numbers to the interval [0,1] by the rule

\begin{displaymath}
p_R(x) = \Pr(\{\omega\in \Omega : R(\omega) = x\}) \end{displaymath}

If x is not in the range of R then $p_R(x) = 0$. If x is in the range of R then $p_R(x)$ may or may not be positive. $p_R$ is called the probability mass function of R. Probability mass functions are important for discrete random variables. In fact, many lattice random variables are named by the form of their probability mass functions.
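
For a finite sample space the probability mass function can be tabulated directly from its definition. The sketch below assumes a fair coin tossed 10 times (so every outcome has probability $1/2^{10}$) and uses a hypothetical helper `pmf`; neither is specified in the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def pmf(omega, R):
    """Tabulate p_R(x) = Pr(R = x), assuming all outcomes in omega are equally likely."""
    counts = Counter(R(w) for w in omega)
    n = len(omega)
    return {x: Fraction(c, n) for x, c in counts.items()}

# Ten tosses of a fair coin; R counts the heads
omega = list(product("HT", repeat=10))
p_R = pmf(omega, lambda w: w.count("H"))

print(p_R[5])                      # 63/256
print(p_R[0])                      # 1/1024
print(p_R.get(2.5, Fraction(0)))   # 0, since 2.5 is not a value of R
print(sum(p_R.values()))           # 1
```

Values outside the range of R are simply absent from the table, which matches the convention that the mass function is 0 there.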

Expected value

To each discrete random variable, R, with range Y we may attempt to assign a real number, called the expected value of R, denoted E[R], by the rule

\begin{displaymath}
E[R] = \sum_{y\in Y} y\Pr(R = y) = \sum_{y\in Y}yp_R(y).\end{displaymath}

The problematic part of this is that the sum above might not converge absolutely, in which case its value could depend on the order of summation. When it does not converge absolutely, we say that R does not have an expected value.

The following theorem is quite useful. It is sometimes called the Law of the Unconscious Statistician.
Suppose that $H:(-\infty,\infty)\longrightarrow(-\infty,\infty)$ is a function and that R is a discrete random variable with probability mass function $p_R(y)$ and range Y. Then H(R) is a discrete random variable. Furthermore, if

\begin{displaymath}
\sum_{y\in Y} \vert H(y)\vert\cdot p_R(y) < \infty\end{displaymath}

then E[H(R)] is a real number and

\begin{displaymath}
E[H(R)] = \sum_{y\in Y} H(y)\cdot p_R(y)\end{displaymath}

We shall be most interested in applying this theorem in the case where $\mu_R =
E[R]$ and $H(r) = (r-\mu_R)^2$, that is, to compute the quantity $E[(R-E[R])^2]$, which is called the variance of R.
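
The theorem can be checked numerically on a small case. The following sketch assumes R is the number of heads in three tosses of a fair coin; the function names `expectation`, `lotus` and `pushforward` are illustrative. It computes the variance once through the Law of the Unconscious Statistician and once by first building the mass function of H(R).

```python
from collections import defaultdict
from fractions import Fraction

def expectation(p):
    """E[R] = sum of y * p_R(y) over the range of R."""
    return sum(y * py for y, py in p.items())

def lotus(H, p):
    """E[H(R)] computed as the sum of H(y) * p_R(y)."""
    return sum(H(y) * py for y, py in p.items())

def pushforward(H, p):
    """Probability mass function of H(R), built from that of R."""
    q = defaultdict(Fraction)
    for y, py in p.items():
        q[H(y)] += py
    return dict(q)

# Assumed example: number of heads in three fair coin tosses
p_R = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mu_R = expectation(p_R)

def H(r):
    return (r - mu_R) ** 2

variance_via_lotus = lotus(H, p_R)
variance_via_definition = expectation(pushforward(H, p_R))

print(mu_R)                     # 3/2
print(variance_via_lotus)       # 3/4
print(variance_via_definition)  # 3/4
```

The point of the theorem is visible in the code: `lotus` never needs the mass function of H(R), only that of R.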

Independent random variables
Two (real valued) random variables R and S are called independent if for any intervals ${\cal I}$ and $\cal J$

\begin{displaymath}
\Pr(\{R\in {\cal I}\} \cap \{S\in {\cal J}\}) = \Pr(R\in {\cal I})\Pr(S\in {\cal J})\end{displaymath}

Mutually independent sets of random variables are defined analogously to mutually independent collections of sets. An important result is that if R and S are independent discrete random variables and E[R] and E[S] are defined, then so is E[RS] and E[RS] = E[R]E[S]. This follows from the Law of the Unconscious Statistician along with the observation that if R and S are independent then

\begin{displaymath}
p_{(R,S)}(x,y) = \Pr(R = x \cap S = y) = \Pr(R = x)\Pr(S=y) = p_R(x)p_S(y).\end{displaymath}

We simply apply the Law to the vector random variable (R,S) and the function H((R,S)) = RS.
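
A quick numerical check of the product rule, as a sketch under assumptions not in the notes: two fair dice are rolled, R is the first die and S is the second, and the helpers `E` and `pr` are made up for the example. A dependent pair is included for contrast.

```python
from fractions import Fraction
from itertools import product

# Two rolls of a fair die; every outcome has probability 1/36
omega = list(product(range(1, 7), repeat=2))
P = Fraction(1, len(omega))

def E(X):
    """Expected value of X under the uniform probability on omega."""
    return sum(X(w) * P for w in omega)

def pr(event):
    """Probability of the set of outcomes where event(w) is true."""
    return sum(P for w in omega if event(w))

def R(w):
    return w[0]  # value of the first die

def S(w):
    return w[1]  # value of the second die

# The joint mass function factors: p_(R,S)(x, y) = p_R(x) * p_S(y)
factorizes = all(
    pr(lambda w: R(w) == x and S(w) == y)
    == pr(lambda w: R(w) == x) * pr(lambda w: S(w) == y)
    for x in range(1, 7)
    for y in range(1, 7)
)

E_RS = E(lambda w: R(w) * S(w))
E_RR = E(lambda w: R(w) * R(w))  # R paired with itself is not independent

print(factorizes)          # True
print(E_RS, E(R) * E(S))   # 49/4 49/4
print(E_RR, E(R) * E(R))   # 91/6 49/4
```

The last line shows that the product rule can fail without independence.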
Mean and variance of a sum
Notice that if V is any vector valued random variable with $V = (V_1,V_2)$, the components of V are also random variables, and V is discrete if and only if its components are discrete. If we denote the range of the j'th component by $Y_j$, then it follows from the law of total probability that

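\begin{displaymath}
\begin{array}{rcl}
\Pr(V_1 = v_1) & = & \sum_{y\in Y_2}\Pr(V_1 = v_1 \cap V_2 = y)\\
p_{V_1}(v_1) & = & \sum_{y\in Y_2}p_V(v_1,y)\\
\Pr(V_2 = v_2) & = & \sum_{y\in Y_1}\Pr(V_1 = y \cap V_2 = v_2)\\
p_{V_2}(v_2) & = & \sum_{y\in Y_1}p_V(y,v_2)\end{array}\end{displaymath}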

It then follows from the Law of the Unconscious Statistician that if $E[V_1]$ and $E[V_2]$ are real numbers and c is any constant, then

$E[V_1 + cV_2] = E[V_1] + cE[V_2].$

Here we take $H(V) = V_1 + cV_2$.

Since any pair of random variables may be regarded as the components of a vector valued random variable, the preceding holds for all pairs of discrete random variables.
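
Note that this linearity does not require independence. The sketch below illustrates that with an assumed example that is not from the notes: two fair dice are rolled, $V_1$ is the first die and $V_2$ is the larger of the two, so the components are dependent, and the constant c = 3 is arbitrary.

```python
from fractions import Fraction
from itertools import product

# Two rolls of a fair die; every outcome has probability 1/36
omega = list(product(range(1, 7), repeat=2))
P = Fraction(1, len(omega))

def E(X):
    """Expected value of X under the uniform probability on omega."""
    return sum(X(w) * P for w in omega)

def V1(w):
    return w[0]  # first die

def V2(w):
    return max(w)  # larger of the two dice, which depends on V1

c = 3  # arbitrary constant chosen for the illustration

lhs = E(lambda w: V1(w) + c * V2(w))
rhs = E(V1) + c * E(V2)

print(E(V1), E(V2))  # 7/2 161/36
print(lhs, rhs)      # 203/12 203/12
```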

If we know that the components are independent with real variances then we get

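\begin{displaymath}
\begin{array}{rcll}
Var[V_1 + V_2]
& = &
E[((V_1 + V_2) - (\mu_1 + \mu_2))^2] & \\
& = & E[(V_1-\mu_1)^2] + 2E[(V_1-\mu_1)(V_2-\mu_2)] + E[(V_2-\mu_2)^2] & {\rm Add'n\;prop.\;of\;E}\\
& = & Var[V_1] + 2E[V_1-\mu_1]E[V_2-\mu_2] + Var[V_2] & {\rm Product\;prop.\;of\;E}\\
&&&{\rm for\;ind.\;rv's}\\
& = & Var[V_1] + Var[V_2] & \end{array}\end{displaymath}

Here $\mu_j = E[V_j]$, so that $E[V_j - \mu_j] = 0$.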
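
The additivity of variance for independent components can be checked on a small case. This is a sketch under assumptions not in the notes: two fair dice, with $V_1$ and $V_2$ the individual dice, and helper names `E` and `Var` chosen for the example. The dependent sum $V_1 + V_1$ is included to show that independence matters.

```python
from fractions import Fraction
from itertools import product

# Two rolls of a fair die; every outcome has probability 1/36
omega = list(product(range(1, 7), repeat=2))
P = Fraction(1, len(omega))

def E(X):
    """Expected value of X under the uniform probability on omega."""
    return sum(X(w) * P for w in omega)

def Var(X):
    """Variance E[(X - E[X])^2]."""
    mu = E(X)
    return E(lambda w: (X(w) - mu) ** 2)

def V1(w):
    return w[0]  # first die

def V2(w):
    return w[1]  # second die, independent of the first

def total(w):
    return V1(w) + V2(w)

def doubled(w):
    return V1(w) + V1(w)  # sum of two fully dependent components

print(Var(V1))       # 35/12
print(Var(total))    # 35/6, equal to Var(V1) + Var(V2)
print(Var(doubled))  # 35/3, not equal to Var(V1) + Var(V1)
```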



 
Eric S Key
9/25/1998