
Lecture 4: Independence
If you ask the average person to calculate the probability that two consecutive tosses of a fair die will both yield a 1, you will most likely be told that, since the probability that each toss yields a 1 is 1/6, the probability of two consecutive 1's is (1/6)(1/6) = 1/36. The reasoning is that, since probabilities are fractions of successes to tries, the first toss cuts the favorable outcomes down to 1/6 of the total, so the probability of two consecutive 1's is 1/6 of 1/6, which is 1/36. If you then suggest that the dice are rigged so that the outcome of the first toss influences that of the second, you will be told that the previous calculation no longer holds.

In probability theory this lack of influence is called independence or sometimes statistical independence (to distinguish it from independence in linear algebra). The precise definition is that the events A and B are independent if and only if

\begin{displaymath}
\Pr(A\cap B) = \Pr(A)\Pr(B).\end{displaymath}

It is important to note that since a probability model is given by specifying both the set of events and the probability measure, independence depends on how probability is assigned. For example:
Model 1:
$\Omega = \{(HH), (HT), (TH), (TT)\}$, the sigma field is all subsets of $\Omega$, and the probability measure is given by the rule that all elements of the sample space are equally likely.
Model 2:
$\Omega = \{(HH), (HT), (TH), (TT)\}$, the sigma field is all subsets of $\Omega$, and the probability measure is given by the rule that (HH) is twice as likely as the other outcomes, and the other outcomes are equally likely.
Model 3:
$\Omega = \{(HH), (HT), (TH), (TT)\}$, the sigma field is all subsets of $\Omega$, and the probability measure is given by the rule that (HH) has probability 1/9, (HT) and (TH) have probability 2/9 and (TT) has probability 4/9.
We recognise that each of these is a model for tossing two coins, so we will use that language to describe two events: A is the event of a head on the first coin, and B is the event of a head on the second coin.

In Model 1, A and B each have two elements and each has probability 1/2, while the intersection of A and B has one element and has probability 1/4. Since (1/2)(1/2) = 1/4, we may conclude that in Model 1 the events A and B are independent.

In Model 2, the outcome (HH) has probability 2/5, and the others probability 1/5. Hence A and B have probability 3/5, and A and B are not independent since the intersection of A and B has probability 2/5 and

\begin{displaymath}
\frac{3}{5}\times\frac{3}{5}\neq \frac{2}{5}\end{displaymath}

In Model 3 we can also check that the events A and B are independent.
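
The check for Model 3 can be written out as follows, using only the probabilities stated in the model. Since A = {(HH), (HT)} and B = {(HH), (TH)},

\begin{displaymath}
\Pr(A) = \frac{1}{9} + \frac{2}{9} = \frac{1}{3},\;\;\;
\Pr(B) = \frac{1}{9} + \frac{2}{9} = \frac{1}{3},\;\;\;
\Pr(A\cap B) = \Pr(\{(HH)\}) = \frac{1}{9} = \frac{1}{3}\times\frac{1}{3}.\end{displaymath}

One way to read this is that Model 3 describes two independent tosses of a coin that lands heads with probability 1/3.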

Thus independence can be used as a tool in model building, insofar as some models can be rejected if in those models events that we believe should be independent are not independent in the model. In the preceding, if we believed that the tosses of the coin did not influence one another we would reject Model 2.

Here are some useful observations about independent events.

It is important to generalize the concept of independence to more than pairs of events. A collection of events $\{A_x, x\in X\}$ is said to be mutually independent if for every non-empty finite subset $Y\subset X$ we have

\begin{displaymath}
\Pr\left(\bigcap_{x\in Y}A_x\right) = \prod_{x\in Y}\Pr(A_x)\end{displaymath}

If the previous equation is only known to hold for sets Y of two elements, we say that the collection of events is pairwise independent. It is a standard exercise to construct an example in which a collection of three events is pairwise independent but not mutually independent.
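
One possible solution to this exercise, offered as a sketch, uses Model 1 (two fair coins, all four outcomes equally likely). Keep A and B as before and introduce a third event C, a name chosen here for illustration, for the event that the two tosses agree, so C = {(HH), (TT)}. Each of the three events has probability 1/2, and each pairwise intersection is {(HH)}, so

\begin{displaymath}
\Pr(A\cap B) = \Pr(A\cap C) = \Pr(B\cap C) = \frac{1}{4} = \frac{1}{2}\times\frac{1}{2},\end{displaymath}

which gives pairwise independence. However, the triple intersection is also {(HH)}, so

\begin{displaymath}
\Pr(A\cap B\cap C) = \frac{1}{4} \neq \frac{1}{8} = \Pr(A)\Pr(B)\Pr(C),\end{displaymath}

and the three events are not mutually independent.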
Two important models
What follows are two important models. Their importance stems not only from their immediate use but also from the basic principles they exhibit.
Binomial Model:
The motivation is to study N consecutive tosses of a coin that lands heads with probability p on each toss. The sample space is the set of all sequences of length N of the two symbols H and T. We will assume that the tosses are independent, so the probability of any outcome with k H's and N-k T's is

\begin{displaymath}
p^k(1-p)^{N-k}.\end{displaymath}

The related statistics question is: Suppose that this experiment is performed and x heads are observed. What is a good estimate of p?

Notice that x can be any integer from 0 to N. In terms of p, the probability that x is observed is the sum of the probabilities of the sequences with x H's and N-x T's. There are ${}_NC_x$ such sequences, each with the same probability, so the probability that exactly x heads are observed is

\begin{displaymath}
p(N,x,p)\equiv\mbox{$\left(\begin{array}
{c}N\\ x\end{array}\right)$}p^x(1-p)^{N-x}.\end{displaymath}

From this and the binomial theorem we see this is a legitimate assignment of a probability measure.
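
Spelled out, the binomial theorem applied to $(p + (1-p))^N$ gives

\begin{displaymath}
\sum_{x=0}^N \mbox{$\left(\begin{array}
{c}N\\ x\end{array}\right)$}p^x(1-p)^{N-x} = (p + (1-p))^N = 1,\end{displaymath}

so the probabilities of the N+1 possible values of x add to 1.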

One procedure for determining an estimate for p is to find that value of p which maximizes p(N,x,p). This is called the Maximum Likelihood Estimate (MLE) of p. Here it is a simple calculus problem to determine that the MLE is x/N.

Notice that this is the average number of heads per toss.
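
The calculus can be sketched as follows, assuming 0 < x < N so that the maximum lies in the interior of [0,1]. Maximizing p(N,x,p) is the same as maximizing its logarithm, and the binomial coefficient does not depend on p, so we differentiate

\begin{displaymath}
\ell(p) = x\log p + (N-x)\log(1-p),\;\;\;
\ell'(p) = \frac{x}{p} - \frac{N-x}{1-p}.\end{displaymath}

Setting $\ell'(p) = 0$ gives x(1-p) = (N-x)p, that is, x = Np, so p = x/N. Here $\ell$ is a name introduced only for this sketch. In the boundary cases x = 0 and x = N the likelihood is $(1-p)^N$ and $p^N$ respectively, which are maximized at p = 0 and p = 1, again in agreement with x/N.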

Negative Binomial Model:
This time we want to toss a coin until N heads are obtained. Now the sample space consists of sequences of H's and T's of any length, so long as the sequence contains exactly N H's and ends with an H. We want to assume independence of the tosses and assign a probability to each sequence in the same way as in the binomial model: if there are K T's and N H's then the sequence has probability $p^N(1-p)^K$. Here K can be any non-negative integer. If we let $A_K$ be the event that there are K T's, then $A_K$ has ${}_{N+K-1}C_{N-1}$ members (the last toss is an H, and the other N-1 H's can occupy any of the first N+K-1 positions), so

\begin{displaymath}
\Pr(A_K) = \mbox{$\left(\begin{array}
{c}N+K-1\\ N-1\end{array}\right)$}p^N(1-p)^K.\end{displaymath}

(The special case where N = 1 is called the geometric probability model.)

Now it is not so clear that we have a legitimate probability assignment, because it is not clear that in the model the probability of the sample space is 1. However, it is not too difficult to show that

\begin{displaymath}
\sum_{k=0}^\infty \mbox{$\left(\begin{array}
{c}N+k-1\\ N-1\end{array}\right)$}p^N(1-p)^k = 1\end{displaymath}

by using Newton's binomial theorem (try the expansion of $(1-x)^{-N}$), since

\begin{displaymath}
\mbox{$\left(\begin{array}
{c}N+k-1\\ N-1\end{array}\right)$}
= (-1)^k\mbox{$\left(\begin{array}
{c}-N\\ k\end{array}\right)$}.\end{displaymath}
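
As a sketch of the suggested calculation, assume 0 < p <= 1 and write q = 1-p (a symbol introduced here for brevity), so that 0 <= q < 1 and Newton's series converges. Using the fact that the coefficient of $q^k$ in the sum equals $(-1)^k$ times the generalized binomial coefficient with upper entry -N and lower entry k,

\begin{displaymath}
\sum_{k=0}^\infty \mbox{$\left(\begin{array}
{c}N+k-1\\ N-1\end{array}\right)$}q^k
= \sum_{k=0}^\infty \mbox{$\left(\begin{array}
{c}-N\\ k\end{array}\right)$}(-q)^k
= (1-q)^{-N} = p^{-N}.\end{displaymath}

Multiplying both sides by $p^N$ shows that the probabilities $\Pr(A_K)$ add to 1.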

The related statistics question is: if x tosses are required to obtain exactly N heads, what is a good estimate of p? Since x denotes the total number of tosses, the number of tails is k = x - N. Again we can use calculus to compute the MLE, and we find that it is N/x, the average number of heads per toss. The difference this time is that the number of tosses is what is random, not the number of heads.
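
The calculation runs parallel to the binomial case. As a sketch, assume x > N so that the maximum lies in the interior of [0,1], and drop the binomial coefficient, which does not depend on p. With $\ell$ denoting the logarithm of the remaining factor (a name used only in this sketch),

\begin{displaymath}
\ell(p) = N\log p + (x-N)\log(1-p),\;\;\;
\ell'(p) = \frac{N}{p} - \frac{x-N}{1-p}.\end{displaymath}

Setting $\ell'(p) = 0$ gives N(1-p) = (x-N)p, that is, N = xp, so p = N/x. If x = N, the likelihood is $p^N$, which is maximized at p = 1 = N/x.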
A general model for independent repeated trials
Suppose that N identical and independent replications of an experiment are to be modeled. For argument's sake, suppose that the outcomes of a single trial are non-negative integers. We can take as the sample space all N-tuples of non-negative integers and as the sigma field the set of all subsets of the sample space. To assign probabilities, we let $p_k$ be the probability that k is observed on a single trial, so that $p_k \geq 0$ and

\begin{displaymath}
\sum_{k=0}^\infty p_k = 1.\end{displaymath}

We then assign probabilities to all the sample points by the rule

\begin{displaymath}
\Pr(\{(k_1,\dots,k_N)\}) = p_{k_1}\cdots p_{k_N}\end{displaymath}

It is straightforward to check that these probabilities add to 1 and that events determined by individual trials are independent. For example, if N = 4, A is the event that 6 is observed on trial 1, and B is the event that 3 is observed on trial 2, then

\begin{displaymath}
A = \{(6,x,y,z): x, y, z \in \{0,1,\dots\}\},\;\;\;
B = \{(w,3,y,z): w, y, z \in \{0,1,\dots\}\},\;\;\;\end{displaymath}

and

\begin{displaymath}
A\cap B = \{(6,3,y,z): y, z \in \{0,1,\dots\}\}.\end{displaymath}

Therefore,

\begin{displaymath}
\Pr(A) = \sum_{x,y,z}p_6p_xp_yp_z = p_6,\;\;\;\;
\Pr(B) = \sum_{w,y,z}p_wp_3p_yp_z = p_3,\end{displaymath}

and

\begin{displaymath}
\Pr(A\cap B) = \sum_{y,z}p_6p_3p_yp_z = p_6p_3,\;\;\;\;\end{displaymath}

so A and B are independent.
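
The claim that the probabilities add to 1 can be checked in the same way. For general N, the sum over all sample points factors into a product of N identical sums:

\begin{displaymath}
\sum_{k_1,\dots,k_N} p_{k_1}\cdots p_{k_N}
= \left(\sum_{k=0}^\infty p_k\right)^N = 1.\end{displaymath}

The same factoring is what reduces the sums for $\Pr(A)$, $\Pr(B)$ and $\Pr(A\cap B)$ in the example with N = 4: every index that is summed over all non-negative integers contributes a factor of 1.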

For a specific case, suppose that $\lambda \geq 0$ and

\begin{displaymath}
p_k = \frac{\lambda^k\exp(-\lambda)}{k!}.\end{displaymath}

Then

\begin{displaymath}
f(x_1,\dots,x_N,\lambda)\equiv\Pr(\{(x_1,\dots,x_N)\}) 
= 
\prod_{i=1}^N\frac{\lambda^{x_i}\exp(-\lambda)}{x_i!}
= 
\frac{\lambda^{x_1 + \cdots + x_N}\exp(-N\lambda)}{x_1!\cdots x_N!}.\end{displaymath}

If $(x_1,\dots,x_N)$ represents the data collected from the actual trials, then the MLE for $\lambda$, $\lambda_{MLE}$, is obtained by maximizing $f(x_1,\dots,x_N,\lambda)$ as a function of $\lambda$. It is a simple calculus problem to determine that

\begin{displaymath}
\lambda_{MLE} = \frac{x_1 + \cdots + x_N}{N}\end{displaymath}
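
A sketch of the calculus, assuming at least one $x_i$ is positive so that the maximum occurs at some $\lambda > 0$: taking logarithms of $f(x_1,\dots,x_N,\lambda) = \lambda^{x_1+\cdots+x_N}\exp(-N\lambda)/(x_1!\cdots x_N!)$ and differentiating with respect to $\lambda$ gives

\begin{displaymath}
\frac{\partial}{\partial\lambda}\log f
= \frac{\partial}{\partial\lambda}\left[(x_1 + \cdots + x_N)\log\lambda - N\lambda - \log(x_1!\cdots x_N!)\right]
= \frac{x_1 + \cdots + x_N}{\lambda} - N.\end{displaymath}

This vanishes exactly when $\lambda = (x_1 + \cdots + x_N)/N$, and the second derivative, $-(x_1 + \cdots + x_N)/\lambda^2$, is negative, so this is a maximum. If every $x_i$ is 0, then f reduces to $\exp(-N\lambda)$, which is maximized at $\lambda = 0$, again the sample average.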



 
Eric S Key
9/16/1998