Lecture 4: Independence
If you ask the average person to calculate the probability that two consecutive
tosses of a fair die will both yield a 1, you will most likely be told that,
since the probability that each toss yields a 1 is 1/6, the probability of two
consecutive 1's is (1/6)(1/6) = 1/36. The reasoning is that, since probabilities
are in some sense fractions of successes to tries, the first toss cuts the
outcomes down to 1/6 of the total, so the probability of two consecutive 1's is
1/6 of 1/6, which is 1/36. If you then suggest that the dice are rigged so that
the outcome of the first die influences that of the second, you will be told
that the previous calculation no longer holds.
In probability theory this lack of influence is called independence or
sometimes statistical independence (to distinguish it from independence in
linear algebra). The precise definition is that the events A and B are
independent if and only if
$$P(A \cap B) = P(A)\,P(B).$$
It is important to note that since a probability model is given by specifying
both the set of events and the probability measure, independence depends on how
probability is assigned. For example:
- Model 1: The sample space is $\Omega = \{(HH), (HT), (TH), (TT)\}$, the sigma
field is all subsets of $\Omega$, and the probability measure is given by the
rule that all elements of the sample space are equally likely.
- Model 2: The sample space is $\Omega = \{(HH), (HT), (TH), (TT)\}$, the sigma
field is all subsets of $\Omega$, and the probability measure is given by the
rule that (HH) is twice as likely as each of the other outcomes, and the other
outcomes are equally likely.
- Model 3: The sample space is $\Omega = \{(HH), (HT), (TH), (TT)\}$, the sigma
field is all subsets of $\Omega$, and the probability measure is given by the
rule that (HH) has probability 1/9, (HT) and (TH) each have probability 2/9,
and (TT) has probability 4/9.
We recognise that each of these is a model for tossing two coins, so we will
use that language to describe two events: A is the event of a head on the first
coin, and B is the event of a head on the second coin.
In Model 1, A and B each have two elements and probability 1/2, while the
intersection of A and B has one element and probability 1/4. Since
(1/2)(1/2) = 1/4, we may conclude that in Model 1 A and B are independent.
In Model 2, the outcome (HH) has probability 2/5, and the others have
probability 1/5. Hence A and B each have probability 3/5, and A and B are not
independent, since the intersection of A and B has probability 2/5 and
$$P(A)\,P(B) = (3/5)(3/5) = 9/25 \neq 2/5.$$
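In Model 3 we can also check that the events A and B are independent: A and B
each have probability 1/9 + 2/9 = 1/3, and the intersection of A and B has
probability 1/9 = (1/3)(1/3).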
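The three checks can be carried out mechanically. What follows is a minimal
sketch, assuming the outcomes are encoded as the strings "HH", "HT", "TH", "TT";
the helper names prob and independent are illustrative and not part of the
lecture. Exact fractions are used so that the product rule is tested without
rounding error.

```python
from fractions import Fraction as F

# Each model assigns a probability to the four outcomes of two coin tosses.
models = {
    "Model 1": {"HH": F(1, 4), "HT": F(1, 4), "TH": F(1, 4), "TT": F(1, 4)},
    "Model 2": {"HH": F(2, 5), "HT": F(1, 5), "TH": F(1, 5), "TT": F(1, 5)},
    "Model 3": {"HH": F(1, 9), "HT": F(2, 9), "TH": F(2, 9), "TT": F(4, 9)},
}

A = {"HH", "HT"}  # head on the first coin
B = {"HH", "TH"}  # head on the second coin


def prob(event, measure):
    """Probability of an event: the sum of the probabilities of its outcomes."""
    return sum(measure[w] for w in event)


def independent(a, b, measure):
    """True when P(a and b) equals P(a) * P(b) exactly."""
    return prob(a & b, measure) == prob(a, measure) * prob(b, measure)


results = {name: independent(A, B, m) for name, m in models.items()}
for name, ok in results.items():
    print(name, "independent" if ok else "not independent")
```

The same sample space and the same two events give different answers under
different probability measures, which is the point of the three models.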
Thus independence can be used as a tool in model building, insofar as some
models can be rejected if in those models events that we believe should be
independent are not independent in the model. In the preceding, if we believed
that the tosses of the coin did not influence one another we would reject Model
2.
Here are some useful observations about independent events.
- Any event that has probability equal to 1 or 0 is independent of any
other event, including itself!
- If A and B are a pair of independent events, A and the complement of B
are independent, as are B and the complement of A, and the complements of A and
B.
- An event and its complement are never independent unless the event has
probability 1 or 0.
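The second and third observations each follow from a short computation. The
sketch below uses only additivity of the probability measure; the superscript c
for the complement is notation assumed here, not taken from the lecture.

```latex
% If A and B are independent, then A and the complement of B are independent:
\begin{align*}
P(A \cap B^{c}) &= P(A) - P(A \cap B) \\
                &= P(A) - P(A)\,P(B) \\
                &= P(A)\,\bigl(1 - P(B)\bigr) = P(A)\,P(B^{c}).
\end{align*}
% If A and its complement are independent, then
\begin{align*}
0 = P(A \cap A^{c}) = P(A)\,P(A^{c}) = P(A)\,\bigl(1 - P(A)\bigr),
\end{align*}
% which forces P(A) = 0 or P(A) = 1.
```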
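It is important to generalize the concept of independence to more than pairs of
sets. A collection of events $\{A_i : i \in I\}$ is said to be mutually
independent if for every non-empty finite subset $Y \subseteq I$ we have
$$P\Bigl(\bigcap_{i \in Y} A_i\Bigr) = \prod_{i \in Y} P(A_i).$$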
If the previous equation is required to hold only for sets Y of two elements,
we say that the collection of events is pairwise independent. It is a standard
exercise to construct an example in which a collection of three events is
pairwise independent but not mutually independent.
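One possible solution to this exercise can be checked by enumeration. The
sketch below assumes two fair coin tosses with all four outcomes equally likely
(as in Model 1) and adds a third event C, "the two coins show different faces";
the choice of C and the helper names are illustrative assumptions, not the
lecture's own example.

```python
from fractions import Fraction as F
from itertools import combinations

# Two fair coin tosses: all four outcomes equally likely.
measure = {w: F(1, 4) for w in ("HH", "HT", "TH", "TT")}

events = {
    "A": {"HH", "HT"},  # head on the first coin
    "B": {"HH", "TH"},  # head on the second coin
    "C": {"HT", "TH"},  # the two coins differ (hypothetical third event)
}


def prob(event):
    """Probability of an event: the sum of the probabilities of its outcomes."""
    return sum(measure[w] for w in event)


def product_rule_holds(names):
    """Check P(intersection) == product of probabilities for the named events."""
    common = set(measure)
    product = F(1)
    for name in names:
        common &= events[name]
        product *= prob(events[name])
    return prob(common) == product


pairwise = all(product_rule_holds(pair) for pair in combinations(events, 2))
triple = product_rule_holds(("A", "B", "C"))
print("pairwise independent:", pairwise)
print("mutually independent:", triple)
```

Each pair intersects in a single outcome of probability 1/4 = (1/2)(1/2), but
the triple intersection is empty while the product of the three probabilities
is 1/8.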
Two important models
What follows are two important models. Their importance stems not only from
their immediate use but also the basic principles they exhibit.
- Binomial Model:
- The motivation is to study N consecutive tosses of a coin. The sample space is
the set of all sequences of length N of the two symbols H and T. We will
assume that the tosses are independent and that each toss yields H with
probability p, so the probability of any outcome with k H's and N-k T's is
$p^k (1-p)^{N-k}$.
The related statistics question is: Suppose that this experiment is performed
and x heads are observed. What is a good estimate of p?
Notice that x can be any integer from 0 to N. In terms of p, the probability
of any particular sequence with x H's and N-x T's is $p^x (1-p)^{N-x}$. There
are $\binom{N}{x}$ such sequences, so the probability that exactly x heads
are observed is
$$p(N,x,p) = \binom{N}{x} p^x (1-p)^{N-x}.$$
From this and the binomial theorem we see that these probabilities sum to 1
over x = 0, ..., N, so this is a legitimate assignment of a probability measure.
One procedure for determining an estimate for p is to find the value of p
that maximizes p(N,x,p). This is called the Maximum Likelihood Estimate
(MLE) of p. Here it is a simple calculus problem to determine that the MLE is
x/N.
Notice that this is the average number of heads per toss.
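Both claims about the binomial model can be checked numerically. The sketch
below is only an illustration: the data N = 10 and x = 3 are hypothetical, and a
grid search over p stands in for the calculus argument.

```python
from fractions import Fraction as F
from math import comb


def binom_pmf(N, x, p):
    """p(N, x, p): probability of exactly x heads in N independent tosses."""
    return comb(N, x) * p**x * (1 - p) ** (N - x)


N, x = 10, 3  # hypothetical data: 3 heads observed in 10 tosses

# The probabilities over x = 0, ..., N sum to 1 (checked exactly for p = 1/3).
total = sum(binom_pmf(N, k, F(1, 3)) for k in range(N + 1))

# Grid search for the value of p that maximizes the likelihood.
grid = [i / 1000 for i in range(1001)]
p_hat = max(grid, key=lambda p: binom_pmf(N, x, p))

print("sum of probabilities:", total)
print("grid maximizer:", p_hat, "compared with x/N =", x / N)
```

A grid search only locates the maximizer to within the grid spacing; here the
grid happens to contain x/N.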
- Negative Binomial Model:
- This time we want to toss a coin until N heads are obtained. Now the sample
space consists of sequences of H's and T's of any length so long as
- The last symbol in the sequence is an H.
- There are exactly N H's in the sequence.
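We want to assume independence of the tosses and assign a probability to each
sequence in the same way as in the binomial model: If there are K T's and N
H's then the sequence has probability $p^N (1-p)^K$. Here K can be any
non-negative integer. If we let $A_K$ be the event that there are K T's, then
$A_K$ has $\binom{N+K-1}{N-1}$ members (the last symbol must be an H, and the
other N-1 H's can occupy any of the first N+K-1 positions), so
$$P(A_K) = \binom{N+K-1}{N-1} p^N (1-p)^K.$$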
(The special case where N = 1 is called the geometric probability model.)
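Now it is not so clear that we have a legitimate probability assignment,
because it is not clear that in the model the probability of the sample
space is 1. However, it is not too difficult to show that
$$\sum_{K=0}^{\infty} \binom{N+K-1}{N-1} p^N (1-p)^K = 1$$
by using Newton's binomial theorem (try the expansion of $(1-x)^{-N}$), since
$$(1-x)^{-N} = \sum_{K=0}^{\infty} \binom{N+K-1}{N-1} x^K \quad \text{for } |x| < 1,$$
and setting x = 1-p makes the left side equal to $p^{-N}$.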
The related statistics question is: if x tosses are required to obtain
exactly N heads, what is a good estimate of p? Again, we can use calculus to
compute the MLE. Since x denotes the total number of tosses, K = x - N, and we
find that the MLE is N/x, the average number of heads per toss.
The difference this time is that the number of tosses is what is random, not
the number of heads.
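The same two checks can be made for the negative binomial model. This is a
sketch under stated assumptions: N = 3, p = 0.4 and x = 12 are hypothetical
values, the infinite sum is truncated at 200 terms, and a grid search replaces
the calculus argument.

```python
from math import comb


def negbin_pmf(N, K, p):
    """P(A_K): probability of exactly K tails before the N-th head."""
    return comb(N + K - 1, N - 1) * p**N * (1 - p) ** K


N, p = 3, 0.4  # hypothetical parameters

# Truncated version of the infinite sum over K; the omitted tail is negligible.
partial = sum(negbin_pmf(N, K, p) for K in range(200))

x = 12  # hypothetical number of tosses needed, so K = x - N tails
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda q: negbin_pmf(N, x - N, q))

print("partial sum:", partial)
print("grid maximizer:", p_hat, "compared with N/x =", N / x)
```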
A general model for independent repeated trials
Suppose that N identical and independent replications of an experiment are to
be modeled. For argument's sake, suppose that the outcomes of a single trial
are non-negative integers. We can take as the sample space all N-tuples of
non-negative integers and the sigma field the set of all subsets of the sample
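space. To assign probabilities, we let $p_k$ be the probability that k is
observed on a single trial, so that $p_k \ge 0$ and
$$\sum_{k=0}^{\infty} p_k = 1.$$
We then assign probabilities to all the sample points by the rule
$$P\bigl((k_1, k_2, \ldots, k_N)\bigr) = p_{k_1} p_{k_2} \cdots p_{k_N}.$$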
It is straightforward to check that these probabilities add to 1 and that
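events determined by individual trials are independent. For example, if N = 4,
A = {6 is observed on trial 1} and B = {3 is observed on trial 2}, then
$$P(A) = \sum_{k_2, k_3, k_4} p_6\, p_{k_2} p_{k_3} p_{k_4} = p_6$$
and
$$P(B) = \sum_{k_1, k_3, k_4} p_{k_1} p_3\, p_{k_3} p_{k_4} = p_3.$$
Therefore,
$$P(A)\,P(B) = p_6\, p_3$$
and
$$P(A \cap B) = \sum_{k_3, k_4} p_6\, p_3\, p_{k_3} p_{k_4} = p_6\, p_3,$$
so A and B are independent.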
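The example with N = 4 can be verified by brute-force enumeration. The sketch
below makes one simplifying assumption that is not in the lecture: the
single-trial distribution is supported on the finite set {0, ..., 6}, with
$p_k$ proportional to k + 1, so that the sample space is finite. All names are
illustrative.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical single-trial distribution on {0, ..., 6}: p_k = (k + 1) / 28.
support = range(7)
p = {k: F(k + 1, 28) for k in support}

N = 4  # number of independent trials


def point_prob(outcome):
    """Product rule: P((k_1, ..., k_N)) = p_{k_1} * ... * p_{k_N}."""
    result = F(1)
    for k in outcome:
        result *= p[k]
    return result


def prob(event):
    """Probability of an event given as a predicate on N-tuples."""
    return sum(point_prob(w) for w in product(support, repeat=N) if event(w))


def in_A(w):
    return w[0] == 6  # 6 is observed on trial 1


def in_B(w):
    return w[1] == 3  # 3 is observed on trial 2


total = prob(lambda w: True)
pA, pB = prob(in_A), prob(in_B)
pAB = prob(lambda w: in_A(w) and in_B(w))

print("total:", total)
print("P(A) =", pA, " P(B) =", pB, " P(A and B) =", pAB)
```

The enumeration confirms that the point probabilities add to 1, that P(A) and
P(B) reduce to the single-trial probabilities of 6 and 3, and that the product
rule holds for A and B.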
For a specific case, suppose that
and

Then

If the
represents the data collected from the actual trials,
then the MLE for
,
, is obtained by maximizing
as a function of
. It is a simple calculus
problem to determine that

Eric S Key
9/16/1998