
Lecture 22: Basics of Hypothesis Testing

The general theory of hypothesis testing consists of extensions of the following basic example.

I have in my hand a coin. I believe that the coin is fair, that is, that each toss of the coin is as likely to be heads as tails. I will toss the coin 100 times and record for each toss whether I see a head (H) or a tail (T). I will then use the resulting 100-tuple of H's and T's to decide whether the coin is fair. In the end, I may have erred in one of two ways: I may decide that the coin is not fair when in fact it is, or I may decide that the coin is fair when in fact it is not.

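Some terminology: the hypothesis that the coin is fair is called the null hypothesis. A test is a rule that uses the data to decide whether to reject the null hypothesis, and the set of outcomes for which we reject it is called the rejection region. Rejecting the null hypothesis when it is true is a Type I error; failing to reject it when it is false is a Type II error.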

The statistician's goal is to devise a test that has low probabilities of making Type I and Type II errors. There is no single way to do this, and we will explore many possibilities.

Let us try one obvious approach. If the tosses of the coin are assumed to be independent, the (strong) law of large numbers tells us that

\begin{displaymath}
\Pr\left(\lim_{n\rightarrow\infty} \frac{{\rm number\;of\;heads\;in\;}n{\rm\;tosses}}{n} = {\rm probability\;of\;heads}\right) = 1\end{displaymath}
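The limit above concerns infinitely many tosses, but a short simulation can make it plausible. The following Python sketch is an illustration only, not a proof: it assumes that Python's `random` pseudorandom generator is an adequate stand-in for independent tosses, and the function name `proportion_of_heads` and the seed are hypothetical choices.

```python
import random

def proportion_of_heads(n, p=0.5, seed=0):
    """Simulate n independent tosses with Pr(heads) = p; return the fraction of heads."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same tosses
    heads = sum(1 for _ in range(n) if rng.random() < p)
    return heads / n

# The fraction of heads should settle near p = 1/2 as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, proportion_of_heads(n))
```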

so if the coin is fair, we would be surprised if the total number of heads in 100 tosses were much different from 50. Hence if we let X be our 100-tuple of H's and T's and define D(X) to be the number of H's in X, we are looking for a rejection region of the form

\begin{displaymath}
\{X: \vert D(X) - 50\vert \gt u\}\end{displaymath}

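where we have to choose u. If we were to choose u = 50, we would never make a Type I error, because |D(X) - 50| can never exceed 50 and so we would never reject the null hypothesis; for the same reason, we would make a Type II error every time the coin is unfair. So to cut down the Type II errors, we have to allow some Type I errors.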
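As a minimal sketch of the rejection region just described, the following Python functions compute D(X) from a 100-tuple of `"H"` and `"T"` strings and decide whether X falls in the region $\{X: \vert D(X) - 50\vert \gt u\}$. The names `D` and `reject_fair`, and the representation of tosses as strings, are illustrative assumptions.

```python
def D(x):
    """Number of heads D(X) in a sequence of 'H'/'T' outcomes."""
    return sum(1 for toss in x if toss == "H")

def reject_fair(x, u):
    """Return True iff X lies in the rejection region {X : |D(X) - 50| > u}.

    x is assumed to be a 100-tuple of 'H' and 'T' strings.
    """
    if len(x) != 100:
        raise ValueError("expected 100 tosses")
    return abs(D(x) - 50) > u

print(reject_fair(("H",) * 61 + ("T",) * 39, 10))  # True: |61 - 50| = 11 > 10
print(reject_fair(("H",) * 60 + ("T",) * 40, 10))  # False: |60 - 50| = 10
```

With u = 50 the function never returns `True`, since |D(X) - 50| is at most 50 for any 100 tosses.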

One thing we could do is limit the probability of a Type I error. For argument's sake, suppose we wish to make a Type I error no more than 5% of the time. The probability of a Type I error is the probability of rejecting the null hypothesis when it is true. When it is true, the coin is fair and D(X) has a binomial distribution with N = 100 and p = 1/2, so we want to pick u so that

\begin{displaymath}
\Pr(\vert D(X)- 50\vert \gt u) \leq 0.05\end{displaymath}

and as close to 0.05 as possible. We can do this in two ways. First,

\begin{displaymath}
\Pr(\vert D(X)- 50\vert \gt u) = 1 - \Pr(\vert D(X)-50\vert \leq u) = 1 - \sum_{k=50-u}^{50+u}
\mbox{$\left(\begin{array}
{c}100\\ k\end{array}\right)$}\left(\frac{1}{2}\right)^{100}\end{displaymath}

so we could try to find u by trial and error. This may not work too well by hand, since $(1/2)^{100}$ is a very small number.
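With exact rational arithmetic, the trial-and-error approach is no longer hampered by the size of $(1/2)^{100}$. This Python sketch assumes the standard library's `math.comb` and `fractions.Fraction` (Python 3.8 or later); it evaluates the sum above exactly and searches for the smallest integer u whose Type I error probability is at most 0.05. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

N = 100  # number of tosses

def type_one_error(u):
    """Exact Pr(|D(X) - 50| > u) when D(X) is Binomial(100, 1/2), for integer 0 <= u <= 50."""
    # Pr(|D(X) - 50| <= u) = sum over k = 50 - u, ..., 50 + u of C(100, k) / 2^100
    inside = sum(comb(N, k) for k in range(50 - u, 50 + u + 1))
    return 1 - Fraction(inside, 2 ** N)

def smallest_u(alpha=Fraction(1, 20)):
    """Smallest integer u whose Type I error probability is at most alpha (default 5%)."""
    u = 0
    while type_one_error(u) > alpha:
        u += 1
    return u

for u in (9, 10):
    print(u, float(type_one_error(u)))  # about 0.057 for u = 9, about 0.035 for u = 10
print(smallest_u())  # 10
```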

The other possibility is to use the Central Limit Theorem to get an approximate value of u, and then check it. The Central Limit Theorem tells us that

\begin{displaymath}
\Pr(\vert D(X)- 50\vert \gt u) = \Pr\left(\frac{\vert D(X)-50\vert}{\sqrt{{\rm Var}(D(X))}} \gt \frac{u}{\sqrt{{\rm Var}(D(X))}}\right) \approx 2\int_{u/\sqrt{{\rm Var}(D(X))}}^\infty \frac{1}{\sqrt{2\pi}}\exp(-x^2/2)\;dx\end{displaymath}

is a reasonably good approximation if D(X) is binomial with N=100 and p = 1/2. Under these conditions the variance of D(X) is $100\times(1/2)\times(1/2) = 25$ so we want to pick u so that

\begin{displaymath}
2\int_{u/5}^\infty \frac{1}{\sqrt{2\pi}}\exp(-x^2/2)\;dx \approx 0.05\end{displaymath}

so $u/5 \approx 1.96$, or $u \approx 9.8$. Checking against the exact binomial probabilities, setting u = 10 gives

\begin{displaymath}
\Pr(\vert D(X)- 50\vert \gt 10) \approx 0.035\end{displaymath}

and setting u = 9 gives

\begin{displaymath}
\Pr(\vert D(X)- 50\vert \gt 9) \approx 0.057.\end{displaymath}
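The normal approximation and the exact check can be compared side by side. This Python sketch assumes the standard library's `statistics.NormalDist` (Python 3.8 or later) for the standard normal distribution and its quantile function; names such as `normal_tail` and `exact_tail` are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

N, P = 100, 0.5
sd = sqrt(N * P * (1 - P))               # sqrt(25) = 5
z = NormalDist().inv_cdf(1 - 0.05 / 2)   # two-sided 5% point, about 1.96
u_approx = z * sd                        # about 9.8

def normal_tail(u):
    """CLT approximation 2 * Pr(Z > u / sd) to Pr(|D(X) - 50| > u)."""
    return 2 * (1 - NormalDist().cdf(u / sd))

def exact_tail(u):
    """Exact Pr(|D(X) - 50| > u) for D(X) ~ Binomial(100, 1/2), integer u."""
    inside = sum(comb(N, k) for k in range(50 - u, 50 + u + 1))
    return 1 - inside / 2 ** N

print(round(u_approx, 2))
for u in (9, 10):
    print(u, round(normal_tail(u), 4), round(exact_tail(u), 4))
```

For these values of u, the normal tail from the integral is noticeably larger than the exact binomial probability, which is why the approximate u is worth checking exactly.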

What, then, about the Type II error? We have been very nonspecific about what the alternative to our null hypothesis is. The alternative to the null hypothesis is called the alternative hypothesis. A hypothesis (null or alternative) is called simple if it contains one element and composite if it contains more than one element.

If we have no idea at all, the alternative hypothesis would be that the probability of heads is not 1/2. Maybe we have some additional information, such as that if the probability of heads is not 1/2, then it is less than 1/2. Maybe we know that if it is not 1/2, it is 1/4. In every case the probability of making a Type II error is a function of the elements of the alternative hypothesis.

For example, if the true probability of heads is 1/4, then D(X) is binomial with N = 100 and p = 1/4. With u = 10, the probability of making a Type II error in this case is

\begin{displaymath}
\Pr(\vert D(X) - 50\vert \leq 10) =
\sum_{k=40}^{60}
\mbox{$\left(\begin{array}
{c}100\\ k\end{array}\right)$}\left(\frac{1}{4}\right)^k\left(\frac{3}{4}\right)^{100-k}
\approx 0.0007\end{displaymath}

and if p = 2/5

\begin{displaymath}
\Pr(\vert D(X) - 50\vert \leq 10) =
\sum_{k=40}^{60}
\mbox{$\left(\begin{array}
{c}100\\ k\end{array}\right)$}\left(\frac{2}{5}\right)^k\left(\frac{3}{5}\right)^{100-k}
\approx 0.5379\end{displaymath}
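Both Type II error probabilities above are finite sums that are easy to evaluate by machine. A minimal Python sketch, assuming the test with u = 10 (so the null hypothesis is kept when 40 ≤ D(X) ≤ 60); the function name `type_two_error` is hypothetical:

```python
from math import comb

N = 100  # number of tosses

def type_two_error(p, u=10):
    """Pr(|D(X) - 50| <= u) when D(X) is Binomial(100, p).

    This is the probability that the test fails to reject the null hypothesis
    when the true probability of heads is p, i.e. a Type II error when p != 1/2.
    """
    return sum(comb(N, k) * p**k * (1 - p) ** (N - k)
               for k in range(50 - u, 50 + u + 1))

print(round(type_two_error(1 / 4), 4))  # about 0.0007
print(round(type_two_error(2 / 5), 4))  # about 0.5379
```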

You can use a computer algebra system to graph the probability of Type II error as a function of the probability of heads.
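In place of a computer algebra system, a few lines of Python can tabulate the Type II error probability of the u = 10 test over a grid of values of p and draw a rough text plot. This is a sketch under that substitution; the grid spacing of 0.05, the bar scale, and the function names are arbitrary choices.

```python
from math import comb

N = 100  # number of tosses

def type_two_error(p, u=10):
    """Pr(|D(X) - 50| <= u) when D(X) is Binomial(100, p)."""
    return sum(comb(N, k) * p**k * (1 - p) ** (N - k)
               for k in range(50 - u, 50 + u + 1))

def type_two_curve(steps=20):
    """List of (p, Pr(|D(X) - 50| <= 10)) for p = 0, 1/steps, ..., 1."""
    return [(i / steps, type_two_error(i / steps)) for i in range(steps + 1)]

for p, beta in type_two_curve():
    # one '#' per 0.025 of probability, so 40 marks would mean probability 1
    print(f"{p:4.2f}  {beta:6.4f}  " + "#" * round(40 * beta))
```

The curve is symmetric about p = 1/2 because the acceptance region 40 ≤ k ≤ 60 is symmetric about 50. The entry at p = 0.50 is not a Type II error probability, since there the null hypothesis is true; it equals 1 minus the Type I error probability, about 0.965.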

Let p be the probability of heads. It may have occurred to you that if the null hypothesis is $H_0 = \{1/2\}$ and the alternative hypothesis is $H_1 = \{1/4\}$ or $H_1 = \{p < 1/2\}$, then we should reject the null hypothesis only if D(X) is too small, since getting 75 heads, say, is more indicative of $H_0$ than of $H_1$. As you can see, the problem of designing a test can be quite complicated. One of our goals will be to develop systematic methods for finding good tests. Our first problem will be to decide what makes a test ``good''.
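The test suggested here, which rejects only when D(X) is too small, can be sketched in Python. This assumes the same 5% limit on the Type I error probability and a rejection region of the form $\{X: D(X) \leq c\}$; the helper names and the comparison at p = 2/5 are illustrative choices, not taken from these notes.

```python
from math import comb

N = 100  # number of tosses

def binom_cdf(c, p):
    """Pr(D(X) <= c) when D(X) is Binomial(100, p)."""
    return sum(comb(N, k) * p**k * (1 - p) ** (N - k) for k in range(c + 1))

def one_sided_critical_value(alpha=0.05):
    """Largest c with Pr(D(X) <= c) <= alpha when p = 1/2; reject H0 iff D(X) <= c."""
    c = -1
    while binom_cdf(c + 1, 0.5) <= alpha:
        c += 1
    return c

def two_sided_type_two(p, u=10):
    """Type II error probability of the test that rejects iff |D(X) - 50| > u."""
    return sum(comb(N, k) * p**k * (1 - p) ** (N - k)
               for k in range(50 - u, 50 + u + 1))

c = one_sided_critical_value()
print(c, round(binom_cdf(c, 0.5), 4))  # critical value and its Type I error probability
p = 2 / 5
print(round(1 - binom_cdf(c, p), 4), round(two_sided_type_two(p), 4))
```

With these choices the search gives c = 41, whose Type I error probability is about 0.044 (below 0.05 because D(X) is discrete). At p = 2/5 the one-sided test's Type II error probability is roughly 0.38, compared with about 0.54 for the two-sided test with u = 10.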



Eric S Key
1/21/1999