I have in my possession two pennies which appear, to the naked eye, identical. However, one of them comes up heads 1/2 the time while the other comes up heads only 40 percent of the time. Unfortunately, I have confused them, and then lost one of them. I would like to design an experiment to determine which one I have left.
I think that the one that is left is the trick one. If this is the case, I expect that in a large number of tosses, roughly 40 percent of the tosses will be heads. My experiment will be of the following sort. I will toss the coin 100 times. If I get around 40 heads, I will conclude that my coin is the trick one. If not, I will conclude that it is not the trick one. My problem is to make precise the idea of "around 40 heads." I know that with the trick coin I might get 35 heads or 45 heads in 100 tosses. On the other hand, I would be astonished if I got 10 heads, or 90 heads.
I decide then that my test will be this: if I get anywhere from 35 to 45 heads, I will be satisfied that it is the trick coin; otherwise, I will conclude that I have lost the trick coin and that all I have is an ordinary fair coin.
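The decision rule just described can be written down in a few lines. The following is only a minimal sketch in Python: the function names `decide` and `toss_heads`, and the use of a seeded pseudo-random generator as a stand-in for a real coin, are assumptions made for illustration.

```python
import random

def decide(heads, low=35, high=45):
    """Apply the test: call the coin 'trick' if low <= heads <= high, else 'fair'."""
    return "trick" if low <= heads <= high else "fair"

def toss_heads(p_heads, tosses=100, rng=None):
    """Count heads in a simulated run of tosses (a stand-in for tossing a real coin)."""
    rng = rng or random.Random()
    return sum(rng.random() < p_heads for _ in range(tosses))

rng = random.Random(0)            # fixed seed so the run is repeatable
heads = toss_heads(0.4, rng=rng)  # simulate 100 tosses of the trick coin
print(heads, decide(heads))
```

A single simulated run says nothing about how often the rule errs; that requires the probability calculation that follows in the text.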
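There are then four possibilities before I run my test:
(1) The coin is the trick coin, and the test says it is the trick coin.
(2) The coin is the trick coin, but the test says it is the fair coin.
(3) The coin is the fair coin, but the test says it is the trick coin.
(4) The coin is the fair coin, and the test says it is the fair coin.
Outcomes (2) and (3) are the two ways in which the test can be in error, so I want their chances to be small.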
Suppose, for example, that I wanted to know the probability of getting 35 heads in 100 tosses of a coin. Think of a sequence of 100 H's and T's in which exactly 35 H's occur; each such sequence accounts for one way the experiment of tossing the coin 100 times could produce 35 heads. Each of these individual sequences is equally likely, so if I could count how many such sequences there are, and then multiply by the chance that one of them occurs, I would know the chance of getting 35 heads.
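Each of these steps is easily computed. In the first place, there are
$$\binom{100}{35} = \frac{100!}{35!\,65!}$$
sequences of 100 tosses containing exactly 35 H's. If $p$ denotes the chance of heads on a single toss, each of these sequences occurs with probability
$$p^{35}(1-p)^{65}.$$
Thus the chance of exactly 35 H's and 65 T's is
$$\binom{100}{35}\,p^{35}(1-p)^{65}.$$
The chance of getting anywhere from 35 to 45 heads is then the sum of the corresponding terms:
$$\sum_{k=35}^{45}\binom{100}{k}\,p^{k}(1-p)^{100-k}.$$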
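The count-then-multiply recipe described above can be checked numerically. As a hedged sketch (the function name `prob_exact_heads` is hypothetical; `math.comb` is the standard-library binomial coefficient, available from Python 3.8), the chance of exactly k heads in n tosses is the number of sequences with k heads times the chance of any one of them:

```python
from math import comb

def prob_exact_heads(k, n, p):
    """Chance of exactly k heads in n tosses when each toss is heads with chance p."""
    sequences = comb(n, k)             # how many H/T sequences contain exactly k heads
    each = p**k * (1 - p)**(n - k)     # chance of any one such sequence
    return sequences * each

print(prob_exact_heads(35, 100, 0.4))  # trick coin
print(prob_exact_heads(35, 100, 0.5))  # fair coin
```

Running it with both values of p shows how much more likely 35 heads is under the trick coin than under the fair one.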




If we double everything, so that we have 200 tosses and 70 to 90 heads, we get that the chance of (1) is about 0.871, the chance of (2) is about 0.129, the chance of (3) is about 0.089, and the chance of (4) is about 0.911. Remember, we want (2) and (3) to be small. This is certainly better.
On the other hand, if we simply make the interval wider while using 100 tosses, say by using 33 to 47 heads as our criterion, the chance of (1) is nearly 0.875 and the chance of (2) nearly 0.125, but the chance of (3) becomes nearly 0.310, and the chance of (4) drops to about 0.690.
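The figures quoted for these two alternative designs can be reproduced by summing binomial probabilities over the acceptance interval. This is a sketch that reads possibilities (1) to (4) as, in order: trick coin judged trick, trick coin judged fair, fair coin judged trick, fair coin judged fair. That reading is an assumption inferred from the quoted numbers, and the function names are hypothetical.

```python
from math import comb

def prob_heads_between(low, high, n, p):
    """Chance that the number of heads in n tosses lies in [low, high] inclusive."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(low, high + 1))

def four_chances(low, high, n, p_trick=0.4, p_fair=0.5):
    """Chances of outcomes (1), (2), (3), (4) for the test 'low <= heads <= high'."""
    accept_if_trick = prob_heads_between(low, high, n, p_trick)  # (1)
    accept_if_fair = prob_heads_between(low, high, n, p_fair)    # (3)
    return (accept_if_trick, 1 - accept_if_trick,
            accept_if_fair, 1 - accept_if_fair)

# The two designs discussed in the text: (low, high, number of tosses).
for low, high, n in [(70, 90, 200), (33, 47, 100)]:
    print(n, (low, high), [round(c, 3) for c in four_chances(low, high, n)])
```

Note that (1) and (2) are computed assuming the trick coin, and (3) and (4) assuming the fair coin, so each pair adds up to 1.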
Thus somehow we must strike an acceptable balance between a lengthy experiment and acceptable values for the probability of the test being in error.
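To see the trade-off concretely, one can tabulate the two error chances, (2) and (3), as the number of tosses grows. The sketch below assumes the acceptance interval is scaled with the number of tosses (from 35 percent to 45 percent heads), which matches the 35 to 45 and 70 to 90 tests in the text but is otherwise an illustrative choice. The function names are hypothetical.

```python
from math import comb

def prob_heads_between(low, high, n, p):
    """Chance that the number of heads in n tosses lies in [low, high] inclusive."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(low, high + 1))

def error_chances(n, p_trick=0.4, p_fair=0.5):
    """Return (chance of error (2), chance of error (3)) for n tosses."""
    low, high = (35 * n) // 100, (45 * n) // 100   # integer bounds, e.g. 70..90 for n=200
    miss_trick = 1 - prob_heads_between(low, high, n, p_trick)  # trick coin judged fair
    false_trick = prob_heads_between(low, high, n, p_fair)      # fair coin judged trick
    return miss_trick, false_trick

for n in (100, 200, 400):
    miss_trick, false_trick = error_chances(n)
    print(f"n={n:4d}  (2)={miss_trick:.3f}  (3)={false_trick:.3f}")
```

Both error chances fall as the experiment gets longer, which is exactly the balance at issue: smaller errors are bought with more tosses.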
In order to begin to make order out of this chaos, we have to have some description of what is happening in general. There are three basic things common to all analyses.