
Lecture 13: The Markov Inequality and the Chebychev Inequality

Our calculations have shown that if V is a random sample of size N from a distribution with mean $\mu$ and variance $\sigma^2$, then

1.
The statistic

\begin{displaymath}
T(V) = \frac{1}{N}\sum_{k=1}^N V_k\end{displaymath}

is an unbiased statistic for $\mu$;
2.
The variance of T(V) is given by

\begin{displaymath}
{\rm Var}[T(V)] = \frac{\sigma^2}{N}\end{displaymath}

This leads us to believe that as the sample size N grows, the observations of T(V) should cluster near $\mu$: the variance of T(V) shrinks, and a small variance means ``the probability is not too spread out''.
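As a sanity check on these two facts, here is a minimal Python sketch that computes the mean and variance of T(V) exactly by enumerating every possible sample. The fair six-sided die as the underlying distribution, the small sample sizes, and the helper name `sample_mean_moments` are assumptions made for illustration; they are not part of the lecture.

```python
from fractions import Fraction
from itertools import product

def sample_mean_moments(values, probs, n):
    """Exact mean and variance of the sample mean of n i.i.d. draws."""
    mean = Fraction(0)
    second_moment = Fraction(0)
    # Enumerate every ordered sample (V_1, ..., V_n) by index.
    for draw in product(range(len(values)), repeat=n):
        p = Fraction(1)
        for i in draw:
            p *= probs[i]
        t = Fraction(sum(values[i] for i in draw), n)  # T(V) for this sample
        mean += p * t
        second_moment += p * t * t
    return mean, second_moment - mean * mean

# Assumed example distribution: a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
mu = sum(v * p for v, p in zip(values, probs))
sigma2 = sum((v - mu) ** 2 * p for v, p in zip(values, probs))

for n in (1, 2, 3):
    m, var = sample_mean_moments(values, probs, n)
    # Columns: N, E[T(V)], Var[T(V)], sigma^2 / N
    print(n, m, var, sigma2 / n)
```

Exact rational arithmetic is used so that the comparison between Var[T(V)] and $\sigma^2/N$ is not blurred by rounding.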

We can make this more precise by estimating $\Pr(\vert T(V)-\mu\vert > x)$ for any positive number x. We start with something simpler.

Suppose that S is a non-negative random variable. Let x be a positive number and let $I_x$ be the random variable given by the rule

\begin{displaymath}
I_x = \left\{\begin{array}
{rr}
0 & {\rm if}\;S < x\\
1 & {\rm if}\;S \geq x\end{array}\right.\end{displaymath}

Then it is easy to check that

\begin{displaymath}
S = S\times 1 = S\times (I_x + 1 - I_x) = SI_x + S(1-I_x) \geq SI_x\end{displaymath}

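Checking the two cases separately (if S < x then both sides are 0; if $S \geq x$ then $I_x = 1$ and the inequality reads $S \geq x$), we see that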

\begin{displaymath}
SI_x \geq xI_x\end{displaymath}

so that

$S - xI_x$

is a non-negative random variable. Since non-negative random variables have non-negative expected values (if they have expected values at all), we see that if the expected value of S exists, then

\begin{displaymath}
0 \leq E[S-xI_x] = E[S] - xE[I_x] = E[S] - x\Pr(S \geq x)\end{displaymath}

so

\begin{displaymath}
\Pr(S \geq x) \leq \frac{E[S]}{x}.\end{displaymath}

This last inequality, valid when
1.
S is a non-negative random variable;
2.
E[S] is defined;
3.
x is a positive real number;
is called Markov's inequality. Notice that it can also be used to estimate $\Pr(S > x)$ since $\Pr(S > x) \leq
\Pr(S\geq x)$.
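To see the argument at work on a concrete case, here is a small Python sketch that checks both the pointwise inequality $S - xI_x \geq 0$ and the resulting bound $\Pr(S \geq x) \leq E[S]/x$. The choice of S as the roll of a fair six-sided die and the helper names `indicator` and `tail_probability` are assumptions made for illustration only.

```python
from fractions import Fraction

# Assumed example: S is the roll of a fair six-sided die (non-negative).
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
expected_s = sum(v * p for v, p in zip(values, probs))  # E[S]

def indicator(s, x):
    """I_x evaluated at an outcome s of S: 1 if s >= x, else 0."""
    return 1 if s >= x else 0

def tail_probability(x):
    """Exact Pr(S >= x), which equals E[I_x]."""
    return sum(p * indicator(v, x) for v, p in zip(values, probs))

for x in range(1, 7):
    # S - x*I_x is non-negative at every outcome of S.
    assert all(v - x * indicator(v, x) >= 0 for v in values)
    # Columns: x, Pr(S >= x), Markov bound E[S]/x
    print(x, tail_probability(x), expected_s / x)
```

For small x the bound exceeds 1 and says nothing; it only becomes informative once x is larger than E[S].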

Markov's inequality can be used to estimate $\Pr(\vert R - E[R]\vert \geq x)$ for a random variable R whose expected value exists, since |R-E[R]| is then a non-negative random variable with a finite expected value, giving the estimate

\begin{displaymath}
\Pr(\vert R - E[R]\vert \geq x) \leq \frac{E[\vert R-E[R]\vert]}{x}.\end{displaymath}

Since the numerator of the right-hand side is not commonly known, we observe that

\begin{displaymath}
\{\vert R - E[R]\vert \geq x\} = \{(R - E[R])^2 \geq x^2\}\end{displaymath}

when x is a non-negative number. So if R has a variance and x is positive, we get

\begin{displaymath}
\Pr(\vert R - E[R]\vert \geq x) = \Pr((R - E[R])^2 \geq x^2) \leq
\frac{E[(R-E[R])^2]}{x^2} = \frac{{\rm Var}[R]}{x^2} \end{displaymath}

This inequality is called Chebychev's inequality. It can be used to give some idea of how likely it is that a random quantity is some number of standard deviations from its mean value: if ${\rm Var}[R] > 0$ and y is a positive number, then

\begin{displaymath}
\Pr(\vert R - E[R]\vert \geq y\sqrt{{\rm Var}[R]}) \leq \frac{{\rm Var}[R]}{y^2{\rm
Var}[R]} = \frac{1}{y^2}\end{displaymath}
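The following Python sketch compares the exact probability of a deviation of at least y standard deviations with the bound $1/y^2$. It assumes, purely for illustration, that R is the roll of a fair six-sided die; the function name `deviation_probability` is likewise hypothetical. To keep the arithmetic exact, it tests the equivalent event $(R - E[R])^2 \geq y^2\,{\rm Var}[R]$ rather than taking a square root.

```python
from fractions import Fraction

# Assumed example: R is the roll of a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
mean_r = sum(v * p for v, p in zip(values, probs))
var_r = sum((v - mean_r) ** 2 * p for v, p in zip(values, probs))

def deviation_probability(y):
    """Exact Pr(|R - E[R]| >= y * sqrt(Var[R])).

    Computed through the equivalent event (R - E[R])^2 >= y^2 * Var[R].
    """
    threshold = y * y * var_r
    return sum(p for v, p in zip(values, probs)
               if (v - mean_r) ** 2 >= threshold)

for y in (Fraction(1), Fraction(5, 4), Fraction(3, 2), Fraction(2)):
    # Columns: y, exact probability, Chebychev bound 1/y^2
    print(y, deviation_probability(y), 1 / (y * y))
```

The bound holds in every row but is far from tight here, which is typical: Chebychev's inequality uses only the variance and must cover every distribution that has one.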

For example, in the standard scheme of curving grades, each grade range has width one standard deviation, and the C range is centered at the expected score. Thus if R represents a grade and S represents a score,

\begin{displaymath}
\Pr(R = {\rm A\; or\; F}) = \Pr( \vert S - E[S]\vert \geq 3\sqrt{{\rm Var}[S]}/2) \leq
4/9 \end{displaymath}

However, if the scores are in fact normally distributed, we can calculate that (with $\sigma$ the standard deviation and $\mu$ the mean),

\begin{displaymath}
\Pr(R = {\rm A\; or\; F}) = \Pr( \vert S - \mu\vert \geq 3\sigma/2) 
= 1 - \frac{1}{\sqrt{2\pi\sigma^2}}\int_{\mu-3\sigma/2}^{\mu+3\sigma/2}
\exp(-((x-\mu)/\sigma)^2/2)\;dx \end{displaymath}

By using the change of variables $(x-\mu)/\sigma = z$ we get

\begin{displaymath}
\frac{1}{\sqrt{2\pi\sigma^2}}\int_{\mu-3\sigma/2}^{\mu+3\sigma/2}
\exp(-((x-\mu)/\sigma)^2/2)\;dx
= \frac{1}{\sqrt{2\pi}}\int_{-3/2}^{3/2}\exp(-z^2/2)\;dz \approx .866.\end{displaymath}

Thus $\Pr(R = {\rm A\; or\; F}) \approx 1 - .866 = .134$, well below the Chebychev bound of 4/9. Since a normally distributed S is symmetric about its mean, you would expect that under such a system only about 7 percent of the grades would be A's and about the same percentage F's, no matter how well everyone does!
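The numerical value of the normal integral can be checked with a few lines of Python. This sketch relies on the standard identity $\Pr(\vert Z\vert \leq k) = {\rm erf}(k/\sqrt{2})$ for a standard normal Z; the function name `prob_within` is an assumption introduced here for illustration.

```python
import math

def prob_within(k):
    """Pr(|S - mu| <= k*sigma) for a normally distributed S."""
    return math.erf(k / math.sqrt(2.0))

inside = prob_within(1.5)   # probability of a B, C, or D
a_or_f = 1.0 - inside       # probability of an A or an F

print(round(inside, 3))      # 0.866
print(round(a_or_f, 3))      # 0.134
print(round(a_or_f / 2, 3))  # 0.067, the share of A's (and of F's)
print(a_or_f <= 4 / 9)       # True: consistent with the Chebychev bound
```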

The implications of Chebychev's inequality for our statistic T(V) are that

\begin{displaymath}
\Pr(\vert T(V) - \mu\vert > x) \leq \frac{{\rm Var}[T(V)]}{x^2} = \frac{\sigma^2}{Nx^2}\end{displaymath}

so for each fixed x the probability that T(V) is more than x units from $\mu$ is at most a constant multiple of 1/N, and therefore tends to 0 as the sample size N grows.
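One practical reading of this bound is as a sample-size rule: to guarantee $\Pr(\vert T(V) - \mu\vert > x) \leq \delta$ it suffices to take $N \geq \sigma^2/(\delta x^2)$. The Python sketch below works this out. The tolerance $\delta$, the function names `chebychev_bound` and `sample_size_for`, and the numerical inputs (the variance 35/12 of a fair die roll, x = 1/2, $\delta$ = 1/20) are all assumptions chosen for illustration.

```python
import math
from fractions import Fraction

def chebychev_bound(sigma2, n, x):
    """Upper bound sigma^2 / (N x^2) on Pr(|T(V) - mu| > x)."""
    return sigma2 / (n * x * x)

def sample_size_for(sigma2, x, delta):
    """Smallest N for which the Chebychev bound is at most delta."""
    return math.ceil(sigma2 / (delta * x * x))

# Assumed inputs: variance of a fair die roll, x = 1/2, delta = 1/20.
sigma2 = Fraction(35, 12)
x = Fraction(1, 2)
delta = Fraction(1, 20)

n_needed = sample_size_for(sigma2, x, delta)
print(n_needed)  # 234

for n in (10, 100, 1000):
    # The bound shrinks in proportion to 1/N.
    print(n, float(chebychev_bound(sigma2, n, x)))
```

Because Chebychev's inequality is conservative, the N returned here is sufficient but usually much larger than what is actually needed.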



 
Eric S Key
10/9/1998