
16.06 The central limit theorem

Lesson

The central limit theorem

The standard version of the Central Limit Theorem, first proved by the French mathematician Pierre-Simon Laplace in 1810, states that if random samples of size $n$ are drawn from any population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of $\overline{x}$ will be approximately normally distributed with a mean $\mu_{\overline{x}}=\mu$ and a variance $\sigma_{\overline{x}}^2=\frac{\sigma^2}{n}$.

That statement is certainly a mouthful, so to unpick it, we will develop the idea a little more carefully. The Central Limit Theorem is one of the most important theorems in all of statistics, so it is really important to understand what it all means.

To do this we will build the concept using a much simplified scenario. We will use a population of only five numbers. In reality many populations are infinite in size, but using just five numbers makes it easy to build the idea. Our samples taken from that population will be drawn from the five numbers, starting with single-number samples and progressing through to samples of size $4$ and $5$.

Step 1

Suppose we begin sampling, with replacement, in single units from the set of numbers $1,2,3,4,5$.

Because our sample size is just one, the possible samples are $1,2,3,4$ or $5$, and each is selected with probability $\frac{1}{5}$.

We could imagine sampling single numbers continuously, so that a list of results might look something like this:

 

$1,5,4,5,2,3,1,2,3,5,4,1,2,2,3,4,5,2,4,3,\dots$

 

We expect the average of these draws to converge on the number $3$ simply because the mean value of the equally likely outcomes is given by $\frac{\Sigma x}{n}=\frac{1+2+3+4+5}{5}=3$. This makes perfect sense because there is no reason for the random process to favor one number over any other number, so in the long term the average should tend toward $3$.

In the draw shown above, as each number is drawn, we see the successive averages forming as:

  • $\frac{1}{1}=1$
  • $\frac{1+5}{2}=3$
  • $\frac{1+5+4}{3}\approx3.333$
  • $\frac{1+5+4+5}{4}=3.75$
  • $\frac{1+5+4+5+2}{5}=3.4$

and so on.

The variance could also be tracked progressively, and it would tend to the variance of the five numbers, which is the average of the squared deviations of each possible outcome from $3$. This value is exactly $\frac{(1-3)^2+(2-3)^2+(3-3)^2+(4-3)^2+(5-3)^2}{5}=\frac{10}{5}=2$.
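The Step 1 figures can be checked with a short Python sketch. It assumes nothing beyond the five-number population and the list of draws quoted above; the variable names are illustrative only.

```python
from fractions import Fraction

population = [1, 2, 3, 4, 5]

# Population mean and variance (average squared deviation from the mean).
mean = Fraction(sum(population), len(population))
variance = sum((x - mean) ** 2 for x in population) / len(population)

# The example sequence of single draws from the lesson.
draws = [1, 5, 4, 5, 2, 3, 1, 2, 3, 5, 4, 1, 2, 2, 3, 4, 5, 2, 4, 3]

# Running average after each draw.
running = []
total = 0
for i, x in enumerate(draws, start=1):
    total += x
    running.append(total / i)

print(mean, variance)                      # population mean and variance
print([round(r, 3) for r in running[:5]])  # first five running averages
print(running[-1])                         # average after all 20 draws
```

After only 20 draws the running average is already close to $3$, although any particular run of draws could behave differently.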

The probability distribution is clearly discrete and uniform as shown here:

 
Step 2

Suppose now, instead of sampling one number at a time, we sample two numbers at a time from the set, like this:

$2-4,1-3,5-2,\dots$

and begin writing down the average of each pair like this:

$(2,4)\rightarrow3,(1,3)\rightarrow2,(5,2)\rightarrow3.5,\dots$

    

We find that the long term average of these averages (in other words the average of $3,2,3.5,\dots$) tends toward $3$ as well. The following argument shows why.

Suppose we take the overall average of the first three $2$-number averages above. The overall average becomes:

Average $=\frac{\frac{2+4}{2}+\frac{1+3}{2}+\frac{5+2}{2}}{3}$
  $=\frac{(2+4)+(1+3)+(5+2)}{2\times3}$
  $=\frac{2+4+1+3+5+2}{6}$
     

But this is just the average of six single draws, and we have already seen that such averages tend to the theoretical average of $3$ because the single numbers are all equally likely to be selected.

From the counting principle, there are $5^2=25$ possible ordered pairs when selecting $2$ numbers from $5$ with replacement. These are listed in the table, with the average of each pair shown in parentheses.

$1,1\ (1)$   $2,1\ (1.5)$   $3,1\ (2)$   $4,1\ (2.5)$   $5,1\ (3)$
$1,2\ (1.5)$   $2,2\ (2)$   $3,2\ (2.5)$   $4,2\ (3)$   $5,2\ (3.5)$
$1,3\ (2)$   $2,3\ (2.5)$   $3,3\ (3)$   $4,3\ (3.5)$   $5,3\ (4)$
$1,4\ (2.5)$   $2,4\ (3)$   $3,4\ (3.5)$   $4,4\ (4)$   $5,4\ (4.5)$
$1,5\ (3)$   $2,5\ (3.5)$   $3,5\ (4)$   $4,5\ (4.5)$   $5,5\ (5)$

 

Note that not all of these averages are different, with the most common average being $3$, occurring $5$ times.

Using computer software we can verify that the average of these averages is still $3$, but the variance has changed. The software shows a variance of just $1$.

The distribution of these averages changes from a discrete uniform one to a discrete symmetric triangular one. The most common average is $3$, and fewer and fewer pairs produce a given average as we move further away from $3$.
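The Step 2 claims can be confirmed by listing all $25$ ordered pairs. The sketch below is one way to do it in Python; the variable names are illustrative, and exact fractions are used so that no rounding is involved.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5]

# Average of every ordered pair drawn with replacement (5**2 = 25 pairs).
pair_means = [Fraction(a + b, 2) for a, b in product(population, repeat=2)]

mean_of_means = sum(pair_means) / len(pair_means)
variance_of_means = sum((m - mean_of_means) ** 2 for m in pair_means) / len(pair_means)

# How often each average occurs: the counts rise to a peak at 3 and fall again.
counts = Counter(pair_means)
for value in sorted(counts):
    print(float(value), counts[value])

print(mean_of_means, variance_of_means)
```

The printed counts show the triangular shape: one pair averages $1$, two average $1.5$, and so on up to five pairs averaging $3$.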

 
Step 3

The plot thickens. Suppose we now sample three numbers at a time. 

The $5^3=125$ possible averages (too many to write them all down) range from $1$ to $5$ again, with the most common value being $3$ (in fact $19$ of them have this value).

Here is a frequency and probability distribution table of these $125$ averages.

$\overline{x}$   $f$   $P(\overline{x})$
$1$   $1$   $0.008$
$1\frac{1}{3}$   $3$   $0.024$
$1\frac{2}{3}$   $6$   $0.048$
$2$   $10$   $0.08$
$2\frac{1}{3}$   $15$   $0.12$
$2\frac{2}{3}$   $18$   $0.144$
$3$   $19$   $0.152$
$3\frac{1}{3}$   $18$   $0.144$
$3\frac{2}{3}$   $15$   $0.12$
$4$   $10$   $0.08$
$4\frac{1}{3}$   $6$   $0.048$
$4\frac{2}{3}$   $3$   $0.024$
$5$   $1$   $0.008$

 

Once again we find the average of all possible averages stubbornly remains at $3$, but the variance continues to reduce. The variance of these averages turns out to be exactly $\frac{2}{3}$.

Something really interesting is happening to the sampling distribution. The distribution of averages looks more and more like a normal distribution, with $3$ the most common average and the frequency of other averages reducing as the value of the average moves away from $3$, above and below.
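The frequency table for samples of size $3$ can be reproduced by brute-force enumeration. The following Python sketch assumes only the five-number population; the names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5]

# Every ordered sample of size 3 drawn with replacement (5**3 = 125 of them).
sample_means = [Fraction(sum(s), 3) for s in product(population, repeat=3)]

frequency = Counter(sample_means)
total = len(sample_means)

# Print each possible average with its frequency and probability.
for value in sorted(frequency):
    f = frequency[value]
    print(f"{float(value):.3f}  f={f:2d}  P={f / total:.3f}")

mean_of_means = sum(sample_means) / total
variance_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / total
print(mean_of_means, variance_of_means)
```

Each probability is simply the frequency divided by $125$, so the frequencies $1, 3, 6, 10, 15, 18, 19$ give the probabilities directly.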

 

Step 4 and beyond

Sampling four numbers at a time produces $5^4=625$ averages, and the average of these averages again remains at $3$. The variance continues to decrease. It now becomes $\frac{2}{4}=\frac{1}{2}$. The probability of selecting four numbers with an average of $3$ becomes, from a simple count, $\frac{85}{625}=0.136$.

Sampling $5$ numbers produces $5^5=3125$ possible averages, and the average of these averages is still $3$, but the variance reduces further to $\frac{2}{5}=0.4$. The distribution of these averages looks approximately normal (although our particular population, being a discrete finite distribution with least value $1$ and greatest value $5$, can never produce a truly normal sampling distribution).
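The counts for samples of size $4$ and $5$ are too large to list by hand but easy to enumerate by computer. The sketch below uses a hypothetical helper, `sampling_summary`, written for this illustration only.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5]

def sampling_summary(n):
    """Return (number of samples, mean of means, variance of means, P(mean == 3))."""
    means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
    count = len(means)
    mu = sum(means) / count
    var = sum((m - mu) ** 2 for m in means) / count
    p_three = Fraction(sum(1 for m in means if m == 3), count)
    return count, mu, var, p_three

for n in (4, 5):
    count, mu, var, p_three = sampling_summary(n)
    print(n, count, mu, var, float(p_three))
```

Enumeration is practical here only because the population is tiny; the number of ordered samples grows as $5^n$.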

Here is what the distribution of $\overline{x}$ might look like for a sample of size $5$:

Conclusions

If we look back at steps 1 to 4 and beyond, we notice two important things.

  1. The average of the $n$-size sample averages remained constant at $3$.
  2. The variance of the $n$-size sample averages became the variance of the population divided by $n$.

If we call the average of the averages $\mu_{\overline{x}}$, and the variance of the averages $\sigma_{\overline{x}}^2$, then it is true that, for any $n$-size sample drawn from a population consisting of the five numbers $1,2,3,4,5$, we have:

  1. $\mu_{\overline{x}}=3=\mu$
  2. $\sigma_{\overline{x}}^2=\frac{2}{n}=\frac{\sigma^2}{n}$

 

The remarkable result that this investigation is leading to is known as the Central Limit Theorem.

Central limit theorem

The Central Limit Theorem states that:

If random samples of size $n$ are drawn from any population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of $\overline{x}$ will be approximately normally distributed with a mean $\mu_{\overline{x}}=\mu$ and a variance $\sigma_{\overline{x}}^2=\frac{\sigma^2}{n}$.

What a profound statement this is! 

Look back across our discussion and see that as the sample size increased, the average of the averages, $\mu_{\overline{x}}$, remained constant at $3$ and the variance, $\sigma_{\overline{x}}^2$, reduced steadily as $\frac{2}{1}$, $\frac{2}{2}$, $\frac{2}{3}$, $\frac{2}{4}$ and $\frac{2}{5}$. A sample of size $6$ would show a variance of $\frac{2}{6}=\frac{1}{3}$, and so on.
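The pattern $\sigma_{\overline{x}}^2=\frac{\sigma^2}{n}$ can be checked exactly for $n=1$ to $6$ with the Python sketch below. The function name `variance_of_sample_means` is an illustrative choice, and the check works by listing every ordered sample, which is feasible only for a small population like this one.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5]
pop_mean = Fraction(sum(population), len(population))
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

def variance_of_sample_means(n):
    """Exact variance of the sample mean over all ordered samples of size n."""
    means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
    mu = sum(means) / len(means)
    return sum((m - mu) ** 2 for m in means) / len(means)

variances = {n: variance_of_sample_means(n) for n in range(1, 7)}
for n, v in variances.items():
    # The last column confirms that the variance equals sigma^2 / n.
    print(n, v, v == pop_var / n)
```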

The Central Limit Theorem (CLT for short) will be put to use in later chapters. Sample means can be determined and certain probabilistic inferences can be made about the population mean $\mu$ itself, even though it may not be known.

 

Further notes

  1. As a general rule, the variance of the sampling distribution for samples of size $n$ from any size population will reduce as $n$ increases.
  2. As the variance reduces, the sampling distribution becomes less dispersed (more compacted) around the mean value.
  3. As the sample size increases, individual sample averages are more likely to lie close to the true population average. Thus, the variation is expected to drop. The beauty of the Central Limit Theorem is that it tells us how it drops.
  4. Irrespective of the distribution of the population (provided it has a finite variance), the sampling distribution for large $n$ (usually taken as $n=30$ or more) is approximately normal.
  5. If the variance of the sampling distribution is given by $\sigma_{\overline{x}}^2=\frac{\sigma^2}{n}$, then the standard deviation is given by $\sigma_{\overline{x}}=\frac{\sqrt{\sigma^2}}{\sqrt{n}}=\frac{\sigma}{\sqrt{n}}$.
  6. In many instances we take real life samples without replacement. For example we might conduct a survey of $100$ people's heights, drawn randomly from a population in a particular location. As each person's height is recorded, we exclude that person from being remeasured. In such situations, provided the sample is large enough and the population is much larger than the sample, the sampling distribution will still be approximately normal, with the sampling mean and standard deviation given approximately by $\mu$ and $\frac{\sigma}{\sqrt{n}}$.
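Notes 4 and 5 can be illustrated with a small simulation. The Python sketch below draws samples of size $30$ from the five-number population used earlier; the seed and the number of simulated samples ($5000$) are arbitrary choices made for this illustration, and the results are approximate because they depend on random draws.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
population = [1, 2, 3, 4, 5]
n = 30                   # sample size
num_samples = 5000       # number of simulated samples (arbitrary choice)

# Mean of each simulated sample, drawn with replacement.
sample_means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(num_samples)]

standard_error = (2 / n) ** 0.5   # sigma / sqrt(n), with sigma^2 = 2
within_one_se = sum(abs(m - 3) <= standard_error for m in sample_means) / num_samples

print(round(statistics.fmean(sample_means), 3))    # should be close to 3
print(round(statistics.pstdev(sample_means), 3), round(standard_error, 3))
print(round(within_one_se, 3))                     # roughly two thirds, as for a normal curve
```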

 

Worked example

Question 1

A regular tetrahedral dice has four sides labeled $1,2,3$ and $4$. Let $X$ be the outcome when the dice is rolled.

  1. What type of distribution does $X$ represent?
  2. Samples of size $64$ are taken from the distribution (in other words, each sample consists of $64$ rolls of the dice) and the mean $\overline{x}$ of each sample is calculated and recorded. What type of distribution does $\overline{x}$ represent?
  3. Calculate the mean $\mu_{\overline{x}}$ and standard deviation $\sigma_{\overline{x}}$ of $\overline{x}$.
 
Answers:
  1. $X$ has a discrete uniform probability distribution. Note that a continuous uniform probability distribution is a distribution where the random variable could assume a continuous range of values between a minimum and maximum value.
  2. With a large sample size like this, any population distribution at all would have a sampling distribution that was approximately normal. We are concerned with a uniform distribution, but even if the distribution was $U$ shaped, the corresponding sampling distribution for a sample size of $64$ would be fairly close to normal.
  3. The mean of the sampling distribution, as we have seen, will be exactly the same as the mean of the population distribution. In other words, $\mu_{\overline{x}}=\mu=\frac{1+2+3+4}{4}=2.5$. The variance of the population can be evaluated as:
$\sigma^2$ $=\frac{\Sigma(X-\mu)^2}{4}$
  $=\frac{(1-2.5)^2+(2-2.5)^2+(3-2.5)^2+(4-2.5)^2}{4}$
  $=\frac{5}{4}$
  $=1.25$

Therefore the population standard deviation is $\sigma=\sqrt{1.25}=\frac{\sqrt{5}}{2}\approx1.118$, and the standard deviation of the sampling distribution is $\sigma_{\overline{x}}=\frac{\sigma}{\sqrt{n}}=\frac{1.118}{\sqrt{64}}\approx0.140$.

Hence the sampling distribution has a mean of $2.5$ and a standard deviation of about $0.140$. This information can be used to predict the results of future samples of size $64$ taken from the same population. For example, using the empirical rule, we can state that there is about a $68\%$ chance that a future sample of size $64$ will have a mean within $2.5\pm1\times0.140$, that is, between about $2.36$ and $2.64$.
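The arithmetic for this worked example can be checked with a few lines of Python. This is a sketch with illustrative variable names; it uses the result $\sigma_{\overline{x}}=\frac{\sigma}{\sqrt{n}}$ with $n=64$.

```python
import math

faces = [1, 2, 3, 4]   # outcomes of the four-sided dice
n = 64                 # rolls per sample

mu = sum(faces) / len(faces)
sigma_squared = sum((x - mu) ** 2 for x in faces) / len(faces)
sigma = math.sqrt(sigma_squared)          # population standard deviation
standard_error = sigma / math.sqrt(n)     # standard deviation of the sample mean

print(mu, sigma_squared)
print(round(sigma, 3), round(standard_error, 3))
# One-standard-deviation band around the mean of the sampling distribution.
print(round(mu - standard_error, 2), round(mu + standard_error, 2))
```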

Practice questions

Question 2

Consider a fair $6$-sided dice, with faces labeled from $1$ to $6$. Let $X$ be the outcome when the dice is rolled.

  1. What type of distribution does $X$ represent?

    A. Continuous Uniform Distribution

    B. Discrete Uniform Distribution

    C. Normal Distribution

    D. Exponential Distribution

  2. Many samples of size $75$ are taken from the distribution, and the mean $\overline{X}$ of each sample is calculated.

    What type of distribution does $\overline{X}$ approximately represent?

    A. Discrete Uniform Distribution

    B. Exponential Distribution

    C. Continuous Uniform Distribution

    D. Normal Distribution

  3. Calculate the mean of $\overline{X}$.

  4. Calculate the standard deviation of $\overline{X}$ corresponding to a sample size of $75$. Round your answer to $2$ decimal places.

Question 3

A discrete random variable $X$ has a mean of $0.1$ and a variance of $1.3$. Samples of $60$ observations of $X$ are taken and $\overline{X}$, the mean of each sample, is calculated.

  1. What is the mean of $\overline{X}$?

  2. What is the standard deviation of $\overline{X}$? Round your answer to two decimal places.

  3. Using your answers to parts 1 and 2, calculate $P(0<\overline{X}<0.2)$. Write your answer to two decimal places.

  4. Using your answers to parts 1 and 2, calculate $P(\overline{X}<0.3\mid\overline{X}>0.2)$.

    Write your answer to two decimal places.

Question 4

The weight of small tins of tuna, represented by the random variable $X$, is normally distributed with a mean of $90.4$ g and a standard deviation of $6.5$ g.

  1. If the cans are advertised as weighing $92$ g, what is the probability a randomly chosen can is underweight? Round your answer to two decimal places.

  2. What is the expected value of $\overline{X}$, the sample mean of a randomly chosen sample of size $50$?

  3. Calculate the standard deviation of $\overline{X}$. Round your answer to three decimal places.

  4. Calculate the probability, $p$, that a randomly chosen sample of size $50$ has a mean weight less than the advertised weight, using the central limit theorem.

    Give your answer to two decimal places.

  5. $45$ samples, each of size $50$, are taken.

    Calculate the probability, $q$, that more than $41$ samples each have a mean weight less than the advertised weight, using the central limit theorem.

    Give your answer to two decimal places.

 

Normal approximation to binomials

When graphing binomial distributions, we saw that the shape of a binomial distribution depends on the number of Bernoulli trials in the experiment ($n$) and the probability of obtaining a success on any particular trial ($p$).

In studying binomial distributions, we saw that if $n$ was too small or $p$ was too close to $0$ or $1$, the distributions were skewed. However, binomial distributions are approximately normal when $n$ is large enough and $p$ is not too close to $0$ or $1$.

Exploration

Use this applet to investigate which values of $n$ and $p$ work best with a normal approximation.

  1. Use the sliders to change the values of $n$ and $p$.
  2. Check the boxes to show the normal curve and probabilities.
  3. What observations did you make? Were there any values of $n$ or $p$ that caused a problem?

As a general rule, both the mean number of successes $\mu=np$ and the mean number of failures $n-\mu=n(1-p)$ must be greater than $5$ for a good approximation by the normal curve.
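This rule of thumb is simple to express in code. The Python function below, with the hypothetical name `normal_approx_ok`, is only a sketch of the check described above; the threshold of $5$ is the one quoted in this lesson.

```python
def normal_approx_ok(n, p, threshold=5):
    """Return True when both np and n(1 - p) exceed the rule-of-thumb threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes > threshold and expected_failures > threshold

print(normal_approx_ok(21, 0.5))    # np = 10.5 and n(1 - p) = 10.5
print(normal_approx_ok(500, 0.9))   # np = 450 and n(1 - p) = 50
print(normal_approx_ok(20, 0.1))    # np = 2, which is too small
```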

The normal approximation to a binomial distribution is used because calculation with the help of a standard normal distribution table is often easier than calculation with the binomial formula, particularly when $n$ is large.

Worked example

Question 5

Suppose a balanced coin is tossed $21$ times. We wish to determine the probability of getting exactly $7$ heads. In this case $n=21$, $p=\frac{1}{2}$ and $\sigma=\sqrt{21\times\frac{1}{2}\times\left(1-\frac{1}{2}\right)}\approx2.29$.

According to the binomial formula, we need to calculate

$P(k=7)=\binom{21}{7}\left(\frac{1}{2}\right)^7\left(1-\frac{1}{2}\right)^{21-7}$

This is easy enough to do with a good calculator but would be more difficult by hand as the numbers in the binomial coefficient are large. We find $P(k=7)\approx0.0554$.

To work this out using the normal approximation, we have to represent the number of successes $(k=7)$ by the interval $(6.5,7.5)$. (This is called a continuity correction.)

Next, we obtain the $z$-scores corresponding to $6.5$ and $7.5$ when the mean is $np=21\times0.5=10.5$, and the standard deviation is $\sqrt{np(1-p)}=\sqrt{10.5\times0.5}\approx2.29$.

The transformed interval is $\left(\frac{6.5-10.5}{2.29},\frac{7.5-10.5}{2.29}\right)\approx\left(-1.747,-1.31\right)$.

We use the table for the standard normal distribution to obtain the area under the curve between $-1.747$ and $-1.31$. By symmetry, this is the same as the area between $1.31$ and $1.747$.

You should check that the probability obtained this way is $0.4599-0.4049=0.055$, which is close to the result from the binomial formula.
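Both calculations in Question 5 can be compared directly in Python. The sketch below uses `math.comb` for the binomial coefficient and builds the standard normal cumulative probability from `math.erf`; the helper name `normal_cdf` is an illustrative choice. Because it does not round the $z$-scores to table precision, the approximate value may differ slightly from the table-based figure.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p, k = 21, 0.5, 7

# Exact binomial probability of exactly k successes.
exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Normal approximation with continuity correction: area between k - 0.5 and k + 0.5.
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
approx = normal_cdf((k + 0.5 - mu) / sigma) - normal_cdf((k - 0.5 - mu) / sigma)

print(round(exact, 4), round(approx, 4))
```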

 

Question 6

An aerosol spray has been designed to kill house flies. An individual fly exposed to the aerosol under test conditions has a probability of dying within one minute of $0.9$. If $500$ flies are exposed to the aerosol, what is the probability that at least $450$ of them will be dead within one minute?

We need to add the probabilities of $450,451,452,\dots,500$ flies dying. This would be a very cumbersome calculation if done with the binomial formula. We use, instead, the normal approximation and look for the area beneath the normal density curve above $449.5$ (using the continuity correction as before).

The mean of the binomial distribution is $np=500\times0.9=450$ and the standard deviation is $\sqrt{500\times0.9\times0.1}=\sqrt{45}\approx6.71$. Thus, we need the area above $\frac{449.5-450}{6.71}\approx-0.07$ in the standard normal distribution.

This area is $0.5$ plus the area between $0$ and $0.07$. So, according to the table, the probability must be $0.5+0.0279\approx0.53$.

If needed, we can work out from this that the probability that fewer than $450$ flies will be dead after $1$ minute is about $0.47$.
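A computer makes the "cumbersome" exact sum easy, which gives a way to check the normal approximation for Question 6. The Python sketch below is illustrative only; `normal_cdf` is a helper defined here from `math.erf`, and the standard deviation is computed as $\sqrt{500\times0.9\times0.1}=\sqrt{45}$.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 500, 0.9

# Exact binomial probability of at least 450 successes.
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(450, n + 1))

# Normal approximation: area above 449.5 (continuity correction).
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
approx = 1 - normal_cdf((449.5 - mu) / sigma)

print(round(sigma, 2))
print(round(exact, 3), round(approx, 3))
```

The two values should agree to within a few hundredths; the small gap reflects the slight skew of a binomial distribution with $p=0.9$.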

 

Summary

If $n$ is sufficiently large and $p$ is not too close to $0$ or $1$ (both $np>5$ and $n(1-p)>5$), then the binomial distribution will be approximately normal, with a mean of $\mu=np$ and a standard deviation of $\sigma=\sqrt{np\left(1-p\right)}$.
