NZ Level 8 (NZC) Level 3 (NCEA) [In development]
Central limit theorem

The standard version of the Central Limit Theorem, first stated in essentially this form by the French mathematician Pierre-Simon Laplace in 1810, says the following. Suppose random samples of size $n$ are drawn from any population with mean $\mu$ and (finite) variance $\sigma^2$. Then, provided $n$ is sufficiently large, the sampling distribution of $\overline{x}$ will be approximately normally distributed, with mean $\mu_{\overline{x}}=\mu$ and variance $\sigma_{\overline{x}}^2=\frac{\sigma^2}{n}$.

That statement is certainly a mouthful, so to unpick it, we will develop the idea a little more carefully. The Central Limit Theorem is one of the most important theorems in all of statistics, so it is really important to understand what it all means.

To do this, we will build the concept using a much simplified scenario: a population of only five numbers. In reality many populations are very large or even infinite, but using just five numbers makes it easy to build the idea. Our samples will be drawn, with replacement, from that population. We start with samples of a single number and progress through to samples of size $4$ and $5$.

Step 1

Suppose we begin sampling, with replacement, in single units from the set of numbers $1,2,3,4,5$.

Because our sample size is just one, the sample possibilities are $1,2,3,4$ or $5$, and the probability of each of these being selected is $\frac{1}{5}$.

We could imagine sampling single numbers continuously, so that a list of results might look something like this:

 

$1,5,4,5,2,3,1,2,3,5,4,1,2,2,3,4,5,2,4,3,\dots$

 

We expect the average of these draws to converge on the number $3$. This is because the mean value of the equally likely outcomes is $\frac{\sum x}{N}=\frac{1+2+3+4+5}{5}=3$, where $N=5$ is the number of possible outcomes. This makes perfect sense: there is no reason for the random process to favour one number over any other, so in the long term the average should tend toward $3$.

In the draw shown above, as each number is drawn, we see the successive averages forming as:

  • $\frac{1}{1}=1$
  • $\frac{1+5}{2}=3$
  • $\frac{1+5+4}{3}\approx3.333$
  • $\frac{1+5+4+5}{4}=3.75$
  • $\frac{1+5+4+5+2}{5}=3.4$

and so on.
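The successive averages above can be reproduced with a short Python sketch. The list `draws` copies the twenty draws shown earlier. The function name `running_averages` is our own, chosen for illustration.

```python
from itertools import accumulate

# The twenty single-number draws listed above (sampling with replacement from 1..5)
draws = [1, 5, 4, 5, 2, 3, 1, 2, 3, 5, 4, 1, 2, 2, 3, 4, 5, 2, 4, 3]

def running_averages(values):
    """Return the average of the first k values for k = 1, 2, ..., len(values)."""
    totals = accumulate(values)  # running sums 1, 6, 10, 15, 17, ...
    return [total / k for k, total in enumerate(totals, start=1)]

averages = running_averages(draws)
for k, avg in enumerate(averages[:5], start=1):
    print(f"after {k} draw(s): {avg:.3f}")
print(f"after {len(draws)} draws: {averages[-1]:.3f}")
```

With only twenty draws the final running average is $3.05$. Longer runs would typically settle closer to $3$.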

The variance could also be tracked progressively, but we know it would also tend to the variance of the five numbers. That variance is the average of the squared deviations of each possible outcome from $3$: $\sigma^2=\frac{(1-3)^2+(2-3)^2+(3-3)^2+(4-3)^2+(5-3)^2}{5}=\frac{4+1+0+1+4}{5}=2$.
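The population mean and variance can also be computed exactly in a few lines of Python using `fractions.Fraction`. This is a minimal sketch, not the software used in the original lesson.

```python
from fractions import Fraction

population = [1, 2, 3, 4, 5]

# Population mean: the average of the equally likely outcomes
mu = Fraction(sum(population), len(population))

# Population variance: the average squared deviation from the mean
sigma_sq = sum((x - mu) ** 2 for x in population) / len(population)

print(mu, sigma_sq)  # 3 2
```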

The probability distribution is clearly discrete and uniform as shown here:

 
Step 2

Suppose now, instead of sampling one number at a time, we sample two numbers at a time from the set, like this:

$(2,4),\ (1,3),\ (5,2),\ \dots$

and begin writing down the average of each pair like this:

$(2,4)\rightarrow3,\ (1,3)\rightarrow2,\ (5,2)\rightarrow3.5,\ \dots$ and so on.

    

We find that the long-term average of these averages (in other words, the average of $3,2,3.5,\dots$) also tends toward $3$. The following argument shows why.

Suppose we take the overall average of the first three $2$-number averages above. This overall average is:

Average $=\frac{\frac{2+4}{2}+\frac{1+3}{2}+\frac{5+2}{2}}{3}=\frac{(2+4)+(1+3)+(5+2)}{2\times3}=\frac{2+4+1+3+5+2}{6}$

But this is just the average of six single draws. We have already agreed that such a sequence tends to the theoretical average of $3$, because each single number is equally likely to be selected.
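The algebra above holds for any collection of samples of equal size. Here is a minimal Python check using exact fractions and the three pairs from the text. The helper name `mean` is our own.

```python
from fractions import Fraction

pairs = [(2, 4), (1, 3), (5, 2)]   # the first three pairs drawn above

def mean(values):
    """Exact arithmetic mean of a sequence of numbers."""
    values = list(values)
    return Fraction(sum(values), len(values))

# Average of the three pair averages
mean_of_pair_means = mean(mean(p) for p in pairs)

# Average of all six individual numbers pooled together
pooled_mean = mean(x for p in pairs for x in p)

print(mean_of_pair_means, pooled_mean)  # 17/6 17/6
```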

By the counting principle, there are $5^2=25$ possible ordered samples when selecting $2$ numbers from $5$ with replacement. These are listed in the table, with each sample's average shown in brackets.

1,1 (1)      2,1 (1.5)    3,1 (2)      4,1 (2.5)    5,1 (3)
1,2 (1.5)    2,2 (2)      3,2 (2.5)    4,2 (3)      5,2 (3.5)
1,3 (2)      2,3 (2.5)    3,3 (3)      4,3 (3.5)    5,3 (4)
1,4 (2.5)    2,4 (3)      3,4 (3.5)    4,4 (4)      5,4 (4.5)
1,5 (3)      2,5 (3.5)    3,5 (4)      4,5 (4.5)    5,5 (5)

 

Note that not all of these averages are different: the most common average is $3$, which occurs $5$ times.

Using computer software, we can verify that the average of these averages is still $3$, but the variance has changed: it is now just $1$.
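The "computer software" check for samples of size $2$ can be as simple as the following Python sketch. It lists all $25$ ordered pairs with `itertools.product` and works with exact fractions so that no rounding creeps in.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5]

# All 5**2 = 25 ordered samples of size 2, drawn with replacement
samples = list(product(population, repeat=2))
sample_means = [Fraction(sum(s), len(s)) for s in samples]

N = len(sample_means)
mean_of_means = sum(sample_means) / N
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / N
counts = Counter(sample_means)             # how often each average occurs

print(N, mean_of_means, var_of_means)  # 25 3 1
print(counts[3])                       # 5
```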

The distribution of these averages changes from a discrete uniform one to a discrete, symmetric, triangular one. More pairs have an average of $3$ than any other value. Fewer and fewer pairs share a given average as we move further away from $3$.

 
Step 3

The plot thickens. Suppose we now sample three numbers at a time. 

The $5^3=125$ possible averages (too many to write them all down) again range from $1$ to $5$. The most common value is $3$, which occurs $19$ times.

Here is a frequency and probability distribution table of these $125$125 averages. 

$\overline{x}$ (sample mean)    $f$ (frequency)    $P(\overline{x})$ (probability)
$1$    $1$    $0.008$
$1\frac{1}{3}$    $3$    $0.024$
$1\frac{2}{3}$    $6$    $0.048$
$2$    $10$    $0.08$
$2\frac{1}{3}$    $15$    $0.12$
$2\frac{2}{3}$    $18$    $0.144$
$3$    $19$    $0.152$
$3\frac{1}{3}$    $18$    $0.144$
$3\frac{2}{3}$    $15$    $0.12$
$4$    $10$    $0.08$
$4\frac{1}{3}$    $6$    $0.048$
$4\frac{2}{3}$    $3$    $0.024$
$5$    $1$    $0.008$

 

Once again we find that the average of all possible averages stubbornly remains at $3$, but the variance continues to reduce. The variance of these averages turns out to be exactly $\frac{2}{3}$.
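A similar sketch reproduces the frequency table for samples of size $3$, together with the mean and variance of the $125$ averages. Fractions print in improper form, for example `4/3` rather than $1\frac{1}{3}$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5]
n = 3

# Frequency of each possible sample mean over all 5**3 = 125 ordered samples
freq = Counter(Fraction(sum(s), n) for s in product(population, repeat=n))
total = sum(freq.values())

for xbar in sorted(freq):
    print(f"{str(xbar):>5}  {freq[xbar]:>3}  {freq[xbar] / total:.3f}")

# Exact mean and variance of the distribution of sample means
mean = sum(x * f for x, f in freq.items()) / total
variance = sum((x - mean) ** 2 * f for x, f in freq.items()) / total
print(mean, variance)  # 3 2/3
```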

Something really interesting is happening to the sampling distribution. The distribution of averages looks more and more like a normal distribution. It peaks at $3$, and the frequency of other averages falls away symmetrically as the average moves further above or below $3$.

 

Step 4 and beyond

Sampling four numbers at a time produces $5^4=625$ averages, and the average of these averages again remains at $3$. The variance continues to decrease; it now becomes $\frac{2}{4}=\frac{1}{2}$. A simple count shows that $85$ of the $625$ samples of four numbers have an average of exactly $3$. So the probability of this is $\frac{85}{625}=0.136$.
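The count of $85$ need not be found by listing all $625$ samples. One alternative builds the distribution of sample totals one draw at a time by repeated convolution. A mean of $3$ for four numbers corresponds to a total of $12$. `sum_distribution` is a hypothetical helper of our own, not a library function.

```python
from collections import Counter
from fractions import Fraction

population = [1, 2, 3, 4, 5]

def sum_distribution(values, n):
    """Hypothetical helper: count the ordered samples of size n (drawn with
    replacement from `values`) that produce each possible total."""
    dist = Counter({0: 1})                 # before any draws, total 0 occurs once
    for _ in range(n):
        nxt = Counter()
        for total, ways in dist.items():
            for v in values:               # add one more draw to every partial sample
                nxt[total + v] += ways
        dist = nxt
    return dist

n = 4
dist = sum_distribution(population, n)
num_samples = sum(dist.values())           # 5**4 = 625
ways_mean_3 = dist[3 * n]                  # a mean of 3 needs a total of 12
print(num_samples, ways_mean_3, ways_mean_3 / num_samples)  # 625 85 0.136

# Exact mean and variance of the sample mean, weighting each total by its count
mean = sum(Fraction(t, n) * w for t, w in dist.items()) / num_samples
var = sum((Fraction(t, n) - mean) ** 2 * w for t, w in dist.items()) / num_samples
print(mean, var)  # 3 1/2
```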

Sampling $5$ numbers at a time produces $5^5=3125$ possible averages. The average of these averages is still $3$, but the variance reduces further to $\frac{2}{5}=0.4$. The distribution of these averages looks approximately normal. (It can never be exactly normal: our population is discrete and finite, so every sample average is one of finitely many values between $1$ and $5$.)

Here is what the distribution of $\overline{x}$ might look like for a sample of size $5$:

Conclusions

If we look back at steps 1 to 4 and beyond, we notice two important things.

  1. The average of the size-$n$ sample averages remained constant at $3$.
  2. The variance of the size-$n$ sample averages became the variance of the population divided by $n$.

If we call the average of the averages $\mu_{\overline{x}}$ and the variance of the averages $\sigma_{\overline{x}}^2$, then for any size-$n$ sample drawn from a population consisting of the five numbers $1,2,3,4,5$, we have:

  1. $\mu_{\overline{x}}=3=\mu$
  2. $\sigma_{\overline{x}}^2=\frac{2}{n}=\frac{\sigma^2}{n}$
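Both results can be checked exactly for this population with a brute-force sketch. It enumerates every ordered sample for $n=1$ to $6$ (up to $5^6=15625$ samples) using exact fractions. `sampling_moments` is our own helper name.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5]

def sampling_moments(values, n):
    """Exact mean and variance of the sample mean over all ordered samples
    of size n drawn with replacement from `values`."""
    means = [Fraction(sum(s), n) for s in product(values, repeat=n)]
    m = sum(means) / len(means)
    v = sum((x - m) ** 2 for x in means) / len(means)
    return m, v

for n in range(1, 7):
    m, v = sampling_moments(population, n)
    # Compare the exact variance with the predicted value 2/n
    print(n, m, v, v == Fraction(2, n))
```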

 

The remarkable result that this investigation is leading to is known as the Central Limit Theorem.

 

The Central Limit Theorem again

The Central Limit Theorem states that:

If random samples of size $n$ are drawn from any population with mean $\mu$ and (finite) variance $\sigma^2$, then, provided $n$ is sufficiently large, the sampling distribution of $\overline{x}$ will be approximately normally distributed with mean $\mu_{\overline{x}}=\mu$ and variance $\sigma_{\overline{x}}^2=\frac{\sigma^2}{n}$.

What a profound statement this is! 

Look back across our discussion and see that as the sample size increased, the average of the averages, $\mu_{\overline{x}}$, remained constant at $3$. The variance $\sigma_{\overline{x}}^2$ reduced steadily as $\frac{2}{1}$, $\frac{2}{2}$, $\frac{2}{3}$, $\frac{2}{4}$ and $\frac{2}{5}$. A sample of size $6$ would show a variance of $\frac{2}{6}=\frac{1}{3}$, and so on.

The Central Limit Theorem (CLT for short) will be put to use in later chapters. It allows us to use sample means to make probabilistic inferences about the population mean $\mu$ itself, even when $\mu$ is not known.

 

Further notes

  1. As a general rule, the variance of the sampling distribution for samples of size $n$ from a population of any size will reduce as $n$ increases.
  2. As the variance reduces, the sampling distribution becomes less dispersed (more compact) around the mean value.
  3. As the sample size increases, each sample average becomes more likely to lie close to the true population average, so the variation among sample averages is expected to drop. The beauty of the Central Limit Theorem is that it tells us exactly how it drops: the variance falls in proportion to $\frac{1}{n}$.
  4. Irrespective of the distribution of the population (provided its variance is finite), the sampling distribution for large $n$ is approximately normal, and it gets closer to normal as $n$ increases. A common rule of thumb for "large" is $n\ge30$.
  5. If the variance of the sampling distribution is given by $\sigma_{\overline{x}}^2=\frac{\sigma^2}{n}$, then the standard deviation is given by $\sigma_{\overline{x}}=\frac{\sqrt{\sigma^2}}{\sqrt{n}}=\frac{\sigma}{\sqrt{n}}$.
  6. In many instances we take real-life samples without replacement. For example, we might survey the heights of $100$ people drawn randomly from the population of a particular location. Once a person's height is recorded, that person is excluded from being measured again. In such situations the sampling distribution will still be approximately normal, provided the sample is large enough and the population is much larger than the sample. Its mean and standard deviation are then given approximately by $\mu$ and $\frac{\sigma}{\sqrt{n}}$.
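Note 4 can be illustrated with a quick simulation. The choice of population here is an assumption for illustration only: a strongly skewed exponential population with mean $1$ and variance $1$. The sketch draws repeated samples of size $30$ and checks three things:

  • the sample means are centred near $1$;
  • their standard deviation is close to $\frac{1}{\sqrt{30}}\approx0.183$;
  • roughly $68\%$ of them fall within one standard deviation of the mean, as a normal model predicts.

It uses `statistics.fmean` (Python 3.8 or later). The exact printed values depend on the random seed.

```python
import random
import statistics

rng = random.Random(2024)   # fixed seed so the run is repeatable

# Assumed population for illustration only: exponential with mean 1 and variance 1 (strongly right-skewed)
n = 30                      # sample size
num_samples = 2000          # number of repeated samples

sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
theory_sd = 1 / n ** 0.5    # sigma / sqrt(n) with sigma = 1, about 0.183

# Proportion of sample means within one theoretical sd of the population mean
within_1sd = sum(abs(m - 1) < theory_sd for m in sample_means) / num_samples

print(f"mean of sample means: {mean_of_means:.3f} (theory 1)")
print(f"sd of sample means:   {sd_of_means:.3f} (theory {theory_sd:.3f})")
print(f"within one sd:        {within_1sd:.3f} (normal model about 0.683)")
```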

 

An example

 

Question

A regular tetrahedral dice has four faces labelled $1,2,3$ and $4$. Let $X$ be the outcome when the dice is rolled.

  1. What type of distribution does $X$ represent?
  2. Samples of size $64$ are taken from the distribution (in other words, each sample consists of $64$ rolls of the dice), and the mean $\overline{x}$ of each sample is calculated and recorded. What type of distribution does $\overline{x}$ represent?
  3. Calculate the mean $\mu_{\overline{x}}$ and standard deviation $\sigma_{\overline{x}}$ of $\overline{x}$.
 
Answers:

Q1

$X$ has a discrete uniform probability distribution. Note that a continuous uniform probability distribution is a distribution where the random variable could assume a continuous range of values between a minimum and maximum value.

Q2 

With a sample size this large, almost any population distribution (one with finite variance) would have a sampling distribution that is approximately normal. Ours is a uniform distribution, but even if the population distribution were U-shaped, the sampling distribution for a sample size of $64$ would be fairly close to normal.

Q3

The mean of the sampling distribution, as we have seen, will be exactly the same as the mean of the population distribution. In other words, $\mu_{\overline{x}}=\mu=\frac{1+2+3+4}{4}=2.5$. The variance of the population can be evaluated as:

$\sigma^2=\frac{\sum(x-\mu)^2}{N}=\frac{(1-2.5)^2+(2-2.5)^2+(3-2.5)^2+(4-2.5)^2}{4}=\frac{5}{4}=1.25$, where $N=4$ is the number of equally likely outcomes.

The population standard deviation is therefore $\sigma=\sqrt{1.25}=\frac{\sqrt{5}}{2}\approx1.118$. The standard deviation of the sampling distribution is $\sigma_{\overline{x}}=\frac{\sigma}{\sqrt{n}}=\frac{\sqrt{5}/2}{\sqrt{64}}=\frac{\sqrt{5}}{16}\approx0.140$.

Hence the sampling distribution has a mean of $2.5$ and a standard deviation of about $0.140$. This information can be used to predict the results of future samples of size $64$ taken from the same population. For example, using the empirical rule, there is about a $68\%$ chance that a future sample of size $64$ will have a mean within one standard deviation of $2.5$, that is, between about $2.36$ and $2.64$.
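The calculations in this example can be checked with a short Python sketch. It assumes a sample size of $n=64$ as stated in the question; the variable names are our own.

```python
import math
from fractions import Fraction

faces = [1, 2, 3, 4]   # equally likely outcomes of the tetrahedral dice
n = 64                 # sample size (rolls per sample)

mu = Fraction(sum(faces), len(faces))                        # population mean, 5/2
sigma_sq = sum((x - mu) ** 2 for x in faces) / len(faces)    # population variance, 5/4
sigma = math.sqrt(float(sigma_sq))                           # population sd, about 1.118
sigma_xbar = sigma / math.sqrt(n)                            # sd of the sample mean, about 0.140

# Interval within one standard deviation of the mean (about 68% under a normal model)
low, high = float(mu) - sigma_xbar, float(mu) + sigma_xbar
print(float(mu), float(sigma_sq), round(sigma, 3), round(sigma_xbar, 3))
print(f"about 68% of sample means between {low:.2f} and {high:.2f}")
```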

 

 

 

 

Worked examples

Question 1

Consider a fair six-sided dice, with faces labelled from $1$ to $6$. Let $X$ be the outcome when the dice is rolled.

  1. What type of distribution does $X$ represent?

    A. Continuous Uniform Distribution
    B. Discrete Uniform Distribution
    C. Normal Distribution
    D. Exponential Distribution
  2. Many samples of size $75$ are taken from the distribution, and the mean $\overline{X}$ of each sample is calculated.

    What type of distribution does $\overline{X}$ approximately represent?

    A. Discrete Uniform Distribution
    B. Exponential Distribution
    C. Continuous Uniform Distribution
    D. Normal Distribution
  3. Calculate the mean of $\overline{X}$.

  4. Calculate the standard deviation of $\overline{X}$ corresponding to a sample size of $75$. Round your answer to $2$ decimal places.

Question 2

A discrete random variable $X$ has a mean of $0.1$ and a variance of $1.3$. Samples of $60$ observations of $X$ are taken, and $\overline{X}$, the mean of each sample, is calculated.

  1. What is the mean of $\overline{X}$?

  2. What is the standard deviation of $\overline{X}$? Round your answer to two decimal places.

  3. Using your answers to parts 1 and 2, calculate $P(0<\overline{X}<0.2)$. Write your answer to two decimal places.

  4. Using your answers to parts 1 and 2, calculate $P(\overline{X}<0.3\mid\overline{X}>0.2)$.

    Write your answer to two decimal places.

Question 3

The weight of small tins of tuna, represented by the random variable $X$, is normally distributed with a mean of $90.4$ g and a standard deviation of $6.5$ g.

  1. If the cans are advertised as weighing $92$ g, what is the probability that a randomly chosen can is underweight? Round your answer to two decimal places.

  2. What is the expected value of $\overline{X}$, the sample mean of a randomly chosen sample of size $50$?

  3. Calculate the standard deviation of $\overline{X}$. Round your answer to three decimal places.

  4. Calculate the probability, $p$, that a randomly chosen sample of size $50$ has a mean weight less than the advertised weight, using the central limit theorem.

    Give your answer correct to two decimal places.

  5. $45$ samples, each of size $50$, are taken.

    Calculate the probability, $q$, that more than $41$ samples each have a mean weight less than the advertised weight, using the central limit theorem.

    Give your answer correct to two decimal places.

 

Recall that the Central Limit Theorem states:

If random samples of size $n$ are drawn from any population with mean $\mu$ and (finite) variance $\sigma^2$, then, provided $n$ is sufficiently large, the sampling distribution of $\overline{x}$ will be approximately normally distributed with mean $\mu_{\overline{x}}=\mu$ and variance $\sigma_{\overline{x}}^2=\frac{\sigma^2}{n}$.

 

The theorem allows us to make probability statements about where the means of future samples are likely to lie. Here is a worked example to follow.

The weights of adult female cows from a very large Australian cattle station are determined to be normally distributed with a mean of $720$ kg and a standard deviation of $150$ kg.

In a major farm study, $36$ adult female cows are randomly chosen and weighed using a large weighing machine. The average weight is recorded. This sampling procedure is repeated daily for $3$ months using different cows on the property, and each time the mean weight is recorded.

Consider the following four questions:

  1. What is the approximate probability that a sample mean exceeds $750$ kg?
  2. What is the probability that a sample mean lies between $700$ kg and $740$ kg?
  3. Given that the sample mean does not exceed $740$ kg, what is the approximate probability that the sample mean is at least $700$ kg?
  4. What is the probability that a sample mean does not lie between $670$ kg and $770$ kg?

 

Q1

Mathematically, we are determining the probability $P(\overline{x}\ge750)$ on the assumption that the sampling distribution is approximately normal with mean $\mu_{\overline{x}}=720$ and standard deviation determined as:

$\sigma_{\overline{x}}=\frac{\sigma}{\sqrt{n}}=\frac{150}{\sqrt{36}}=25$

To do this, it is best to transform the problem into a standard normal problem and use tables or computer software to evaluate the relevant area.

So we have:

$P(\overline{x}\ge750)\approx P\left(z\ge\frac{750-\mu_{\overline{x}}}{\sigma_{\overline{x}}}\right)=P\left(z\ge\frac{750-720}{25}\right)=P(z\ge1.2)=0.1151$

Hence there is a little more than an $11\%$ chance that $\overline{x}$ will exceed $750$ kg.
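As a cross-check, the same probability can be computed directly with the standard library's `statistics.NormalDist` (Python 3.8 or later), which avoids the table lookup. This is a minimal sketch; the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 720, 150, 36              # population mean (kg), population sd (kg), sample size
sigma_xbar = sigma / sqrt(n)             # sd of the sample mean: 150 / 6 = 25 kg

xbar_dist = NormalDist(mu, sigma_xbar)   # approximate sampling distribution of the mean
z = (750 - mu) / sigma_xbar              # standardised value, 1.2

p_exceed = 1 - xbar_dist.cdf(750)        # P(xbar >= 750)
print(sigma_xbar, z, round(p_exceed, 4)) # 25.0 1.2 0.1151
```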

Q2

This interval can be transformed to an interval on the standard normal distribution so that:

$P(700\le\overline{x}\le740)\approx P\left(\frac{700-720}{25}\le z\le\frac{740-720}{25}\right)=P(-0.8\le z\le0.8)=2\times P(0\le z\le0.8)=2\times0.2881=0.5762$

This means that there is about a $58\%$ chance that a sample mean lies between $700$ kg and $740$ kg.
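In software, an interval probability is the difference of two cumulative probabilities. Here is a minimal sketch, again assuming `statistics.NormalDist` with the sampling mean $720$ kg and standard deviation $25$ kg found above.

```python
from statistics import NormalDist

xbar_dist = NormalDist(720, 25)   # sampling distribution: mean 720 kg, sd 25 kg

# P(700 <= xbar <= 740) as a difference of two cumulative probabilities
p_between = xbar_dist.cdf(740) - xbar_dist.cdf(700)
print(round(p_between, 4))  # 0.5763
```

Software gives $0.5763$. The tiny difference from $0.5762$ comes from rounding the table value $0.2881$ before doubling it.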

Q3

This is a conditional probability statement.

It looks almost the same as question 2, but there is an important difference.

What question 3 is asking can be written mathematically as $P(\overline{x}\ge700\mid\overline{x}\le740)$.

By the definition of conditional probability, $P(A\mid B)=\frac{P(A\cap B)}{P(B)}$. The event "$\overline{x}\ge700$ and $\overline{x}\le740$" is just $700\le\overline{x}\le740$, so this becomes $\frac{P(700\le\overline{x}\le740)}{P(\overline{x}\le740)}$.

In other words, the numerator is exactly the probability found in question 2. Dividing by the denominator, which is less than $1$, makes the result larger. This is what the conditional part of the question does: it restricts the sample space to the outcomes where $\overline{x}\le740$, reducing the total amount of probability available.

Thus, using the result of question 2, we have:

$\frac{P(700\le\overline{x}\le740)}{P(\overline{x}\le740)}\approx\frac{0.5762}{P\left(z\le\frac{740-720}{25}\right)}=\frac{0.5762}{P(z\le0.8)}=\frac{0.5762}{0.7881}\approx0.7311$

This represents about a $73\%$ chance that a sample mean is at least $700$ kg, given that the sample mean doesn't exceed $740$ kg.
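The conditional probability can be checked directly as a ratio of two cumulative-probability expressions. This is a minimal sketch assuming `statistics.NormalDist` with mean $720$ kg and standard deviation $25$ kg. Any difference in the fourth decimal place from a hand calculation comes from rounded table values.

```python
from statistics import NormalDist

xbar_dist = NormalDist(720, 25)

p_between = xbar_dist.cdf(740) - xbar_dist.cdf(700)   # P(700 <= xbar <= 740), the numerator
p_at_most_740 = xbar_dist.cdf(740)                     # P(xbar <= 740), the conditioning event

p_conditional = p_between / p_at_most_740              # P(xbar >= 700 | xbar <= 740)
print(round(p_at_most_740, 4), round(p_conditional, 4))  # 0.7881 0.7312
```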

Q4

This last question can be done in a number of ways. Perhaps the simplest is to split it into the sum of two probabilities, which are equal by symmetry because $670$ and $770$ are both $50$ kg from the mean of $720$ kg:

$P(\overline{x}\le670\cup\overline{x}\ge770)=P(\overline{x}\le670)+P(\overline{x}\ge770)\approx2\times P(\overline{x}\le670)=2\times P\left(z\le\frac{670-720}{25}\right)=2\times P(z\le-2)=2\times0.0228=0.0456$

Thus there is about a $4\frac{1}{2}\%$ chance of this event happening.
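The two tails can also be evaluated separately in software, which confirms the symmetry argument. This is a minimal sketch assuming `statistics.NormalDist` with mean $720$ kg and standard deviation $25$ kg.

```python
from statistics import NormalDist

xbar_dist = NormalDist(720, 25)

p_low = xbar_dist.cdf(670)          # P(xbar <= 670)
p_high = 1 - xbar_dist.cdf(770)     # P(xbar >= 770)
p_outside = p_low + p_high          # the two tails are disjoint, so their probabilities add
print(round(p_low, 4), round(p_outside, 4))  # 0.0228 0.0455
```

Software gives $0.0455$ rather than $0.0456$ because the table value $0.0228$ is itself rounded.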

Outcomes

S8-2

Make inferences from surveys and experiments: A determining estimates and confidence intervals for means, proportions, and differences, recognising the relevance of the central limit theorem B using methods such as resampling or randomisation to assess

91582

Use statistical methods to make a formal inference
