The standard version of the Central Limit Theorem, first proved by the French mathematician Pierre-Simon Laplace in 1810, states that if random samples of size $n$ are drawn from any population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of $\overline{x}$ will be approximately normally distributed with a mean $\mu_{\overline{x}}=\mu$ and a variance $\sigma_{\overline{x}}^2=\frac{\sigma^2}{n}$.
That statement is certainly a mouthful, so to unpick it, we will develop the idea a little more carefully. The Central Limit Theorem is one of the most important theorems in all of statistics, so it is worth taking the time to understand exactly what it says.
To do this we will build the concept using a much simplified scenario: we will use a population of only five numbers. In reality many populations are infinite in size, but using just five numbers makes it easy to build the idea. Our samples taken from that population will be smaller subsets of the five numbers, starting from single-number sampling and progressing through to samples of size $4$ and $5$.
Suppose we begin sampling, with replacement, in single units from the set of numbers $1,2,3,4,5$.
Because our sample size is just one, the sample possibilities are $1,2,3,4$ or $5$, and the probability of each of these being selected is obviously $\frac{1}{5}$.
We could imagine sampling single numbers continuously, so that a list of results might look something like this:
$1,5,4,5,2,3,1,2,3,5,4,1,2,2,3,4,5,2,4,3,\dots$
We expect the average of these draws to converge on the number $3$ simply because the mean value of the equally likely outcomes is given by $\frac{\Sigma x}{n}=\frac{1+2+3+4+5}{5}=3$. This makes perfect sense because there is no reason for the random process to favor one number over any other number, so in the long term the average should tend toward $3$.
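If you have access to a computer, you can see this convergence for yourself. The short Python sketch below (not part of the original exercise) simulates a large number of single-number draws with replacement and prints the overall average, which settles close to $3$:

```python
import random

# Simulate drawing one number at a time, with replacement, from {1, ..., 5}.
random.seed(1)  # fixed seed so the run is repeatable
draws = [random.randint(1, 5) for _ in range(100_000)]

# The long-run average of the draws should be close to the population mean, 3.
print(round(sum(draws) / len(draws), 2))
```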
In the draw shown above, as each number is drawn, we see the successive averages forming as:

$1,\ \frac{1+5}{2}=3,\ \frac{1+5+4}{3}\approx3.33,\ \frac{1+5+4+5}{4}=3.75,\ \frac{1+5+4+5+2}{5}=3.4,\ \dots$

and so on.
The variance could also be progressively determined, but we know that this would tend to the variance of the five numbers, easily found as the average of the squared deviations of each possible outcome from $3$: $\sigma^2=\frac{(1-3)^2+(2-3)^2+(3-3)^2+(4-3)^2+(5-3)^2}{5}=\frac{10}{5}=2$.
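We can confirm these two population values, a mean of $3$ and a variance of $2$, with a few lines of Python (a quick check, not a required part of the development):

```python
# Population mean and variance of the five equally likely outcomes 1..5.
population = [1, 2, 3, 4, 5]
n = len(population)

mean = sum(population) / n                               # (1+2+3+4+5)/5 = 3.0
variance = sum((x - mean) ** 2 for x in population) / n  # average squared deviation = 2.0

print(mean, variance)  # 3.0 2.0
```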
The probability distribution is clearly discrete and uniform as shown here:
Suppose now, instead of sampling one number at a time, we sample two numbers at a time from the set, like this:
$2-4,\ 1-3,\ 5-2,\dots$
and begin writing down the average of each pair like this:
$(2,4)\rightarrow3,\ (1,3)\rightarrow2,\ (5,2)\rightarrow3.5,\dots$
We find that the long-term average of these averages (in other words, the average of $3,2,3.5,\dots$) tends toward the average of $3$ as well. We can convince ourselves of this with the following argument.
Suppose we take the overall average of the first three $2$-number averages above. Then the overall average becomes:

$\text{Average}=\frac{\frac{2+4}{2}+\frac{1+3}{2}+\frac{5+2}{2}}{3}=\frac{(2+4)+(1+3)+(5+2)}{2\times3}=\frac{2+4+1+3+5+2}{6}$
But this is just the average formed by six $1$-number averages, and we have already agreed that the sequence of these tends to the theoretical average of $3$ because of the fact that the single numbers are all equally likely to be selected.
From the counting principle, there are $5^2=25$ possible ordered selections when choosing $2$ numbers from $5$ with replacement. These are listed in the table, with each pair's average shown in parentheses.
$1,1\ (1)$ | $2,1\ (1.5)$ | $3,1\ (2)$ | $4,1\ (2.5)$ | $5,1\ (3)$ |
---|---|---|---|---|
$1,2\ (1.5)$ | $2,2\ (2)$ | $3,2\ (2.5)$ | $4,2\ (3)$ | $5,2\ (3.5)$ |
$1,3\ (2)$ | $2,3\ (2.5)$ | $3,3\ (3)$ | $4,3\ (3.5)$ | $5,3\ (4)$ |
$1,4\ (2.5)$ | $2,4\ (3)$ | $3,4\ (3.5)$ | $4,4\ (4)$ | $5,4\ (4.5)$ |
$1,5\ (3)$ | $2,5\ (3.5)$ | $3,5\ (4)$ | $4,5\ (4.5)$ | $5,5\ (5)$ |
Note that not all of these averages are different: the most common average is $3$, which occurs $5$ times.
Using computer software we can verify that the average of these averages is still $3$, but the variance has changed: it is now exactly $1$.
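If you would rather not take the software's word for it, a short Python enumeration of all $25$ ordered pairs reproduces both values:

```python
from itertools import product

# Enumerate all 5^2 = 25 ordered pairs drawn with replacement from {1, ..., 5}
# and compute the mean of each pair.
pairs = list(product([1, 2, 3, 4, 5], repeat=2))
means = [sum(p) / 2 for p in pairs]

grand_mean = sum(means) / len(means)                               # still 3.0
variance = sum((m - grand_mean) ** 2 for m in means) / len(means)  # 2/2 = 1.0

print(len(pairs), grand_mean, variance)  # 25 3.0 1.0
```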
The distribution of these averages changes from a discrete uniform one to a discrete symmetric triangular one. Most of the pairs have an average of $3$, and fewer and fewer averages of a given value appear as we move further and further away from an average of $3$.
The plot thickens. Suppose we now sample three numbers at a time.
The $5^3=125$ possible averages (too many to write them all down) range from $1$ to $5$ again, with most of them (in fact $19$ of them) having the value $3$.
Here is a frequency and probability distribution table of these $125$ averages.
$\overline{x}$ | $f$ | $P(\overline{x})$ |
---|---|---|
$1$ | $1$ | $0.008$ |
$1\frac{1}{3}$ | $3$ | $0.024$ |
$1\frac{2}{3}$ | $6$ | $0.048$ |
$2$ | $10$ | $0.08$ |
$2\frac{1}{3}$ | $15$ | $0.12$ |
$2\frac{2}{3}$ | $18$ | $0.144$ |
$3$ | $19$ | $0.152$ |
$3\frac{1}{3}$ | $18$ | $0.144$ |
$3\frac{2}{3}$ | $15$ | $0.12$ |
$4$ | $10$ | $0.08$ |
$4\frac{1}{3}$ | $6$ | $0.048$ |
$4\frac{2}{3}$ | $3$ | $0.024$ |
$5$ | $1$ | $0.008$ |
Once again we find the average of all possible averages stubbornly remains at $3$, but the variance continues to reduce. The variance of these averages turns out to be exactly $\frac{2}{3}$.
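These exact values can be checked by brute force. The sketch below enumerates all $125$ ordered triples, using exact fractions so there is no rounding in the variance:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 5^3 = 125 ordered triples and tally their means exactly.
triples = list(product(range(1, 6), repeat=3))
means = [Fraction(sum(t), 3) for t in triples]

freq = Counter(means)
print(freq[Fraction(3)])  # 19 triples average exactly 3

grand_mean = sum(means) / len(means)
variance = sum((m - grand_mean) ** 2 for m in means) / len(means)
print(grand_mean, variance)  # mean 3, variance 2/3, exactly
```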
Something really interesting is happening to the sampling distribution. The distribution of averages looks more and more like a normal distribution, with most of the averages being $3$ and the frequency of other averages reducing as the value of the average moves away from $3$ above and below.
Sampling four numbers at a time produces $5^4=625$ averages, and the average of these averages again remains at $3$. The variance continues to decrease: it now becomes $\frac{2}{4}=\frac{1}{2}$. The probability of selecting four numbers with an average of $3$ becomes, from a simple count, $\frac{85}{625}=0.136$.
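The count of $85$ is easy to verify by enumeration: a sample of four numbers averages exactly $3$ when its sum is $12$.

```python
from itertools import product

# Count the 4-number samples (with replacement from 1..5) whose mean is
# exactly 3, i.e. whose sum is 12.
count = sum(1 for q in product(range(1, 6), repeat=4) if sum(q) == 12)
print(count, count / 5**4)  # 85 0.136
```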
Sampling $5$ numbers produces $3125$ possible averages, and the average of these averages is still $3$, but the variance reduces further to $\frac{2}{5}=0.4$. The distribution of these averages looks approximately normal (our particular population, being a discrete finite distribution with least number $1$ and greatest number $5$, can never be truly normal).
Here is what the distribution of $\overline{x}$ might look like for a sample of size $5$:
If we look back at the samples of size $1$ to $5$ and beyond, we notice two important things.
If we call the average of the averages $\mu_{\overline{x}}$, and the variance of the averages $\sigma_{\overline{x}}^2$, then it is true that, for any sample of size $n$ drawn from a population consisting of the five numbers $1,2,3,4,5$, we have $\mu_{\overline{x}}=3$ and $\sigma_{\overline{x}}^2=\frac{2}{n}$.
The remarkable result that this investigation is leading to is known as the Central Limit Theorem.
The Central Limit Theorem states that:
If random samples of size $n$ are drawn from any population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of $\overline{x}$ will be approximately normally distributed with a mean $\mu_{\overline{x}}=\mu$ and a variance $\sigma_{\overline{x}}^2=\frac{\sigma^2}{n}$.
What a profound statement this is!
Look back across our discussion and see that as the sample size increased, the average of the averages, $\mu_{\overline{x}}$, remained constant at $3$ and the variance $\sigma_{\overline{x}}^2$ reduced steadily as $\frac{2}{1}$, $\frac{2}{2}$, $\frac{2}{3}$, $\frac{2}{4}$ and $\frac{2}{5}$. A sample of size $6$ would show a variance of $\frac{2}{6}=\frac{1}{3}$, and so on.
The Central Limit Theorem (CLT for short) will be put to use in later chapters. Sample means can be determined and certain probabilistic inferences can be made about the population mean $\mu$ itself, even though it may not be known.
A regular tetrahedral dice has four sides labeled $1,2,3$ and $4$. Let $X$ be the outcome when the dice is rolled. The population mean is $\mu=\frac{1+2+3+4}{4}=2.5$, and the population variance is:

$\sigma^2=\frac{\Sigma(X-\mu)^2}{n}=\frac{(1-2.5)^2+(2-2.5)^2+(3-2.5)^2+(4-2.5)^2}{4}=\frac{5}{4}=1.25$
Therefore the population standard deviation is $\sigma=\sqrt{1.25}=\frac{\sqrt{5}}{2}\approx1.118$.
Hence, for samples of size $64$, the sampling distribution has a mean of $\mu_{\overline{x}}=2.5$ and a standard deviation of $\sigma_{\overline{x}}=\frac{1.118}{\sqrt{64}}\approx0.14$. This information can be used to predict the results of future samples of size $64$ taken from the same population. For example, using the empirical rule, we can state that there is about a $68\%$ chance that a future sample of size $64$ will have a mean somewhere between $2.5\pm0.14$.
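As a check on this example, the following Python sketch recomputes the population parameters and then the CLT sampling-distribution parameters for samples of size $64$:

```python
from math import sqrt

# Tetrahedral die: population mean and variance, then the sampling
# distribution parameters for samples of size 64 via the CLT.
outcomes = [1, 2, 3, 4]
mu = sum(outcomes) / 4                             # 2.5
sigma2 = sum((x - mu) ** 2 for x in outcomes) / 4  # 1.25

n = 64
mean_xbar = mu              # CLT: mean of X-bar equals the population mean
sd_xbar = sqrt(sigma2 / n)  # sqrt(1.25)/8, roughly 0.14

print(mean_xbar, round(sd_xbar, 4))  # 2.5 0.1398
```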
Consider a fair $6$-sided dice, with faces labeled from $1$ to $6$. Let $X$ be the outcome when the dice is rolled.
What type of distribution does $X$X represent?
Continuous Uniform Distribution
Discrete Uniform Distribution
Normal Distribution
Exponential Distribution
Many samples of size $75$ are taken from the distribution, and the mean $\overline{X}$ of each sample is calculated.
What type of distribution does $\overline{X}$X approximately represent?
Discrete Uniform Distribution
Exponential Distribution
Continuous Uniform Distribution
Normal Distribution
Calculate the mean of $\overline{X}$.
Calculate the standard deviation of $\overline{X}$ corresponding to a sample size of $75$. Round your answer to $2$ decimal places.
A discrete random variable $X$ has a mean of $0.1$ and a variance of $1.3$. Samples of $60$ observations of $X$ are taken and $\overline{X}$, the mean of each sample, is calculated.
What is the mean of $\overline{X}$?
What is the standard deviation of $\overline{X}$? Round your answer to two decimal places.
Using your answers to part (a) and part (b), calculate $P(0<\overline{X}<0.2)$. Write your answer to two decimal places.
Using your answers to part (a) and part (b), calculate $P(\overline{X}<0.3\mid\overline{X}>0.2)$. Write your answer to two decimal places.
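A possible worked check for this exercise uses Python's `statistics.NormalDist`, treating $\overline{X}$ as normal by the CLT. Note one assumption: the unrounded standard deviation is carried through both probability calculations rather than the value rounded in part (b).

```python
from math import sqrt
from statistics import NormalDist

# X has mean 0.1 and variance 1.3; samples of size 60.
mu, var, n = 0.1, 1.3, 60
sd_xbar = sqrt(var / n)  # roughly 0.15

xbar = NormalDist(mu=mu, sigma=sd_xbar)

# P(0 < X-bar < 0.2)
p_between = xbar.cdf(0.2) - xbar.cdf(0)

# P(X-bar < 0.3 | X-bar > 0.2) = P(0.2 < X-bar < 0.3) / P(X-bar > 0.2)
p_cond = (xbar.cdf(0.3) - xbar.cdf(0.2)) / (1 - xbar.cdf(0.2))

print(round(sd_xbar, 2), round(p_between, 2), round(p_cond, 2))  # 0.15 0.5 0.65
```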
The weight of small tins of tuna represented by the random variable $X$ is normally distributed with a mean of $90.4$ g and a standard deviation of $6.5$ g.
If the cans are advertised as weighing $92$ g, what is the probability a randomly chosen can is underweight? Round your answer to two decimal places.
What is the expected value of $\overline{X}$, the sample mean of a randomly chosen sample of size $50$?
Calculate the standard deviation for $\overline{X}$. Round your answer to three decimal places.
Calculate the probability, $p$, that a randomly chosen sample of size $50$ has a mean weight less than the advertised weight, using the central limit theorem. Give your answer to the nearest two decimal places.
$45$ samples, each of size $50$, are taken.
Calculate the probability, $q$, that more than $41$ samples each have a mean weight less than the advertised weight, using the central limit theorem. Give your answer to the nearest two decimal places.
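One way to check the whole chain of answers is sketched below. Two assumptions to note: the unrounded value of $p$ is carried into the last step, and the final binomial count is approximated by a normal distribution with a continuity correction, which is our reading of the intended method.

```python
from math import sqrt
from statistics import NormalDist

# Can weights: X ~ N(90.4, 6.5)
mu, sigma = 90.4, 6.5
X = NormalDist(mu, sigma)

p_under = X.cdf(92)  # P(X < 92), roughly 0.60

# Sample mean of size 50: mean 90.4, sd 6.5/sqrt(50), roughly 0.919
n = 50
Xbar = NormalDist(mu, sigma / sqrt(n))
p = Xbar.cdf(92)     # P(X-bar < 92), roughly 0.96

# 45 samples; number with mean below 92 is Binomial(45, p).
# Approximate by a normal with continuity correction at 41.5.
m, s = 45 * p, sqrt(45 * p * (1 - p))
q = 1 - NormalDist(m, s).cdf(41.5)  # P(more than 41 samples), roughly 0.89

print(round(p_under, 2), round(p, 2), round(q, 2))
```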
When graphing binomial distributions, we saw that the shape of a binomial distribution depends on the number of Bernoulli trials in the experiment ($n$) and the probability of obtaining a success on any particular trial ($p$).
In studying binomial distributions, we saw that if $n$ was too small or $p$ was too close to $0$ or $1$, the distributions were skewed. However, binomial distributions are approximately normal when $n$ is large enough and $p$ is not too close to $0$ or $1$.
Use this applet to investigate which values of $n$ and $p$ work best with a normal approximation.
As a general rule, both the mean number of successes $\mu=np$ and the mean number of failures $n-\mu=n(1-p)$ must be greater than $5$ for a good approximation by the normal curve.
The normal approximation to a binomial distribution is used because calculation with the help of a standard normal distribution table is often easier than calculation with the binomial formula, particularly when $n$ is large.
Suppose a balanced coin is tossed $21$ times. We wish to determine the probability of getting exactly $7$ heads. In this case $p=\frac{1}{2}$ and $\sigma=\sqrt{21\times\frac{1}{2}\times\left(1-\frac{1}{2}\right)}\approx2.29$.
According to the binomial formula, we need to calculate
$P(k=7)=\binom{21}{7}\left(\frac{1}{2}\right)^7\left(1-\frac{1}{2}\right)^{21-7}$
This is easy enough to do with a good calculator but would be more difficult by hand as the numbers in the binomial coefficient are large. We find $P(k=7)=0.0554$.
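For reference, the exact calculation is a one-liner in Python using `math.comb` for the binomial coefficient:

```python
from math import comb

# Exact binomial probability of exactly 7 heads in 21 tosses of a fair coin.
n, k, p = 21, 7, 0.5
prob = comb(n, k) * p**k * (1 - p)**(n - k)
print(round(prob, 4))  # 0.0554
```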
To work this out using the normal approximation, we have to represent the number of successes $(k=7)$ by the interval $(6.5,7.5)$. (This is called a continuity correction.)
Next, we obtain the $z$-scores corresponding to $6.5$ and $7.5$ when the mean is $np=21\times0.5=10.5$ and the standard deviation is $\sqrt{np(1-p)}=\sqrt{10.5\times0.5}\approx2.29$.
The transformed interval is $\left(\frac{6.5-10.5}{2.29},\frac{7.5-10.5}{2.29}\right)\approx\left(-1.75,-1.31\right)$.
We use the standard normal distribution table to obtain the probability under the curve between $-1.75$ and $-1.31$. By symmetry, this is the same as the probability between $1.31$ and $1.75$.
You should check that the probability obtained this way is $0.4599-0.4049=0.055$, which is close to the result from the binomial formula.
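The same approximation can be computed directly from the normal cumulative distribution function, avoiding table lookups entirely:

```python
from math import sqrt
from statistics import NormalDist

# Normal approximation with continuity correction: P(k = 7) is approximated
# by the area under the normal curve between 6.5 and 7.5.
n, p = 21, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))  # 10.5 and about 2.29

Z = NormalDist(mu, sigma)
approx = Z.cdf(7.5) - Z.cdf(6.5)
print(round(approx, 3))  # 0.055
```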
An aerosol spray has been designed to kill house flies. An individual fly exposed to the aerosol under test conditions has a probability of dying within one minute of $0.9$. If $500$ flies are exposed to the aerosol, what is the probability that at least $450$ of them will be dead within one minute?
We need to add the probabilities of $450,451,452,\dots,500$ flies dying. This would be a very cumbersome calculation if done with the binomial formula. We use, instead, the normal approximation and look for the area beneath the normal density curve above $449.5$ (using the continuity correction as before).
The mean of the binomial distribution is $np=500\times0.9=450$ and the standard deviation is $\sqrt{500\times0.9\times0.1}=\sqrt{45}\approx6.71$. Thus, we need the area above $\frac{449.5-450}{6.71}\approx-0.07$ in the standard normal distribution.
This area is $0.5$ plus the area between $0$ and $0.07$. So, according to the table, the probability must be $0.5+0.0279\approx0.53$.
If needed, we can work out from this that the probability that fewer than $450$ flies will be dead after $1$ minute is $0.47$.
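This calculation can be checked directly, again using `statistics.NormalDist` in place of a table:

```python
from math import sqrt
from statistics import NormalDist

# Fly-spray example: P(at least 450 of 500 flies die) with p = 0.9,
# using the normal approximation with a continuity correction at 449.5.
n, p = 500, 0.9
mu, sigma = n * p, sqrt(n * p * (1 - p))  # 450 and sqrt(45), about 6.71

q = 1 - NormalDist(mu, sigma).cdf(449.5)
print(round(q, 2))  # 0.53
```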
If $n$ is sufficiently large and $p$ is not too close to $0$ or $1$ (both $np>5$ and $n(1-p)>5$), then the binomial distribution will be approximately normal, with a mean of $\mu=np$ and a standard deviation of $\sigma=\sqrt{np\left(1-p\right)}$.