
8.15 Sample means and population means

Lesson

Statisticians often find themselves sampling from all sorts of populations in order to estimate some characteristic of it.

Without conducting a census of a population, we can never know any population parameter with $100\%$ certainty, for example the population mean $\mu$ or the population standard deviation $\sigma$.

 

The effect of the sample size

If a random sample of size $n$ is taken from a population with mean $\mu$ (whether known or unknown), and the sample mean $\overline{x}$ is calculated, then that sample mean is an estimate of $\mu$. The quality of the estimate depends on the size of the random sample taken.

Suppose, for example, we take a random sample of size $2$, without replacement, from the population of $100$ numbers $1,2,3,\dots,99,100$. As each number is equally likely to be selected, the two numbers chosen could have a sample mean anywhere between $1.5$ (from a sample consisting of $1$ and $2$) and $99.5$ (from $99$ and $100$). In fact, we know that the population mean is $\mu=\frac{\Sigma x}{N}=\frac{5050}{100}=50.5$, and neither $1.5$ nor $99.5$ is a useful estimate of it.

However, if our sample size were $90$, then the sample mean could be no lower than $\frac{1+2+3+\dots+90}{90}=45.5$ and no higher than $\frac{11+12+13+\dots+100}{90}=55.5$. That is to say, with larger samples the variation in the sample mean naturally reduces and, as a consequence, the sample mean becomes a more reliable estimate.
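To check these extremes, here is a minimal Python sketch. The helper name `mean_range` is our own. It assumes the population is exactly the integers $1$ to $100$, sampled without replacement, so the smallest and largest possible sample means come from the $n$ smallest and the $n$ largest values.

```python
from fractions import Fraction

POPULATION = range(1, 101)  # the numbers 1, 2, ..., 100

def mean_range(n, population=POPULATION):
    """Smallest and largest possible mean of a size-n sample drawn without
    replacement: average the n smallest values and the n largest values."""
    values = sorted(population)
    lowest = Fraction(sum(values[:n]), n)
    highest = Fraction(sum(values[-n:]), n)
    return lowest, highest

mu = Fraction(sum(POPULATION), len(POPULATION))  # population mean, 101/2
print(float(mu))  # 50.5
for n in (2, 90):
    lo, hi = mean_range(n)
    print(n, float(lo), float(hi))  # 2 1.5 99.5, then 90 45.5 55.5
```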

This is also true when the sampling is done with replacement. In our example, sampling with replacement means that there are $100^2=10000$ possible (ordered) samples of size $2$. The average of two whole numbers is either a whole number or a number ending in $.5$, and here the averages run from $1$ to $100$ in steps of $0.5$. This means that there are only $199$ possible averages for the $10000$ possible samples of size $2$. Imagine these averages as containers with identifying labels on them, such as $4.5$, $86$, $50.5$ and $99$. If we were to calculate every one of these sample averages, each would be placed in one of these $199$ containers.
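The container count can be checked by brute force. This sketch lists all $100^2$ ordered samples drawn with replacement and tallies their means using exact fractions, so that labels such as $50.5$ are not affected by floating-point rounding. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

POPULATION = range(1, 101)  # the numbers 1, 2, ..., 100

# Every ordered sample of size 2 drawn with replacement.
samples = list(product(POPULATION, repeat=2))

# Put each sample mean into the "container" carrying that label.
containers = Counter(Fraction(sum(s), 2) for s in samples)

print(len(samples))                                    # 10000
print(len(containers))                                 # 199
print(float(min(containers)), float(max(containers)))  # 1.0 100.0
```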

Now they don't fall into the containers in equal numbers. Most of the averages fall centrally, congregated on and around $50.5$, which is, of course, the population mean $\mu$. In fact, $50.5$ is the single most likely average: $100$ of the $10000$ samples have an average of exactly $50.5$, whereas only one sample, $(1,1)$, has an average of $1$. This means that there is a higher chance of picking two numbers whose average is close to $50.5$ than of picking two numbers whose average lies near either end of the range.
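A tally of the containers shows how unevenly they fill. This minimal Python sketch assumes ordered samples of size $2$, drawn with replacement from the numbers $1$ to $100$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

POPULATION = range(1, 101)  # the numbers 1, 2, ..., 100

# Tally the mean of every ordered size-2 sample drawn with replacement.
containers = Counter(Fraction(a + b, 2) for a, b in product(POPULATION, repeat=2))

# The fullest container and how many samples it holds.
label, count = containers.most_common(1)[0]
print(float(label), count)             # 50.5 100

# The two extreme containers hold one sample each: (1, 1) and (100, 100).
print(containers[1], containers[100])  # 1 1
```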

We also know from the Central Limit Theorem that the larger the sample size, the more closely the sample means congregate around $\mu$, and so the more likely it is that a single sample mean is a reliable estimate. For example, there are a million ($100^3$) samples of size $3$, whose means are distributed among just $298$ containers. This distribution is even more concentrated around $\mu$ than that of the size-$2$ samples.
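Enumerating a million samples one by one is unnecessary, since only the totals matter. The sketch below builds the number of ways to reach each total by adding one draw at a time. It then compares what share of the sample means lands within $10$ of $\mu$ for sample sizes $2$ and $3$. The helper names `sum_counts` and `share_near_mu` are hypothetical, and the window of $10$ is an arbitrary choice for illustration.

```python
from collections import Counter

POPULATION = range(1, 101)  # the numbers 1, 2, ..., 100
MU = 50.5                   # population mean

def sum_counts(n, population=POPULATION):
    """Map each possible sample total to the number of ordered size-n samples
    (drawn with replacement) that produce it, adding one draw at a time."""
    counts = Counter({0: 1})
    for _ in range(n):
        extended = Counter()
        for total, ways in counts.items():
            for x in population:
                extended[total + x] += ways
        counts = extended
    return counts

def share_near_mu(n, width=10):
    """Proportion of all size-n sample means lying within `width` of MU."""
    counts = sum_counts(n)
    near = sum(ways for total, ways in counts.items() if abs(total / n - MU) <= width)
    return near / sum(counts.values())

for n in (2, 3):
    counts = sum_counts(n)
    # sample size, number of samples, number of containers, share within 10 of MU
    print(n, sum(counts.values()), len(counts), round(share_near_mu(n), 3))
```

For size $2$ the share is $0.368$. For size $3$ it is noticeably larger, which is consistent with the means gathering more tightly around $\mu$.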

Unbiased estimators

The sample mean $\overline{x}$ is also said to be an unbiased estimator of $\mu$ because the overall average of all possible sample means is the population mean itself. For example, for the population $1,2,3,\dots,99,100$, we can show that the arithmetic average of all $10000$ size-two sample means equals the population mean $\mu$. We can also show that the million size-three sample means have an overall average equal to $\mu$. In fact, this applies to any population, no matter how large it is or how it is distributed.
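The claim about the $10000$ and the million sample means can be verified exactly. This is a minimal sketch that assumes ordered samples drawn with replacement. The name `average_of_sample_means` is illustrative, and exact fractions keep the comparison with $\mu=50.5$ free of rounding.

```python
from fractions import Fraction
from itertools import product

POPULATION = range(1, 101)                       # the numbers 1, 2, ..., 100
MU = Fraction(sum(POPULATION), len(POPULATION))  # 101/2, i.e. 50.5

def average_of_sample_means(n, population=POPULATION):
    """Exact average of the means of all ordered size-n samples drawn with replacement."""
    total_of_sums = 0
    count = 0
    for sample in product(population, repeat=n):
        total_of_sums += sum(sample)
        count += 1
    # Each sample mean is sum(sample)/n, so their average is total_of_sums/(n*count).
    return Fraction(total_of_sums, n * count)

results = {n: average_of_sample_means(n) for n in (2, 3)}
for n, avg in results.items():
    print(n, avg, avg == MU)  # prints 101/2 and True for both sizes
```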

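The usual estimate of the population standard deviation $\sigma$ is the sample standard deviation $s$, where $s=\sqrt{\frac{\Sigma(x-\overline{x})^2}{n-1}}$. The denominator of the fraction inside the square root is $n-1$ rather than $n$. Strictly speaking, it is the sample variance $s^2$ that is an unbiased estimator of the population variance $\sigma^2$; $s$ itself tends to slightly underestimate $\sigma$.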

That is, suppose we take all possible random samples of size $n$ from a population (drawing with replacement) and compute the statistic $s^2$ for each sample. The average of all the $s^2$ values would then equal the population variance $\sigma^2$. For this to happen, $n$ must be replaced by $n-1$ in the denominator of the formula.
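The role of $n-1$ can be seen directly on the population $1,2,\dots,100$. The sketch below assumes samples of size $2$ drawn with replacement, which is the setting in which this averaging argument holds exactly. It averages the squared-deviation statistic over all $10000$ samples, once with denominator $n-1$ and once with $n$. Both results are compared with $\sigma^2$, computed with denominator $N=100$. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

POPULATION = list(range(1, 101))  # the numbers 1, 2, ..., 100
N = len(POPULATION)
MU = Fraction(sum(POPULATION), N)
# Population variance: the whole population is known, so divide by N.
SIGMA_SQ = sum((x - MU) ** 2 for x in POPULATION) / N

def sample_variance(sample, ddof=1):
    """Sum of squared deviations from the sample mean divided by n - ddof.
    ddof=1 gives s^2 (denominator n - 1); ddof=0 divides by n instead."""
    n = len(sample)
    xbar = Fraction(sum(sample), n)
    return sum((x - xbar) ** 2 for x in sample) / (n - ddof)

def average_over_all_samples(n, ddof):
    """Average of sample_variance over every ordered size-n sample (with replacement)."""
    samples = list(product(POPULATION, repeat=n))
    return sum(sample_variance(s, ddof) for s in samples) / len(samples)

print(SIGMA_SQ)                        # 3333/4, i.e. 833.25
print(average_over_all_samples(2, 1))  # 3333/4: matches sigma^2
print(average_over_all_samples(2, 0))  # 3333/8: too small by the factor (n-1)/n
```

Dividing by $n$ gives exactly half of $\sigma^2$ here, because for $n=2$ the factor $\frac{n-1}{n}$ equals $\frac{1}{2}$.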

 

Examples

Example 1

A random number generator generates random integers between $11$ and $20$ inclusive.

  1. What type of probability distribution does this scenario represent?
  2. What is the mean or expected value of this distribution of random numbers generated?
  3. A sample of $10$ numbers is generated: $12,14,15,17,19,15,11,20,16,11$. Calculate the sample mean $\overline{x}$ and the sample standard deviation $s$.

Q1

Each number is equally likely to be selected, so this is a discrete uniform distribution in which each number has probability $0.1$ of being selected.

Q2

The mean is given by $\mu=\frac{11+12+13+\dots+20}{10}=\frac{155}{10}=15.5$.

Q3

The sample mean is $\overline{x}=\frac{12+14+15+\dots+11}{10}=\frac{150}{10}=15$.

The sample variance is given by:

$s^2=\frac{\Sigma(x-\overline{x})^2}{n-1}=\frac{(12-15)^2+(14-15)^2+\dots+(11-15)^2}{10-1}=\frac{88}{9}\approx9.7778$

Therefore the sample standard deviation is $s=\sqrt{9.7778}\approx3.1269$.
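The arithmetic in this example can be double-checked with Python's standard `statistics` module. In that module, `statistics.variance` and `statistics.stdev` use the $n-1$ denominator, while `pvariance` and `pstdev` divide by $n$.

```python
import statistics

# Example 1 data: the sample of 10 generated integers.
sample = [12, 14, 15, 17, 19, 15, 11, 20, 16, 11]

mu = statistics.mean(range(11, 21))  # population mean of the integers 11..20
xbar = statistics.mean(sample)       # sample mean
s2 = statistics.variance(sample)     # sample variance, denominator n - 1
s = statistics.stdev(sample)         # sample standard deviation

print(mu, xbar, round(s2, 4), round(s, 4))
```

Running it gives $\mu=15.5$, $\overline{x}=15$, $s^2=\frac{88}{9}\approx9.7778$ and $s\approx3.1269$.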

Example 2

This example shows how to evaluate the population mean $\mu$μ and standard deviation $\sigma$σ for a continuous probability density function. 

We take as our example the probability density function $f(x)=\frac{x^2}{21}$ over the domain $1\le x\le4$, with $f(x)=0$ everywhere else.

[Figure: graph of $f(x)=\frac{x^2}{21}$ for $1\le x\le4$, with the area under the curve shaded]

Checking first that the shaded area is unity, we have:

$$
\begin{aligned}
\text{Area}&=\int_1^4\frac{x^2}{21}\,dx\\
&=\left[\frac{x^3}{63}\right]_1^4\\
&=\frac{64}{63}-\frac{1}{63}\\
&=1
\end{aligned}
$$

The mean or expected value is given by:

$$
\begin{aligned}
\mu&=\int_1^4 x\,f(x)\,dx\\
&=\int_1^4\frac{x^3}{21}\,dx\\
&=\left[\frac{x^4}{84}\right]_1^4\\
&=\frac{256-1}{84}\\
&=3\tfrac{1}{28}\\
&\approx3.0357
\end{aligned}
$$
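Because $f(x)=\frac{x^2}{21}$ is a simple power of $x$, each of these integrals has a closed form: $\int_1^4 x^k f(x)\,dx=\frac{4^{k+3}-1^{k+3}}{21(k+3)}$. A minimal sketch using that formula with exact fractions confirms the area and the mean. The helper `moment` is our own name.

```python
from fractions import Fraction

def moment(k, a=1, b=4):
    """Exact E[X^k] for the density f(x) = x^2/21 on [a, b]:
    the integral of x^k * x^2/21 is x^(k+3) / (21*(k+3))."""
    return Fraction(b ** (k + 3) - a ** (k + 3), 21 * (k + 3))

area = moment(0)  # total probability, should be 1
mu = moment(1)    # mean E[X]

print(area, mu, round(float(mu), 4))  # 1 85/28 3.0357
```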

The population variance, given by $\sigma^2=E[X^2]-\mu^2$, is evaluated as follows:

$$
\begin{aligned}
\sigma^2&=\int_1^4x^2f(x)\,dx-\mu^2\\
&=\int_1^4\frac{x^4}{21}\,dx-\mu^2\\
&=\left[\frac{x^5}{105}\right]_1^4-\mu^2\\
&=\frac{1024-1}{105}-\left(\frac{85}{28}\right)^2\\
&=\frac{2067}{3920}\\
&\approx0.527296
\end{aligned}
$$

Hence $\sigma=\sqrt{\frac{2067}{3920}}\approx0.7261514$.
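Since $\int_1^4 x^k\cdot\frac{x^2}{21}\,dx=\frac{4^{k+3}-1}{21(k+3)}$, the variance can be computed exactly before any rounding. In this sketch `moment` is an illustrative helper of our own, and only the final square root is taken in floating point.

```python
import math
from fractions import Fraction

def moment(k, a=1, b=4):
    """Exact E[X^k] for the density f(x) = x^2/21 on [a, b]."""
    return Fraction(b ** (k + 3) - a ** (k + 3), 21 * (k + 3))

mu = moment(1)                      # 85/28
variance = moment(2) - mu ** 2      # E[X^2] - mu^2, still exact
sigma = math.sqrt(float(variance))  # square root in floating point

print(variance, round(float(variance), 6), round(sigma, 7))  # 2067/3920 0.527296 0.7261514
```

Keeping $\mu$ exact matters here. Substituting the rounded value $3.0357$ for $\mu$ gives $\sigma^2\approx0.52738$ instead.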

 

 

QUESTION 3

Consider a fair six-sided die, with faces labeled from $1$ to $6$. Let $X$ be the outcome when the die is rolled.

  1. What type of distribution does $X$ represent?

    A. Continuous Uniform Distribution

    B. Discrete Uniform Distribution

    C. Normal Distribution

    D. Exponential Distribution
  2. Many samples of size $75$ are taken from the distribution, and the mean $\overline{X}$ of each sample is calculated.

    What type of distribution does $\overline{X}$ approximately represent?

    A. Discrete Uniform Distribution

    B. Exponential Distribution

    C. Continuous Uniform Distribution

    D. Normal Distribution
  3. Calculate the mean of $\overline{X}$.

  4. Calculate the standard deviation of $\overline{X}$ corresponding to a sample size of $75$. Round your answer to $2$ decimal places.

QUESTION 4

A discrete random variable $X$ has a mean of $0.1$ and a variance of $1.3$. Samples of $60$ observations of $X$ are taken, and the mean of each sample, $\overline{X}$, is calculated.

  1. What is the mean of $\overline{X}$?

  2. What is the standard deviation of $\overline{X}$? Round your answer to two decimal places.

  3. Using your answers to parts 1 and 2, calculate $P(0<\overline{X}<0.2)$. Write your answer to two decimal places.

  4. Using your answers to parts 1 and 2, calculate $P(\overline{X}<0.3\mid\overline{X}>0.2)$.

    Write your answer to two decimal places.

QUESTION 5

The weight of small tins of tuna, represented by the random variable $X$, is normally distributed with a mean of $90.4$ g and a standard deviation of $6.5$ g.

  1. If the tins are advertised as weighing $92$ g, what is the probability that a randomly chosen tin is underweight? Round your answer to two decimal places.

  2. What is the expected value of $\overline{X}$, the mean of a randomly chosen sample of size $50$?

  3. Calculate the standard deviation of $\overline{X}$. Round your answer to three decimal places.

  4. Using the central limit theorem, calculate the probability, $p$, that a randomly chosen sample of size $50$ has a mean weight less than the advertised weight.

    Round your answer to two decimal places.

  5. $45$ samples, each of size $50$, are taken.

    Using the central limit theorem, calculate the probability, $q$, that more than $41$ of the samples have a mean weight less than the advertised weight.

    Round your answer to two decimal places.
