NZ Level 8 (NZC) Level 3 (NCEA) [In development]
Sample means and population means

Statisticians often sample from all sorts of populations in order to estimate some characteristic of them.

Without conducting a census of a population, we can never know any population parameter with $100\%$ certainty, for example the population mean $\mu$ or the population standard deviation $\sigma$.

### The effect of the sample size

If a random sample of size $n$ is taken from a population of mean $\mu$ (whether known or unknown), and a sample mean $\overline{x}$ is determined, then that sample mean becomes an estimate of $\mu$. The quality of the estimate depends on the size of the random sample taken.

Suppose, for example, we take a random sample of size $2$, without replacement, from the population of $100$ numbers given by $1,2,3,...,99,100$. As each number is equally likely to be selected, the two numbers chosen could have a sample mean anywhere between $1.5$ (from a sample consisting of $1$ and $2$) and $99.5$ (from $99$ and $100$). In fact we know that the population mean is $\mu=\frac{1+2+3+...+100}{100}=\frac{5050}{100}=50.5$, and neither $1.5$ nor $99.5$ is a useful estimate.

However, if our sample size was $90$ numbers, then the sample mean could be no lower than $\frac{1+2+3+...+90}{90}=45.5$ and no higher than $\frac{11+12+13+...+100}{90}=55.5$. That is to say, with larger samples the variation in the sample mean naturally reduces and, as a consequence, the sample mean becomes a more reliable estimate.
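The two ranges quoted above can be checked with a short Python sketch. The helper name `sample_mean_range` is hypothetical and introduced only for illustration; it assumes sampling without replacement, so the extreme means come from the $n$ smallest and the $n$ largest values.

```python
def sample_mean_range(population, n):
    """Return the smallest and largest possible mean of a size-n sample
    drawn without replacement from the population."""
    ordered = sorted(population)
    lowest = sum(ordered[:n]) / n    # mean of the n smallest values
    highest = sum(ordered[-n:]) / n  # mean of the n largest values
    return lowest, highest


population = range(1, 101)  # the numbers 1, 2, ..., 100
for n in (2, 90):
    low, high = sample_mean_range(population, n)
    print(f"n = {n}: sample mean lies between {low} and {high}")
```

Trying other values of `n` shows the interval of possible sample means narrowing around $50.5$ as the sample size grows.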

This is also true if the sampling is done with replacement. In our example, sampling with replacement means that there are $100^2=10000$ possible samples of size $2$ (counting the order of selection). The average of two whole numbers is either a whole number or a number ending in $.5$. This means that there are only $199$ possible averages available for the $10000$ possible samples of size $2$. Imagine these averages as containers with identifying labels on them: $4.5$, $86$, $50.5$, $99$, etc. If we were to determine all of these sample averages, each would be forced into one of these $199$ containers.

Now they don't fall into the containers in equal numbers. From the central limit theorem, we know that most of the averages fall centrally, congregated on and around $50.5$. This is, of course, the population mean $\mu$. This means that there is a higher chance of picking two numbers whose average is close to $50.5$ than close to any other average in the spectrum.
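The container picture can be made concrete by brute force. The following is a minimal sketch, assuming ordered samples of size $2$ drawn with replacement; the variable name `mean_counts` is illustrative only.

```python
from collections import Counter
from itertools import product

population = range(1, 101)

# Count how many of the 100^2 ordered samples fall into each "container",
# that is, how many samples share each possible sample mean.
mean_counts = Counter((a + b) / 2 for a, b in product(population, repeat=2))

print(len(mean_counts))                          # number of distinct containers
print(mean_counts.most_common(1))                # the fullest container and its count
print(mean_counts[1.0], mean_counts[100.0])      # the two extreme containers
```

The fullest container is the one labelled $50.5$, while the extreme containers labelled $1$ and $100$ each hold a single sample.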

We also know from the central limit theorem that the larger the sample size, the more congregated the sample means are around $\mu$ and the more chance there is that the sample mean is a reliable estimate. For example, there are a million ($100^3$) samples of size $3$ with means that need distributing into just $298$ containers, and the distribution is even more gathered around $\mu$ than that of the size $2$ samples.
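One way to see this tightening is to compare, for samples of size $2$ and size $3$, the share of all samples whose mean lands within a fixed distance of $\mu$. The sketch below assumes ordered samples drawn with replacement; the function name `share_within` and the distance of $10$ are arbitrary choices made for illustration.

```python
from collections import Counter
from itertools import product

population = range(1, 101)
mu = 50.5


def share_within(n, distance):
    """Return (number of distinct sample means, fraction of all 100^n
    ordered samples whose mean is within `distance` of mu)."""
    # Group samples by their total; each total corresponds to one mean.
    totals = Counter(sum(sample) for sample in product(population, repeat=n))
    close = sum(count for total, count in totals.items()
                if abs(total / n - mu) <= distance)
    return len(totals), close / 100 ** n


results = {n: share_within(n, 10) for n in (2, 3)}
for n, (containers, share) in results.items():
    print(f"n = {n}: {containers} containers, "
          f"{share:.1%} of sample means within 10 of mu")
```

The size $3$ case loops over a million samples, so it takes a moment to run.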

### Unbiased estimators

The sample mean $\overline{x}$ is also said to be an unbiased estimator of $\mu$ because the overall average of all possible sample means is the population mean itself. So, for example, in the case of the population given as $1,2,3,...,99,100$, we can show that the arithmetic average of all $10000$ size-two sample means equals the population mean $\mu$. We can also show that the million size-three sample averages have an overall average equal to $\mu$. In fact, this applies generally to any population, no matter how large it is or how it is distributed.
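Both claims can be verified exactly by enumeration. This is a sketch assuming ordered samples drawn with replacement; `average_of_sample_means` is a hypothetical helper, and `Fraction` is used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

population = range(1, 101)
mu = Fraction(sum(population), len(population))  # exact population mean, 101/2


def average_of_sample_means(n):
    """Average of the sample means over every ordered size-n sample."""
    total_of_sums = 0
    count = 0
    for sample in product(population, repeat=n):
        total_of_sums += sum(sample)
        count += 1
    # Each sample mean is (sum of sample) / n, so the overall average is
    # (total of all sums) / (n * number of samples).
    return Fraction(total_of_sums, n * count)


avg_size_2 = average_of_sample_means(2)
avg_size_3 = average_of_sample_means(3)
print(avg_size_2 == mu, avg_size_3 == mu)
```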

The sample standard deviation $s$ is used to estimate the population standard deviation $\sigma$, where $s=\sqrt{\frac{\Sigma(x-\overline{x})^2}{n-1}}$, with the denominator of the fraction inside the square root sign given as $n-1$ rather than $n$. Strictly speaking, it is the sample variance $s^2$ that is an unbiased estimator of the population variance $\sigma^2$.

That is, if we were to take all possible random samples of size $n$ from a population and compute the statistic $s^2$ for each sample, the average of all the $s^2$ values would equal the population variance $\sigma^2$. To get this to happen, $n$ must be replaced by $n-1$ in the denominator of the formula.
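The effect of the divisor can be checked on the population $1,2,3,...,100$. The sketch below assumes ordered samples of size $2$ drawn with replacement, and takes the population variance to be computed with divisor $100$; the helper `average_sample_variance` is hypothetical.

```python
from fractions import Fraction
from itertools import product

population = range(1, 101)
N = len(population)
mu = Fraction(sum(population), N)
# Population variance, using the population size N as the divisor.
sigma_sq = sum((x - mu) ** 2 for x in population) / N


def average_sample_variance(n, divisor):
    """Average, over every ordered size-n sample, of the sum of squared
    deviations from the sample mean divided by `divisor`."""
    total = Fraction(0)
    count = 0
    for sample in product(population, repeat=n):
        x_bar = Fraction(sum(sample), n)
        total += sum((x - x_bar) ** 2 for x in sample) / divisor
        count += 1
    return total / count


with_n_minus_1 = average_sample_variance(2, 1)  # divisor n - 1 = 1
with_n = average_sample_variance(2, 2)          # divisor n = 2
print(sigma_sq, with_n_minus_1, with_n)
```

With divisor $n-1$ the average matches $\sigma^2$ exactly, whereas with divisor $n$ it comes out too small, here by a factor of $\frac{n-1}{n}=\frac{1}{2}$.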

#### Examples

##### Example 1

A random number generator generates random integers between $11$ and $20$ inclusive.

1. What type of probability distribution does this scenario represent?
2. What is the mean or expected value of this distribution of random numbers generated?
3. A sample of $10$ numbers is generated as $12,14,15,17,19,15,11,20,16,11$. Calculate the sample mean $\overline{x}$ and standard deviation $s$ of the sample.

Q1

Each number is equally likely to be selected, so this is a discrete uniform distribution where each number has a probability of $0.1$ of being selected.

Q2

The mean is given by $\mu=\frac{11+12+13+...+20}{10}=\frac{155}{10}=15.5$.

Q3

The sample mean is $\overline{x}=\frac{12+14+15+...+11}{10}=\frac{150}{10}=15$.

The sample variance is given by:

$s^2=\frac{\Sigma(x-\overline{x})^2}{n-1}=\frac{(12-15)^2+(14-15)^2+...+(11-15)^2}{10-1}=\frac{88}{9}\approx9.7778$

Therefore the sample standard deviation is $s=\sqrt{9.7778}\approx3.1269$.
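The arithmetic in Q3 can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the $n-1$ divisor. This is only a quick verification sketch; the variable names are illustrative.

```python
import statistics

sample = [12, 14, 15, 17, 19, 15, 11, 20, 16, 11]

x_bar = statistics.mean(sample)     # arithmetic mean of the sample
s_sq = statistics.variance(sample)  # sample variance, divisor n - 1
s = statistics.stdev(sample)        # sample standard deviation, divisor n - 1

print(x_bar, round(s_sq, 4), round(s, 4))
```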

##### Example 2

This example shows how to evaluate the population mean $\mu$ and standard deviation $\sigma$ for a continuous probability density function.

We take as our example the probability density function given by $y=\frac{x^2}{21}$ over the domain $1\le x\le4$ and $0$ everywhere else.

The particular function is shown here:

Checking first that the shaded area is unity, we have:

$\text{Area}=\int_1^4\frac{x^2}{21}dx=\left[\frac{x^3}{63}\right]_1^4=\frac{64}{63}-\frac{1}{63}=1$

The mean or expected value is given by:

$\mu=\int_1^4xf\left(x\right)dx=\int_1^4\frac{x^3}{21}dx=\left[\frac{x^4}{84}\right]_1^4=\frac{256-1}{84}=3\frac{1}{28}\approx3.0357$

The population variance given by $\sigma^2=E[X^2]-\mu^2$σ2=E[X2]μ2 is evaluated as follows:

$\sigma^2=\int_1^4x^2f\left(x\right)dx-\mu^2=\int_1^4\frac{x^4}{21}dx-\mu^2=\left[\frac{x^5}{105}\right]_1^4-\mu^2=\frac{1024-1}{105}-\left(\frac{85}{28}\right)^2\approx0.527296$

Hence $\sigma=\sqrt{0.527296}\approx0.7262$.
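The three integrals in this example can be evaluated exactly with a small sketch. It relies on the antiderivative of $x^{k+2}$ being $\frac{x^{k+3}}{k+3}$; the helper name `moment` is hypothetical and applies only to this particular density on $1\le x\le4$.

```python
from fractions import Fraction
from math import sqrt


def moment(k):
    """Exact value of the integral of x^k * (x^2 / 21) from x = 1 to x = 4."""
    power = k + 3  # x^(k+2) integrates to x^(k+3) / (k+3)
    return Fraction(4 ** power - 1, 21 * power)


area = moment(0)                # total probability, should be 1
mu = moment(1)                  # E[X]
variance = moment(2) - mu ** 2  # E[X^2] - mu^2
sigma = sqrt(float(variance))

print(area, mu, variance, round(sigma, 4))
```

Working with exact fractions avoids the small rounding error that creeps in when $\mu$ is squared after being rounded to four decimal places.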

##### QUESTION 3

Consider a fair six-sided die, with faces labelled from $1$ to $6$. Let $X$ be the outcome when the die is rolled.

1. What type of distribution does $X$ represent?

A. Continuous Uniform Distribution

B. Discrete Uniform Distribution

C. Normal Distribution

D. Exponential Distribution

2. Many samples of size $75$ are taken from the distribution, and the mean $\overline{X}$ of each sample is calculated.

What type of distribution does $\overline{X}$ approximately represent?

A. Discrete Uniform Distribution

B. Exponential Distribution

C. Continuous Uniform Distribution

D. Normal Distribution

3. Calculate the mean of $\overline{X}$.

4. Calculate the standard deviation of $\overline{X}$ corresponding to a sample size of $75$. Round your answer to $2$ decimal places.

##### QUESTION 4

A discrete random variable $X$ has a mean of $0.1$ and a variance of $1.3$. Samples of $60$ observations of $X$ are taken and $\overline{X}$, the mean of each sample, is calculated.

1. What is the mean of $\overline{X}$?

2. What is the standard deviation of $\overline{X}$? Round your answer to two decimal places.

3. Using your answers to parts 1 and 2, calculate $P(0<\overline{X}<0.2)$. Write your answer to two decimal places.

4. Using your answers to parts 1 and 2, calculate $P(\overline{X}<0.3\mid\overline{X}>0.2)$.

##### QUESTION 5

The weight of small tins of tuna, represented by the random variable $X$, is normally distributed with a mean of $90.4$ g and a standard deviation of $6.5$ g.

1. If the cans are advertised as weighing $92$ g, what is the probability a randomly chosen can is underweight? Round your answer to two decimal places.

2. What is the expected value of $\overline{X}$, the sample mean of a randomly chosen sample of size $50$?

3. Calculate the standard deviation of $\overline{X}$. Round your answer to three decimal places.

4. Calculate the probability, $p$, that a randomly chosen sample of size $50$ has a mean weight less than the advertised weight, using the central limit theorem.

5. $45$ samples, each of size $50$, are taken.

Calculate the probability, $q$, that more than $41$ of the samples each have a mean weight less than the advertised weight, using the central limit theorem.

### Outcomes

#### S8-2

Make inferences from surveys and experiments: A determining estimates and confidence intervals for means, proportions, and differences, recognising the relevance of the central limit theorem B using methods such as resampling or randomisation to assess

#### 91582

Use statistical methods to make a formal inference