Without conducting a census of a population, we can never know any population parameter, such as the population mean $\mu$ or the population standard deviation $\sigma$, with $100\%$ certainty.

The effect of the sample size

If a random sample of size $n$ is taken from a population with mean $\mu$ (whether known or unknown), and a sample mean $\overline{x}$ is determined, then that sample mean becomes an estimate of $\mu$. The quality of the estimate clearly depends on the size of the random sample taken.

Suppose, for example, that we take a random sample of size $2$, without replacement, from the population of $100$ numbers given by $1,2,3,...,99,100$. As each number is equally likely to be selected, the two numbers chosen could have a sample mean anywhere between $1.5$ (from a sample consisting of $1$ and $2$) and $99.5$ (from $99$ and $100$). In fact we know that the population mean is $\mu=\frac{\Sigma x}{100}=\frac{5050}{100}=50.5$, and neither $1.5$ nor $99.5$ is a useful estimate.

However, if our sample size were $90$ numbers, then the sample mean could be no lower than $\frac{1+2+3+...+90}{90}=45.5$ and no higher than $\frac{11+12+13+...+100}{90}=55.5$. That is to say, with larger samples the variation in the sample mean reduces and, as a consequence, the sample mean becomes a more reliable estimate.
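The narrowing of the possible range can be checked directly. The following is a minimal sketch, assuming the population is the integers $1$ to $100$ as above; the function name `mean_bounds` is introduced here only for illustration.

```python
# Population assumed to be the integers 1..100, as in the example above.
population = list(range(1, 101))

def mean_bounds(n):
    """Smallest and largest possible mean of a size-n sample drawn without replacement."""
    ordered = sorted(population)
    lowest = sum(ordered[:n]) / n    # the n smallest values
    highest = sum(ordered[-n:]) / n  # the n largest values
    return lowest, highest

for n in (2, 10, 50, 90):
    low, high = mean_bounds(n)
    print(f"n = {n:2d}: sample mean lies between {low} and {high}")
```

For $n=2$ this gives the bounds $1.5$ and $99.5$, and for $n=90$ it gives $45.5$ and $55.5$.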

This is also true even if the sampling is done with replacement. In our example, sampling with replacement means that there are $100^2=10000$ possible samples of size $2$. The average of two whole numbers is either a whole number itself or else a number ending in $.5$. This means that there are only $199$ possible averages available for the $10000$ possible size $2$ samples. Imagine these averages as containers with identifying labels on them: $4.5$, $86$, $50.5$, $99$, etc. If we had a mind to determine all of these sample averages, they would all be forced into one of these $199$ containers.
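The counts in this paragraph can be confirmed by brute force. This is a small sketch, assuming that samples are treated as ordered pairs (so that $(3,7)$ and $(7,3)$ count separately), which is what the figure of $100^2$ implies.

```python
from itertools import product

population = range(1, 101)

# Every ordered sample of size 2 drawn with replacement.
samples = list(product(population, repeat=2))

# The set keeps one copy of each distinct average (the "containers").
distinct_means = sorted({(a + b) / 2 for a, b in samples})

print(len(samples))         # number of possible samples
print(len(distinct_means))  # number of containers
print(distinct_means[:3], distinct_means[-3:])
```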

Now they don't fall into the containers in equal numbers. From the central limit theorem, we know that most of the averages fall centrally, congregated on and around $50.5$. This is of course the population mean $\mu$. This means that there is a higher chance of picking two numbers whose average is close to $50.5$ than of picking two whose average is near either end of the spectrum.

We also know from the central limit theorem that the larger the sample size, the more congregated the sample means are around $\mu$ and the more chance there is that the sample mean is a reliable estimate. For example, there are a million ($100^3$) samples of size $3$ with means that need distributing into just $298$ containers, and the distribution is even more gathered around $\mu$ than that of the size $2$ samples.
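One way to see this gathering is to count how many samples land in each container. The sketch below is illustrative only: it assumes ordered samples drawn with replacement, and the helper names `sum_counts` and `share_within`, as well as the choice of "within $10$ of $50.5$" as a measure of closeness, are introduced here and are not part of the original discussion.

```python
from collections import Counter

population = range(1, 101)

def sum_counts(n):
    """Count how many ordered size-n samples (with replacement) give each total."""
    counts = Counter({0: 1})
    for _ in range(n):
        extended = Counter()
        for total, ways in counts.items():
            for x in population:
                extended[total + x] += ways
        counts = extended
    return counts

def share_within(n, half_width):
    """Fraction of samples whose mean is within half_width of 50.5."""
    counts = sum_counts(n)
    near = sum(ways for total, ways in counts.items()
               if abs(total / n - 50.5) <= half_width)
    return near / sum(counts.values())

for n in (2, 3):
    counts = sum_counts(n)
    print(f"n = {n}: {sum(counts.values())} samples in {len(counts)} containers, "
          f"share within 10 of 50.5 = {share_within(n, 10):.3f}")
```

Each total corresponds to exactly one average, so counting totals is the same as counting containers. The printed share is larger for $n=3$ than for $n=2$.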

Unbiased estimators

The sample mean $\overline{x}$ is also said to be an unbiased estimator of $\mu$ because the overall average of all possible sample means is the population mean itself. So, for example, in the case of the population given as $1,2,3,...,99,100$, we can show that the arithmetic average of all $10000$ size-two sample means equals the population mean $\mu$. We can also show that the million size-three sample averages have an overall average which is equal to $\mu$. In fact this applies generally to any population, no matter how large or how it is distributed.
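This claim can be verified exhaustively for the population $1,2,3,...,99,100$. The following is a sketch under the assumption that samples are ordered and drawn with replacement; `average_of_sample_means` is a name chosen here for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

population = range(1, 101)
mu = Fraction(sum(population), len(population))  # 101/2, i.e. 50.5

def average_of_sample_means(n):
    """Average of the sample means over every ordered size-n sample (with replacement)."""
    grand_total = sum(sum(sample) for sample in product(population, repeat=n))
    number_of_samples = len(population) ** n
    # Each sample mean is (sample total) / n, so the overall average is
    # grand_total / (n * number_of_samples), kept exact with Fraction.
    return Fraction(grand_total, n * number_of_samples)

for n in (2, 3):
    print(n, average_of_sample_means(n) == mu)
```

Both comparisons print `True`: the $10000$ size-two means and the million size-three means each average to exactly $50.5$.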

The population standard deviation $\sigma$ is estimated by the sample standard deviation $s$, where $s=\sqrt{\frac{\Sigma(x-\overline{x})^2}{n-1}}$, with the denominator of the fraction inside the square root sign given as $n-1$ rather than $n$. Strictly speaking, it is the sample variance $s^2$ that is an unbiased estimator of the population variance $\sigma^2$.

That is, if we were to take all possible random samples of size $n$ (drawn with replacement) from a population and compute the statistic $s^2$ for each sample, the average of all the $s^2$ values would equal the population variance $\sigma^2$. For this to happen, the denominator $n$ must be replaced by $n-1$ in the formula.
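The effect of the $n-1$ denominator can be checked on the population $1,2,3,...,99,100$ with samples of size $2$. This is a minimal sketch, assuming ordered samples drawn with replacement; the function `average_sample_variance` and its `divisor` parameter are introduced here purely for illustration.

```python
from fractions import Fraction
from itertools import product

population = range(1, 101)
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

def average_sample_variance(n, divisor):
    """Average of sum((x - xbar)^2) / divisor over all ordered size-n samples."""
    total = Fraction(0)
    count = 0
    for sample in product(population, repeat=n):
        xbar = Fraction(sum(sample), n)
        total += sum((x - xbar) ** 2 for x in sample) / divisor
        count += 1
    return total / count

n = 2
print(sigma2)                             # population variance
print(average_sample_variance(n, n - 1))  # dividing by n - 1
print(average_sample_variance(n, n))      # dividing by n
```

Dividing by $n-1$ reproduces $\sigma^2$ exactly, while dividing by $n$ gives only half of it when $n=2$.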

Examples

Example 1

A random number generator generates random integers between $11$ and $20$ inclusive.

What type of probability distribution does this scenario represent?
What is the mean or expected value of this distribution of random numbers generated?
A sample of $10$ numbers is generated as $12,14,15,17,19,15,11,20,16,11$. Calculate the sample mean $\overline{x}$ and standard deviation $s$ of the sample.

Each number is equally likely to be selected, so this is a discrete uniform distribution where each number has a probability of $0.1$ of being selected.

The mean is given by $\mu=\frac{11+12+13+...+20}{10}=\frac{155}{10}=15.5$.

The sample mean is $\overline{x}=\frac{12+14+15+...+11}{10}=\frac{150}{10}=15$.

The sample variance is given by:

$s^2=\frac{\Sigma(x-\overline{x})^2}{n-1}=\frac{(12-15)^2+(14-15)^2+...+(11-15)^2}{10-1}=\frac{88}{9}\approx9.7778$

Therefore the sample standard deviation is $s=\sqrt{9.7778}\approx3.1269$.
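The arithmetic for this sample can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the $n-1$ denominator. This is only a verification sketch; the variable names are chosen here for illustration.

```python
import statistics

sample = [12, 14, 15, 17, 19, 15, 11, 20, 16, 11]

x_bar = statistics.mean(sample)          # sample mean
s_squared = statistics.variance(sample)  # sample variance, divides by n - 1
s = statistics.stdev(sample)             # sample standard deviation

print(sum(sample), x_bar)
print(round(s_squared, 4), round(s, 4))
```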

Example 2

This example shows how to evaluate the population mean $\mu$ and standard deviation $\sigma$ for a continuous probability density function.

We take as our example the probability density function given by $y=\frac{x^2}{21}$ over the domain $1\le x\le4$ and $0$ everywhere else.

The particular function is shown here:

Checking first that the shaded area is unity, we have:

Area	$=$	$\int_1^4\frac{x^2}{21}\,dx$
	$=$	$\left[\frac{x^3}{63}\right]_1^4$
	$=$	$\frac{64}{63}-\frac{1}{63}$
	$=$	$1$

The mean or expected value is given by:

$\mu$	$=$	$\int_1^4xf\left(x\right)\,dx$
	$=$	$\int_1^4\frac{x^3}{21}\,dx$
	$=$	$\left[\frac{x^4}{84}\right]_1^4$
	$=$	$\frac{256-1}{84}$
	$=$	$3\frac{1}{28}$
	$\approx$	$3.0357$

The population variance, given by $\sigma^2=E[X^2]-\mu^2$, is evaluated as follows:

$\sigma^2$	$=$	$\int_1^4x^2f\left(x\right)\,dx-\mu^2$
	$=$	$\int_1^4\frac{x^4}{21}\,dx-\mu^2$
	$=$	$\left[\frac{x^5}{105}\right]_1^4-\mu^2$
	$=$	$\frac{1024-1}{105}-\left(\frac{255}{84}\right)^2$
	$\approx$	$0.527296$

Hence $\sigma=\sqrt{0.527296}\approx0.7262$.
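The three integrals in this example all have the form $\int_1^4x^k\cdot\frac{x^2}{21}\,dx=\frac{4^{k+3}-1^{k+3}}{21(k+3)}$, so they can be evaluated exactly. The sketch below does this with fractions; the helper name `moment` is introduced here for illustration only.

```python
from fractions import Fraction

def moment(k, a=1, b=4):
    """Exact value of the integral of x**k * x**2 / 21 over [a, b]."""
    power = k + 3
    return Fraction(b ** power - a ** power, 21 * power)

area = moment(0)                # total probability
mu = moment(1)                  # E[X]
variance = moment(2) - mu ** 2  # E[X^2] - mu^2
sigma = float(variance) ** 0.5

print(area, mu, variance)
print(round(float(mu), 4), round(float(variance), 6), round(sigma, 4))
```

Working with exact fractions avoids the small error that creeps in when the rounded value $3.0357$ is squared.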

QUESTION 3

Consider a fair six-sided die, with faces labeled from $1$ to $6$. Let $X$ be the outcome when the die is rolled.

What type of distribution does $X$ represent?
A. Continuous Uniform Distribution
B. Discrete Uniform Distribution
C. Normal Distribution
D. Exponential Distribution
Many samples of size $75$ are taken from the distribution, and the mean $\overline{X}$ of each sample is calculated.

What type of distribution does $\overline{X}$ approximately represent?
A. Discrete Uniform Distribution
B. Exponential Distribution
C. Continuous Uniform Distribution
D. Normal Distribution
Calculate the mean of $\overline{X}$.
Calculate the standard deviation of $\overline{X}$ corresponding to a sample size of $75$. Round your answer to $2$ decimal places.

QUESTION 4

A discrete random variable $X$ has a mean of $0.1$ and a variance of $1.3$. Samples of $60$ observations of $X$ are taken and $\overline{X}$, the mean of each sample, is calculated.

What is the mean of $\overline{X}$?
What is the standard deviation of $\overline{X}$? Round your answer to two decimal places.
Using your answers to part (a) and part (b), calculate $P(0<\overline{X}<0.2)$. Write your answer to two decimal places.
Using your answers to part (a) and part (b), calculate $P(\overline{X}<0.3\mid\overline{X}>0.2)$. Write your answer to two decimal places.

QUESTION 5

The weight of small tins of tuna, represented by the random variable $X$, is normally distributed with a mean of $90.4$ g and a standard deviation of $6.5$ g.

If the cans are advertised as weighing $92$ g, what is the probability that a randomly chosen can is underweight? Round your answer to two decimal places.
What is the expected value of $\overline{X}$, the sample mean of a randomly chosen sample of size $50$?
Calculate the standard deviation of $\overline{X}$. Round your answer to three decimal places.
Calculate the probability, $p$, that a randomly chosen sample of size $50$ has a mean weight less than the advertised weight, using the central limit theorem. Give your answer to two decimal places.
$45$ samples, each of size $50$, are taken.

Calculate the probability, $q$, that more than $41$ of these samples each have a mean weight less than the advertised weight, using the central limit theorem. Give your answer to two decimal places.

8.15 Sample means and population means