Statisticians often sample from all sorts of populations in order to estimate some characteristic of them.
Without conducting a census of a population, we can never know any population parameter with $100\%$ certainty, such as the population mean $\mu$ or the population standard deviation $\sigma$.
If a random sample of size $n$ is taken from a population with mean $\mu$ (whether known or unknown), and a sample mean $\overline{x}$ is determined, then that sample mean becomes an estimate of $\mu$. The quality of the estimate clearly depends on the size of the random sample taken.
Suppose, for example, we take a random sample of size $2$, without replacement, from the population of $100$ numbers given by $1,2,3,\dots,99,100$. As each number is equally likely to be selected, the two numbers chosen could have a sample mean anywhere between $1.5$ (from a sample consisting of $1$ and $2$) and $99.5$ (from $99$ and $100$). In fact we know that the population mean is $\mu=\frac{\Sigma x}{100}=\frac{5050}{100}=50.5$, and neither $1.5$ nor $99.5$ is a useful estimate.
However, if our sample size were $90$ numbers, then the sample mean could be no lower than $\frac{1+2+3+\dots+90}{90}=45.5$ and no higher than $\frac{11+12+13+\dots+100}{90}=55.5$. That is to say, with larger samples the variation in the sample mean naturally reduces and, as a consequence, the sample mean becomes a more reliable estimate.
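The two ranges quoted above can be checked with a short sketch. The helper name `mean_range` is an illustrative assumption, not part of the lesson; it averages the $n$ smallest and the $n$ largest members of the population.

```python
from fractions import Fraction

def mean_range(population, n):
    """Smallest and largest possible sample mean for a size-n sample
    drawn without replacement (hypothetical helper for illustration)."""
    ordered = sorted(population)
    lowest = Fraction(sum(ordered[:n]), n)    # mean of the n smallest values
    highest = Fraction(sum(ordered[-n:]), n)  # mean of the n largest values
    return lowest, highest

population = range(1, 101)
for n in (2, 90):
    low, high = mean_range(population, n)
    print(n, float(low), float(high))
# prints:
# 2 1.5 99.5
# 90 45.5 55.5
```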
This remains true even if the sampling is done with replacement. In our example, sampling with replacement means that there are $100^2=10000$ possible ordered samples of size $2$. The average of two whole numbers is either a whole number itself or a number ending in $.5$. This means that there are only $199$ possible averages available for the $10000$ possible size $2$ samples. Imagine these averages as containers with identifying labels on them: $4.5$, $86$, $50.5$, $99$, etc. If we had a mind to determine all of these sample averages, each one would fall into one of these $199$ containers.
Now they don't fall into the containers in equal numbers. From the Central Limit Theorem, we know that most of the averages fall centrally, congregated on and around $50.5$. This is, of course, the population mean $\mu$. This means that there is a higher chance of picking two numbers whose average is close to $50.5$ than any other average in the range.
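The container picture can be made concrete by brute force. The sketch below enumerates every ordered pair drawn with replacement from $1$ to $100$ and counts how many pairs land in each container; the variable names are illustrative assumptions.

```python
from collections import Counter
from itertools import product

population = range(1, 101)

# Count how often each average occurs among the 100**2 ordered samples.
counts = Counter((a + b) / 2 for a, b in product(population, repeat=2))

print(len(counts))             # 199 distinct averages (containers)
print(counts.most_common(1))   # [(50.5, 100)]: the fullest container
print(counts[1.0], counts[25.5], counts[50.5])  # 1 50 100
```

The counts rise steadily from $1$ at either end to $100$ at $50.5$, which is the uneven filling described above.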
We also know from the Central Limit Theorem that the larger the sample size, the more congregated the sample means are around $\mu$ and the more chance there is that the sample mean becomes a reliable estimate. For example, there are a million ($100^3$) samples of size $3$ with means that need distributing into just $298$ containers, and the distribution is even more gathered around $\mu$ than that of the size $2$ samples.
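A rough way to see the tightening is to compare, for sizes $2$ and $3$, what share of all ordered samples has a mean within some fixed distance of $\mu$. The function name `share_near_mu` and the width of $10$ are assumptions chosen purely for illustration.

```python
from collections import Counter
from itertools import product

def share_near_mu(n, width=10.0, mu=50.5):
    """Return (number of distinct means, share of ordered size-n samples,
    drawn with replacement from 1..100, whose mean is within `width` of mu)."""
    sums = Counter(sum(s) for s in product(range(1, 101), repeat=n))
    total = sum(sums.values())
    near = sum(c for s, c in sums.items() if abs(s / n - mu) <= width)
    return len(sums), near / total

containers2, share2 = share_near_mu(2)
containers3, share3 = share_near_mu(3)   # enumerates a million samples
print(containers2, share2)
print(containers3, share3)
```

The size $3$ share comes out larger than the size $2$ share, in line with the claim that the means gather more closely around $\mu$ as the sample size grows.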
The sample mean $\overline{x}$ is also said to be an unbiased estimator of $\mu$ because the overall average of all possible sample means is the population mean itself. So, for example, in the case of the population given as $1,2,3,\dots,99,100$, we can show that the arithmetic average of all $10000$ size-two sample means equals the population mean $\mu$. We can also show that the million size-three sample averages have an overall average which is equal to $\mu$. In fact this generally applies to any population, no matter how large or how it is distributed.
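The unbiasedness claim can be verified exactly for this population. The sketch below, with the assumed helper name `average_of_sample_means`, averages the means of all ordered samples drawn with replacement and compares the result with $\mu$ using exact fractions.

```python
from fractions import Fraction
from itertools import product

population = range(1, 101)
N = len(population)
mu = Fraction(sum(population), N)   # population mean, 101/2

def average_of_sample_means(n):
    """Exact average of the sample mean over all N**n ordered samples
    of size n drawn with replacement (illustrative helper)."""
    total_of_sums = sum(sum(s) for s in product(population, repeat=n))
    return Fraction(total_of_sums, n * N ** n)

print(average_of_sample_means(2) == mu)   # True
print(average_of_sample_means(3) == mu)   # True
```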
The usual estimate of the population standard deviation $\sigma$ is the sample standard deviation $s$, where $s=\sqrt{\frac{\Sigma(x-\overline{x})^2}{n-1}}$, with the denominator of the fraction inside the square root given as $n-1$ rather than $n$. Strictly, it is $s^2$ that is an unbiased estimator of the population variance $\sigma^2$.
That is, if we were to take all possible random samples of size $n$ from a population and compute the statistic $s^2$ for each sample, the average of all the $s^2$ values should be equal to the population variance $\sigma^2$. To get this to happen, $n$ must be replaced by $n-1$ in the formula.
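The effect of the $n-1$ divisor can be checked exactly on the population $1,2,\dots,100$. The sketch below assumes sampling with replacement and size $2$ samples; `average_variance` is a hypothetical helper that averages $\frac{\Sigma(x-\overline{x})^2}{\text{divisor}}$ over all ordered samples.

```python
from fractions import Fraction
from itertools import product

population = list(range(1, 101))
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

def average_variance(n, divisor):
    """Average of sum((x - xbar)**2) / divisor over all ordered size-n
    samples drawn with replacement (illustrative helper)."""
    total = Fraction(0)
    for sample in product(population, repeat=n):
        xbar = Fraction(sum(sample), n)
        total += sum((x - xbar) ** 2 for x in sample) / divisor
    return total / N ** n

print(sigma2)                          # 3333/4
print(average_variance(2, divisor=1))  # divisor n - 1: 3333/4
print(average_variance(2, divisor=2))  # divisor n:     3333/8
```

Dividing by $n-1$ reproduces $\sigma^2$ exactly, while dividing by $n$ gives only half of it for samples of size $2$.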
A random number generator generates random integers between $11$ and $20$ inclusive.
Q1
Each number is equally likely to be selected, so this is a discrete uniform distribution where each number has a probability of $0.1$ of being selected.
Q2
The mean is given by $\mu=\frac{11+12+13+\dots+20}{10}=15.5$.
Q3
The sample mean is $\overline{x}=\frac{12+14+15+\dots+11}{10}=\frac{140}{10}=14$.
The sample variance is given by:
$s^2=\frac{\Sigma(x-\overline{x})^2}{n-1}=\frac{(12-14)^2+(14-14)^2+\dots+(11-14)^2}{10-1}=\frac{88}{9}\approx9.7778$
Therefore the sample standard deviation is $s=\sqrt{9.7778}\approx3.1269$.
This example shows how to evaluate the population mean $\mu$ and standard deviation $\sigma$ for a continuous probability density function.
We take as our example the probability density function given by $y=f(x)=\frac{x^2}{21}$ over the domain $1\le x\le4$, and $0$ everywhere else.
The particular function is shown here:
Checking first that the shaded area is unity, we have:
| Area | $=$ | $\int_1^4\frac{x^2}{21}\,dx$ |
| | $=$ | $\left[\frac{x^3}{63}\right]_1^4$ |
| | $=$ | $\frac{64}{63}-\frac{1}{63}$ |
| | $=$ | $1$ |
The mean or expected value is given by:
| $\mu$ | $=$ | $\int_1^4xf\left(x\right)\,dx$ |
| | $=$ | $\int_1^4\frac{x^3}{21}\,dx$ |
| | $=$ | $\left[\frac{x^4}{84}\right]_1^4$ |
| | $=$ | $\frac{256-1}{84}$ |
| | $=$ | $3\frac{1}{28}$ |
| | $\approx$ | $3.0357$ |
The population variance, given by $\sigma^2=E[X^2]-\mu^2$, is evaluated as follows:
| $\sigma^2$ | $=$ | $\int_1^4x^2f\left(x\right)\,dx-\mu^2$ |
| | $=$ | $\int_1^4\frac{x^4}{21}\,dx-\mu^2$ |
| | $=$ | $\left[\frac{x^5}{105}\right]_1^4-\mu^2$ |
| | $=$ | $\frac{1024-1}{105}-\left(\frac{255}{84}\right)^2$ |
| | $\approx$ | $0.527296$ |
Hence $\sigma=\sqrt{0.527296}\approx0.72615$.
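The three integrals can be checked with exact fractions, since each integrand is a power of $x$ divided by $21$. In the sketch below, `moment` is a hypothetical helper that evaluates $\int_1^4 x^k\cdot\frac{x^2}{21}\,dx$ from the antiderivative.

```python
from fractions import Fraction
from math import sqrt

def moment(k, a=1, b=4, c=21):
    """Exact integral of x**k * (x**2 / c) over [a, b] (illustrative helper)."""
    p = k + 3                       # power of x after integrating x**(k + 2)
    return Fraction(b ** p - a ** p, c * p)

area = moment(0)                    # total probability
mu = moment(1)                      # E[X]
variance = moment(2) - mu ** 2      # E[X^2] - mu^2
sigma = sqrt(float(variance))

print(area)       # 1
print(mu)         # 85/28
print(variance)   # 2067/3920
print(round(float(variance), 6), round(sigma, 5))
```

Working with the exact value $\mu=\frac{85}{28}$ rather than the rounded $3.0357$ avoids a small rounding error in the variance.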
Consider a fair $6$-sided die, with faces labeled from $1$ to $6$. Let $X$ be the outcome when the die is rolled.
What type of distribution does $X$ represent?
Continuous Uniform Distribution
Discrete Uniform Distribution
Normal Distribution
Exponential Distribution
Many samples of size $75$ are taken from the distribution, and the mean $\overline{X}$ of each sample is calculated.
What type of distribution does $\overline{X}$ approximately represent?
Discrete Uniform Distribution
Exponential Distribution
Continuous Uniform Distribution
Normal Distribution
Calculate the mean of $\overline{X}$.
Calculate the standard deviation of $\overline{X}$ corresponding to a sample size of $75$. Round your answer to $2$ decimal places.
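One way to check answers to the last two parts is to compute the mean and variance of a single roll and then apply the standard result that $\overline{X}$ has mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$. The variable names below are illustrative assumptions.

```python
from fractions import Fraction
from math import sqrt

faces = range(1, 7)                                      # fair six-sided die
mu = Fraction(sum(faces), len(faces))                    # mean of X and of the sample mean
sigma2 = sum((x - mu) ** 2 for x in faces) / len(faces)  # variance of X
n = 75
standard_error = sqrt(float(sigma2) / n)                 # sigma / sqrt(n)

print(mu, sigma2, round(standard_error, 2))   # 7/2 35/12 0.2
```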
A discrete random variable $X$ has a mean of $0.1$ and a variance of $1.3$. Samples of $60$ observations of $X$ are taken and $\overline{X}$, the mean of each sample, is calculated.
What is the mean of $\overline{X}$?
What is the standard deviation of $\overline{X}$? Round your answer to two decimal places.
Using your answers to part (a) and part (b), calculate $P(0<\overline{X}<0.2)$. Write your answer to two decimal places.
Using your answers to part (a) and part (b), calculate $P(\overline{X}<0.3\mid\overline{X}>0.2)$.
Write your answer to two decimal places.
The weight of small tins of tuna, represented by the random variable $X$, is normally distributed with a mean of $90.4$ g and a standard deviation of $6.5$ g.
If the cans are advertised as weighing $92$ g, what is the probability a randomly chosen can is underweight? Round your answer to two decimal places.
What is the expected value of $\overline{X}$, the sample mean of a randomly chosen sample of size $50$?
Calculate the standard deviation for $\overline{X}$. Round your answer to three decimal places.
Calculate the probability, $p$, that a randomly chosen sample of size $50$ has a mean weight less than the advertised weight, using the central limit theorem.
Give your answer to two decimal places.
$45$ samples, each of size $50$, are taken.
Calculate the probability, $q$, that more than $41$ samples each have a mean weight less than the advertised weight, using the central limit theorem.
Give your answer to two decimal places.
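A possible way to check the tuna calculations uses the normal distribution from Python's standard library. The sketch assumes that the $45$ samples are independent, so that the number of samples with a mean below the advertised weight is binomial with success probability $p$; it also uses the unrounded $p$, so a hand calculation with $p$ rounded to two decimal places may differ slightly. All variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

mu, sigma, advertised = 90.4, 6.5, 92.0
n, samples, threshold = 50, 45, 41

# Probability that a single can weighs less than the advertised weight.
single = NormalDist(mu, sigma).cdf(advertised)

# Standard deviation of the sample mean, then P(sample mean < advertised).
se = sigma / sqrt(n)
p = NormalDist(mu, se).cdf(advertised)

# Assumed binomial model: P(more than `threshold` of the samples qualify).
q = sum(comb(samples, k) * p ** k * (1 - p) ** (samples - k)
        for k in range(threshold + 1, samples + 1))

print(round(single, 2), round(se, 3), round(p, 2), round(q, 2))
```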