When taking a random sample from a population, one of the main questions is how large the sample should be so that we can be reasonably confident it is representative of what's happening in the population.
In this investigation, instead of actually conducting surveys and collecting samples, we will use technology to run simulations that explore what happens when we sample from various parent distributions. We will also explore what happens to the distribution of sample proportions as we increase the size of our sample.
In this section we will explore taking samples from various probability distributions we have learnt this year. You can download the CASIO ClassPad app on your phone to complete this investigation. Before we get into the exploration, we first need to learn to use the app to perform the simulations.
Casio ClassPad
How to use the CASIO ClassPad to complete the following tasks, which involve generating random integers and samples from a uniform distribution.
Generate a random integer between $5$ and $25$.
The continuous random variable $X$ is uniformly distributed over the domain $80$ to $95$. Simulate selecting $50$ values from $X$ and store them in a list on your CAS.
Calculate the mean and sample standard deviation of your simulated random sample.
Display your simulated random sample as a histogram on your CAS.
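For readers without a ClassPad to hand, the four tasks above can be sketched in Python. This is only an illustrative stand-in built on the standard library, not the ClassPad procedure; the helper names, the seed and the histogram interval width of 3 are assumptions made for this example.

```python
import random
import statistics


def uniform_sample(low, high, size, rng):
    """Return `size` values drawn from a continuous uniform distribution on [low, high]."""
    return [rng.uniform(low, high) for _ in range(size)]


def text_histogram(data, start, width, bins):
    """Count how many values fall in each of `bins` equal-width intervals from `start`."""
    counts = [0] * bins
    for value in data:
        index = int((value - start) // width)
        if 0 <= index < bins:
            counts[index] += 1
    return counts


rng = random.Random(1)  # arbitrary seed so the run can be repeated

# Task 1: a random integer between 5 and 25 (both endpoints included).
print(rng.randint(5, 25))

# Task 2: 50 values from X ~ U(80, 95), stored in a list.
sample = uniform_sample(80, 95, 50, rng)

# Task 3: mean and sample standard deviation (divisor n - 1).
print(statistics.mean(sample), statistics.stdev(sample))

# Task 4: a rough text histogram with interval width 3, starting at 80.
for i, count in enumerate(text_histogram(sample, 80, 3, 5)):
    print(f"{80 + 3 * i}-{80 + 3 * (i + 1)}: {'*' * count}")
```

Changing the seed (or removing it) gives a different sample each run, which is the behaviour you should also see on the CAS.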
Casio ClassPad
How to use the CASIO ClassPad to complete the following tasks, which involve generating random samples from a normal distribution.
Consider the random variable $X$, which is normally distributed with $\mu=60$ and $\sigma=10$.
Simulate taking a random sample of size $40$ from $X$.
Calculate the mean and sample standard deviation of your random sample.
Display your simulated random sample as a histogram on your CAS.
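The same normal-distribution tasks can be mirrored in Python as a rough cross-check. The sketch below assumes the standard library's `random.gauss` as the generator and an interval width of 10 for the histogram; neither choice comes from the ClassPad walkthrough.

```python
import math
import random
import statistics
from collections import Counter


def normal_sample(mu, sigma, size, rng):
    """Return `size` values drawn from a normal distribution with mean mu and SD sigma."""
    return [rng.gauss(mu, sigma) for _ in range(size)]


rng = random.Random(2)  # arbitrary seed

# A random sample of size 40 from X ~ N(60, 10^2).
sample = normal_sample(60, 10, 40, rng)

# Mean and sample standard deviation (divisor n - 1).
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)
print(f"mean = {sample_mean:.3f}, sd = {sample_sd:.3f}")

# Text histogram: group values into intervals of width 10 by their left endpoint.
bins = Counter(10 * math.floor(value / 10) for value in sample)
for left in sorted(bins):
    print(f"{left}-{left + 10}: {'*' * bins[left]}")
```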
Casio ClassPad
How to use the CASIO ClassPad to complete the following tasks, which involve generating random samples from Bernoulli and binomial distributions.
Let $X$ be the Bernoulli random variable with $p=0.6$.
Simulate selecting $10$ values from the distribution.
Let $Y$ be the binomial random variable with $n=15$ and $p=0.4$.
Simulate a single random selection from this distribution.
For the same random variable $Y$ above, simulate a random sample of size $100$ from the distribution.
For the random variable $Y$, simulate another sample, this time of size $50$, and store the sample proportions in a list on your CAS, where the sample proportion $\hat{P}$ is given by $\hat{P}=\frac{Y}{50}$.
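A minimal Python sketch of the Bernoulli and binomial simulations is given here for comparison with the CAS. It covers the Bernoulli values, the single binomial selection and the sample of size 100; the helper names `bernoulli` and `binomial` are hypothetical, and building a binomial value as a sum of Bernoulli trials is just one convenient way to simulate it.

```python
import random


def bernoulli(p, rng):
    """Return 1 (success) with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0


def binomial(n, p, rng):
    """Return the number of successes in n independent Bernoulli(p) trials."""
    return sum(bernoulli(p, rng) for _ in range(n))


rng = random.Random(3)  # arbitrary seed

# Ten values from the Bernoulli distribution with p = 0.6.
x_values = [bernoulli(0.6, rng) for _ in range(10)]
print(x_values)

# A single selection from Y ~ Bin(15, 0.4).
single_y = binomial(15, 0.4, rng)
print(single_y)

# A random sample of size 100 from the same distribution.
y_sample = [binomial(15, 0.4, rng) for _ in range(100)]
print(y_sample[:10], "...")
```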
Consider the continuous random variable X, distributed uniformly over the interval 20 to 25.
Draw a sketch for the probability density function of X and use your knowledge of the uniform distribution to calculate the mean and standard deviation for X.
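As a check on your hand calculation, and assuming the standard results for a continuous uniform distribution on $[a, b]$ (they are not derived in this section), the parent mean and standard deviation with $a=20$ and $b=25$ work out as:

$$\mu=\frac{a+b}{2}=\frac{20+25}{2}=22.5,\qquad \sigma=\frac{b-a}{\sqrt{12}}=\frac{5}{\sqrt{12}}\approx 1.443$$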
We will now simulate taking a few samples from this parent distribution and we will then compare and contrast the shape, centre and spread of our samples to those of the parent distribution.
Sample one: Simulate taking a sample of size 20 from X.
Graph the histogram of your sample, using interval widths of 1, starting at 20, and calculate the mean and standard deviation of your data. Record these in a table such as the one shown below.
Sample two: Simulate taking a sample of size 70 from X.
Again graph the distribution using the same histogram parameters, and calculate the mean and standard deviation and record the results.
Sample three: Simulate taking a sample of size 150 from X.
Again graph the distribution using the same histogram parameters, and calculate the mean and standard deviation and record the results.
| | Mean | Standard deviation | Histogram |
|---|---|---|---|
| Your sample one | | | |
| Your sample two | | | |
| Your sample three | | | |
Before you compare and contrast your samples, we have performed the simulations ourselves and our table of results is shown below. You might like to use these to assist you in your reflection.
| | Mean | Standard deviation | Histogram |
|---|---|---|---|
| Our sample one | 21.912 | 1.156 | |
| Our sample two | 22.317 | 1.453 | |
| Our sample three | 22.587 | 1.455 | |
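If you would like to repeat the three uniform samples outside the CAS, the following Python sketch draws samples of size 20, 70 and 150 from the interval 20 to 25 and reports the mean, sample standard deviation and histogram counts (interval width 1, starting at 20). The function name and seed are assumptions for illustration, and your numbers will differ from the table above because every simulated sample is different.

```python
import random
import statistics


def summarise_uniform_sample(size, rng, low=20.0, high=25.0):
    """Draw `size` values from U(low, high); return mean, sample SD and bin counts."""
    sample = [rng.uniform(low, high) for _ in range(size)]
    # Histogram with interval width 1, starting at `low`.
    bins = int(high - low)
    counts = [0] * bins
    for value in sample:
        index = min(int(value - low), bins - 1)  # a value equal to `high` goes in the last bin
        counts[index] += 1
    return statistics.mean(sample), statistics.stdev(sample), counts


rng = random.Random(20)  # arbitrary seed
for size in (20, 70, 150):
    mean, sd, counts = summarise_uniform_sample(size, rng)
    print(f"n={size:3d}  mean={mean:.3f}  sd={sd:.3f}  counts={counts}")
```

A flat parent distribution would put roughly equal counts in each interval, so watching how uneven the counts are for each sample size is one way to compare shape.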
Consider the continuous normal random variable X, with a mean of 80 and a standard deviation of 5.
Draw a sketch of the probability distribution for X.
We will now simulate taking a few samples from this parent distribution and we will then compare and contrast the shape, centre and spread of our samples to those of the parent distribution.
Sample one: Simulate taking a sample of size 20 from X.
Graph the histogram of your sample, using interval widths of 5, starting at 65, and calculate the mean and standard deviation of your data. Record these in a table such as the one shown below.
Sample two: Simulate taking a sample of size 70 from X.
Again graph the distribution using the same histogram parameters, and calculate the mean and standard deviation and record the results.
Sample three: Simulate taking a sample of size 150 from X.
Again graph the distribution using the same histogram parameters, and calculate the mean and standard deviation and record the results.
| | Mean | Standard deviation | Histogram |
|---|---|---|---|
| Your sample one | | | |
| Your sample two | | | |
| Your sample three | | | |
Before you compare and contrast your samples, we have performed the simulations ourselves and our table of results is shown below. You might like to use these to assist you in your reflection.
| | Mean | Standard deviation | Histogram |
|---|---|---|---|
| Our sample one | 79.921 | 4.550 | |
| Our sample two | 79.433 | 4.476 | |
| Our sample three | 80.319 | 5.060 | |
Consider the binomial random variable X, with n=20 and p=0.4.
Draw a sketch of the probability distribution of X and use your knowledge of the binomial distribution to calculate the mean and standard deviation for X.
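As a check on your hand calculation, and assuming the standard binomial results $\mu=np$ and $\sigma=\sqrt{np(1-p)}$, the parent values for $n=20$ and $p=0.4$ are:

$$\mu=np=20\times 0.4=8,\qquad \sigma=\sqrt{np(1-p)}=\sqrt{20\times 0.4\times 0.6}=\sqrt{4.8}\approx 2.191$$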
We will now simulate taking a few samples from this parent distribution and we will then compare and contrast the shape, centre and spread of our samples to those of the parent distribution.
Sample one: Simulate taking a sample of size 20 from X.
Graph the histogram of your sample, using interval widths of 1, starting at 0, and calculate the mean and standard deviation of your data. Record these in a table such as the one shown below.
Sample two: Simulate taking a sample of size 70 from X.
Again graph the distribution using the same histogram parameters, and calculate the mean and standard deviation and record the results.
Sample three: Simulate taking a sample of size 150 from X.
Again graph the distribution using the same histogram parameters, and calculate the mean and standard deviation and record the results.
| | Mean | Standard deviation | Histogram |
|---|---|---|---|
| Your sample one | | | |
| Your sample two | | | |
| Your sample three | | | |
Before you compare and contrast your samples, we have performed the simulations ourselves and our table of results is shown below. You might like to use these to assist you in your reflection.
| | Mean | Standard deviation | Histogram |
|---|---|---|---|
| Our sample one | 8.100 | 2.198 | |
| Our sample two | 8.071 | 2.038 | |
| Our sample three | 8.173 | 2.309 | |
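Because the binomial parent is discrete, a frequency chart of the distinct values is a natural way to look at each sample. The Python sketch below is one possible way to do this away from the CAS; `binomial_sample` is a hypothetical helper and the seed is arbitrary.

```python
import random
import statistics
from collections import Counter


def binomial_sample(n, p, size, rng):
    """Return `size` observations of X ~ Bin(n, p), each built from n Bernoulli trials."""
    return [sum(rng.random() < p for _ in range(n)) for _ in range(size)]


rng = random.Random(83)  # arbitrary seed
for size in (20, 70, 150):
    sample = binomial_sample(20, 0.4, size, rng)
    frequencies = Counter(sample)
    print(f"n={size}: mean={statistics.mean(sample):.3f}, sd={statistics.stdev(sample):.3f}")
    # One row of stars per observed value, in increasing order.
    for value in range(0, 21):
        if frequencies[value]:
            print(f"  {value:2d} {'*' * frequencies[value]}")
```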
By examining the three sets of simulations for the three different distributions, comment on the following:
In the next set of simulations, we will be looking at a few different characteristics in the population and we'll simulate taking samples of different sizes from the population. We'll then examine the proportion in each sample which exhibits the characteristic we're observing.
Simulation one
Consider all the integers between 1 and 12 inclusive. We'll simulate taking n numbers from this population (with replacement), and we'll also observe the number of even numbers in each sample or simulation. Just like last time, we'll do this along with you.
Sample 1:
Your sample of 20 integers (be sure to store them as a list):
Our sample of 20 integers: 9, 1, 10, 4, 2, 7, 12, 2, 9, 10, 9, 6, 7, 6, 8, 7, 2, 4, 1, 3
| | Our sample | Your sample |
|---|---|---|
| Number of even numbers | 11 | |
| Proportion of even numbers | 0.55 | |
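The count and proportion for our sample can be confirmed with a few lines of Python, and the same helper can be applied to a freshly simulated sample. The name `proportion_even` and the seed are assumptions for this illustration.

```python
import random


def proportion_even(values):
    """Return (number of even values, proportion of even values)."""
    evens = sum(1 for v in values if v % 2 == 0)
    return evens, evens / len(values)


# Our sample of 20 integers, copied from the list above.
our_sample = [9, 1, 10, 4, 2, 7, 12, 2, 9, 10, 9, 6, 7, 6, 8, 7, 2, 4, 1, 3]
print(proportion_even(our_sample))  # (11, 0.55)

# A new sample of 20 integers from 1 to 12 inclusive, drawn with replacement.
rng = random.Random(12)  # arbitrary seed
your_sample = [rng.randint(1, 12) for _ in range(20)]
print(your_sample, proportion_even(your_sample))
```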
Sample 2:
Now we'll sample 100 integers. Let's not list these, but just tabulate our results.
| | Our sample | Your sample |
|---|---|---|
| Number of even numbers | 52 | |
| Proportion of even numbers | 0.52 | |
Taking one sample at a time and observing the proportion in each sample is slow going. We could instead simulate this entire situation by modelling it with a binomial distribution.
Let's do the following:
Hint: For nice histograms, use an increment step of $\frac{1}{n}$.
| | n=20 | n=100 | n=200 |
|---|---|---|---|
| Our graph | | | |
| Our mean | 0.496 | 0.503 | 0.501 |
| Your graph | | | |
| Your mean | | | |
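One way to picture the binomial modelling in code is to simulate many samples of each size and record the sample proportion from each. The Python sketch below assumes 1000 repetitions per sample size, which is a choice made for this example rather than a figure from the investigation, and uses $p=0.5$ for the even-number characteristic.

```python
import random
import statistics


def sample_proportions(n, p, repetitions, rng):
    """Simulate `repetitions` samples of size n; return each sample's proportion of successes."""
    proportions = []
    for _ in range(repetitions):
        successes = sum(rng.random() < p for _ in range(n))
        proportions.append(successes / n)
    return proportions


rng = random.Random(2024)  # arbitrary seed
for n in (20, 100, 200):
    p_hats = sample_proportions(n, 0.5, 1000, rng)
    print(f"n={n:3d}  mean of sample proportions = {statistics.mean(p_hats):.3f}")
```

The same function can be reused with a different value of `p` to model other characteristics in the population.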
Simulation two
Our reflections are incomplete unless we also observe what happens if p changes. So this time, let's repeat what we've been doing above (sampling with replacement from the integers 1 to 12 inclusive), but observe the number of values in each sample that are less than 4.
We'll skip straight to modelling this with a binomial distribution, where p=0.25, since 3 of the 12 integers are less than 4.
Let's do the following:
| | n=20 | n=100 | n=200 |
|---|---|---|---|
| Our graph | | | |
| Our mean | 0.244 | 0.246 | 0.249 |
| Your graph | | | |
| Your mean | | | |
Before doing a final reflection, let's try this process one more time.
Simulation three
This time, let's repeat what we've been doing above (sampling with replacement from the integers 1 to 12 inclusive), but observe the number of values in each sample that are less than 12.
We'll skip straight to modelling this with a binomial distribution, where $p=\frac{11}{12}\approx 0.9167$.
Let's do the following:
| | n=20 | n=100 | n=200 |
|---|---|---|---|
| Our graph | | | |
| Our mean | 0.907 | 0.914 | 0.919 |
| Your graph | | | |
| Your mean | | | |
In the above experiments we focused on comparing the shape and mean of the distribution of sample proportions. Let's look one more time at the graphs from our three experiments, but this time focus on the impact of increasing the sample size on the standard deviation of the distribution. To make the differences easy to see, the graphs have been placed on the same scale.
| | n=20 | n=100 | n=200 |
|---|---|---|---|
| Simulation 1 | | | |
| Standard deviation | 0.1170 | 0.0528 | 0.0325 |
| Simulation 2 | | | |
| Standard deviation | 0.1028 | 0.0461 | 0.0295 |
| Simulation 3 | | | |
| Standard deviation | 0.0625 | 0.0259 | 0.0194 |
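The simulated standard deviations can be set beside the standard theoretical result for a sample proportion, $\sqrt{p(1-p)/n}$, which is assumed here rather than derived in this investigation. For Simulation 1 ($p=0.5$) the formula gives about 0.112, 0.050 and 0.035 for n=20, 100 and 200, close to the simulated values in the table. The Python sketch below makes the same comparison for all three simulations; the helper names and the 1000 repetitions are illustrative assumptions.

```python
import math
import random
import statistics


def theoretical_sd(p, n):
    """Standard deviation of the sample proportion for samples of size n."""
    return math.sqrt(p * (1 - p) / n)


def simulated_sd(p, n, repetitions, rng):
    """Sample standard deviation of `repetitions` simulated sample proportions."""
    p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(repetitions)]
    return statistics.stdev(p_hats)


rng = random.Random(139)  # arbitrary seed
simulations = (("Simulation 1", 0.5), ("Simulation 2", 0.25), ("Simulation 3", 11 / 12))
for label, p in simulations:
    for n in (20, 100, 200):
        print(f"{label}  n={n:3d}  formula={theoretical_sd(p, n):.4f}  "
              f"simulated={simulated_sd(p, n, 1000, rng):.4f}")
```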
investigate the variability of random samples from various types of distributions, including uniform, normal and Bernoulli, using graphical displays of real and simulated data
simulate repeated random sampling, for a variety of values of $p$ and a range of sample sizes, to illustrate the distribution of $\hat{p}$ and the approximate standard normality of $\frac{\hat{p}-p}{\sqrt{\hat{p}(1-\hat{p})/n}}$, where the closeness of the approximation depends on both $n$ and $p$
use simulation to illustrate variations in confidence intervals between samples and to show that most but not all confidence intervals contain p
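The last point can be illustrated with a short simulation. The Python sketch below builds an approximate confidence interval $\hat{p}\pm z\sqrt{\hat{p}(1-\hat{p})/n}$ from each simulated sample and counts how often the interval contains the true $p$. The 95% level (with $z=1.96$), the 1000 repetitions and the function names are assumptions chosen for this example.

```python
import math
import random


def interval_for_sample(n, p, rng, z=1.96):
    """Simulate one sample of size n; return (lower, upper) of an approximate interval for p."""
    p_hat = sum(rng.random() < p for _ in range(n)) / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(n, p, repetitions, rng):
    """Fraction of simulated intervals that contain the true value of p."""
    hits = 0
    for _ in range(repetitions):
        lower, upper = interval_for_sample(n, p, rng)
        if lower <= p <= upper:
            hits += 1
    return hits / repetitions


rng = random.Random(95)  # arbitrary seed
for p in (0.5, 0.25, 11 / 12):
    for n in (20, 100, 200):
        print(f"p={p:.4f}  n={n:3d}  coverage={coverage(n, p, 1000, rng):.3f}")
```

Comparing the coverage across the rows gives a sense of how the quality of the normal approximation depends on both n and p.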