# INVESTIGATION - Sampling and simulation


When taking a random sample from a population, one of the main questions is how large the sample should be so that we can be reasonably confident that it is representative of the population.

In this investigation, instead of actually conducting surveys and taking samples, we will use technology to run simulations and explore what happens when we sample from various parent distributions. We will also explore what happens to the distribution of sample proportions as we increase the size of our sample.

## Simulating random sampling from various distributions

In this section we will explore taking samples from various probability distributions we have learnt this year. You can download the CASIO ClassPad app on your phone to complete this investigation.  Before we get into the exploration, we first need to learn to use the app to perform the simulations.

### Simulating a uniform distribution

Casio ClassPad

Use the CASIO ClassPad to complete the following tasks, which involve generating random integers and samples from a uniform distribution.

1. Generate a random integer between $5$ and $25$.

2. The continuous random variable $X$ is uniformly distributed over the domain $80$ to $95$. Simulate selecting $50$ values from $X$ and storing them in a list on your CAS.

3. Calculate the mean and sample standard deviation of your simulated random sample.

4. Display your simulated random sample as a histogram on your CAS.
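For readers without the ClassPad to hand, the four tasks above can also be sketched in Python. This is only an illustrative alternative using the standard library, not the ClassPad procedure; the fixed seed, the variable names and the class width of 5 for the text histogram are assumptions made for this example.

```python
import random
import statistics

random.seed(1)  # assumed fixed seed so the run is repeatable; remove for fresh samples

# Task 1: one random integer between 5 and 25 (both endpoints included).
k = random.randint(5, 25)

# Task 2: 50 values from X, uniform on the interval 80 to 95, stored in a list.
sample = [random.uniform(80, 95) for _ in range(50)]

# Task 3: mean and sample standard deviation (divisor n - 1).
mean = statistics.mean(sample)
sd = statistics.stdev(sample)

# Task 4: a text histogram with an assumed class width of 5, starting at 80.
counts = [0, 0, 0]
for x in sample:
    index = min(int((x - 80) // 5), 2)  # a value of exactly 95 joins the last class
    counts[index] += 1

print("random integer:", k)
print(f"mean = {mean:.3f}, sample sd = {sd:.3f}")
for i, count in enumerate(counts):
    print(f"{80 + 5 * i}-{85 + 5 * i}: " + "*" * count)
```

Each run with a different seed gives a different sample, which is exactly the variability this investigation sets out to explore.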

### Simulating a normal distribution

Casio ClassPad

Use the CASIO ClassPad to complete the following tasks, which involve generating random samples from a normal distribution.

Consider the random variable $X$, which is normally distributed with $\mu=60$ and $\sigma=10$.

1. Simulate taking a random sample of size $40$ from $X$.

2. Calculate the mean and sample standard deviation of your random sample.

3. Display your simulated random sample as a histogram on your CAS.
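The same three tasks can be sketched in Python as a cross-check on the ClassPad output. The seed and the class width of 10 used for the text histogram are assumptions chosen for this example only.

```python
import random
import statistics
from collections import Counter

random.seed(2)  # assumed fixed seed so the run is repeatable

MU, SIGMA = 60, 10

# Task 1: a random sample of size 40 from X ~ N(60, 10^2).
sample = [random.gauss(MU, SIGMA) for _ in range(40)]

# Task 2: mean and sample standard deviation (divisor n - 1).
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)
print(f"mean = {sample_mean:.3f}, sample sd = {sample_sd:.3f}")

# Task 3: a text histogram, grouping values into classes of assumed width 10.
classes = Counter(int(x // 10) * 10 for x in sample)
for start in sorted(classes):
    print(f"{start:>3} to {start + 10:<3} " + "*" * classes[start])
```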

### Simulating a Bernoulli or binomial distribution

Casio ClassPad

Use the CASIO ClassPad to complete the following tasks, which involve generating random samples from Bernoulli and binomial distributions.

1. Let $X$ be the Bernoulli random variable with $p=0.6$.

Simulate selecting $10$ values from the distribution.

2. Let $Y$ be the binomial random variable with $n=15$ and $p=0.4$.

Simulate a single random selection from this distribution.

3. For the same random variable $Y$ above, simulate a random sample of size $100$ from the distribution.

4. For the random variable $Y$, simulate another sample, this time of size $50$, and store the sample proportions in a list on your CAS, where the sample proportion $\hat{P}$ is given by $\hat{P}=\frac{Y}{50}$.
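As a rough Python counterpart to the ClassPad steps, the sketch below covers the first three tasks. The helper functions `bernoulli` and `binomial` are hypothetical names written for this example; they build a binomial value by summing Bernoulli trials rather than calling any calculator function.

```python
import random

random.seed(3)  # assumed fixed seed so the run is repeatable


def bernoulli(p):
    """Return 1 (success) with probability p, otherwise 0."""
    return 1 if random.random() < p else 0


def binomial(n, p):
    """Number of successes in n independent Bernoulli(p) trials."""
    return sum(bernoulli(p) for _ in range(n))


# Task 1: 10 values from the Bernoulli distribution with p = 0.6.
x_values = [bernoulli(0.6) for _ in range(10)]

# Task 2: a single selection from the binomial distribution with n = 15, p = 0.4.
y_single = binomial(15, 0.4)

# Task 3: a random sample of size 100 from the same binomial distribution.
y_sample = [binomial(15, 0.4) for _ in range(100)]

print("Bernoulli values:", x_values)
print("single binomial value:", y_single)
print("first ten of the binomial sample:", y_sample[:10])
```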

## Part 1: Comparing the variability of samples

### Comparing the variability of samples from a uniform distribution

Consider the continuous random variable X, distributed uniformly over the interval 20 to 25.

Draw a sketch for the probability density function of X and use your knowledge of the uniform distribution to calculate the mean and standard deviation for X.

We will now simulate taking a few samples from this parent distribution and we will then compare and contrast the shape, centre and spread of our samples to those of the parent distribution.

Sample one: Simulate taking a sample of size 20 from X.

Graph the histogram of your sample, using interval widths of 1, starting at 20, and calculate the mean and standard deviation of your data. Record these in a table such as the one shown below.

Sample two: Simulate taking a sample of size 70 from X.

Again graph the distribution using the same histogram parameters, and calculate the mean and standard deviation and record the results.

Sample three: Simulate taking a sample of size 150 from X.

Again graph the distribution using the same histogram parameters, and calculate the mean and standard deviation and record the results.
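The three samples described above can also be generated in one pass with a short Python sketch. It is an assumed alternative to the ClassPad workflow: the function name `summarise` and the seed are made up for this example, and the class counts use intervals of width 1 starting at 20, matching the histogram settings in the text.

```python
import random
import statistics

random.seed(4)  # assumed fixed seed so the run is repeatable


def summarise(size, low=20, high=25):
    """Draw `size` values from the uniform distribution on low..high.

    Returns the sample mean, the sample standard deviation and the
    counts for classes of width 1 starting at `low`.
    """
    sample = [random.uniform(low, high) for _ in range(size)]
    counts = [0] * (high - low)
    for x in sample:
        # A value exactly equal to `high` is placed in the last class.
        counts[min(int(x - low), high - low - 1)] += 1
    return statistics.mean(sample), statistics.stdev(sample), counts


results = {size: summarise(size) for size in (20, 70, 150)}
for size, (mean, sd, counts) in results.items():
    print(f"n = {size:>3}: mean = {mean:.3f}, sd = {sd:.3f}, class counts = {counts}")
```

Swapping `random.uniform` for another generator would give the corresponding comparison for the other parent distributions in this part.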

| | Mean | Standard deviation | Histogram |
|---|---|---|---|
| Sample one | | | |
| Sample two | | | |
| Sample three | | | |

Before you compare and contrast your samples, we have performed the simulations ourselves and our table of results is shown below. You might like to use these to assist you in your reflection.

| | Mean | Standard deviation | Histogram |
|---|---|---|---|
| Our sample one | 21.912 | 1.156 | (histogram) |
| Our sample two | 22.317 | 1.453 | (histogram) |
| Our sample three | 22.587 | 1.455 | (histogram) |

#### Reflection

1. Write a sentence comparing and contrasting the mean of each sample as compared to the mean of the parent distribution.
2. Write a sentence comparing and contrasting the standard deviation of each sample as compared to the standard deviation of X.
3. Write a sentence comparing and contrasting the shape of each histogram with the shape of X.
4. Overall, is there anything you notice about the samples as the size of the sample increases?

### Comparing the variability of samples from a normal distribution

Consider the continuous normal random variable X, with a mean of 80 and a standard deviation of 5.

Draw a sketch of the probability distribution for X.

We will now simulate taking a few samples from this parent distribution and we will then compare and contrast the shape, centre and spread of our samples to those of the parent distribution.

Sample one: Simulate taking a sample of size 20 from X.

Graph the histogram of your sample, using interval widths of 5, starting at 65, and calculate the mean and standard deviation of your data. Record these in a table such as the one shown below.

Sample two: Simulate taking a sample of size 70 from X.

Again graph the distribution using the same histogram parameters, and calculate the mean and standard deviation and record the results.

Sample three: Simulate taking a sample of size 150 from X.

Again graph the distribution using the same histogram parameters, and calculate the mean and standard deviation and record the results.

| | Mean | Standard deviation | Histogram |
|---|---|---|---|
| Sample one | | | |
| Sample two | | | |
| Sample three | | | |

Before you compare and contrast your samples, we have performed the simulations ourselves and our table of results is shown below. You might like to use these to assist you in your reflection.

| | Mean | Standard deviation | Histogram |
|---|---|---|---|
| Our sample one | 79.921 | 4.550 | (histogram) |
| Our sample two | 79.433 | 4.476 | (histogram) |
| Our sample three | 80.319 | 5.060 | (histogram) |

#### Reflection

1. Write a sentence comparing and contrasting the mean of each sample as compared to the mean of the parent distribution.
2. Write a sentence comparing and contrasting the standard deviation of each sample as compared to the standard deviation of X.
3. Write a sentence comparing and contrasting the shape of each histogram with the shape of X.
4. Overall, is there anything you notice about the samples as the size of the sample increases?

### Comparing the variability of samples from a binomial distribution

Consider the binomial random variable X, with n=20 and p=0.4.

Draw a sketch of the probability distribution of X and use your knowledge of the binomial distribution to calculate the mean and standard deviation for X.

We will now simulate taking a few samples from this parent distribution and we will then compare and contrast the shape, centre and spread of our samples to those of the parent distribution.

Sample one: Simulate taking a sample of size 20 from X.

Graph the histogram of your sample, using interval widths of 1, starting at 0, and calculate the mean and standard deviation of your data. Record these in a table such as the one shown below.

Sample two: Simulate taking a sample of size 70 from X.

Again graph the distribution using the same histogram parameters, and calculate the mean and standard deviation and record the results.

Sample three: Simulate taking a sample of size 150 from X.

Again graph the distribution using the same histogram parameters, and calculate the mean and standard deviation and record the results.

| | Mean | Standard deviation | Histogram |
|---|---|---|---|
| Sample one | | | |
| Sample two | | | |
| Sample three | | | |

Before you compare and contrast your samples, we have performed the simulations ourselves and our table of results is shown below. You might like to use these to assist you in your reflection.

| | Mean | Standard deviation | Histogram |
|---|---|---|---|
| Our sample one | 8.100 | 2.198 | (histogram) |
| Our sample two | 8.071 | 2.038 | (histogram) |
| Our sample three | 8.173 | 2.309 | (histogram) |

#### Reflection

1. Write a sentence comparing and contrasting the mean of each sample as compared to the mean of the parent distribution.
2. Write a sentence comparing and contrasting the standard deviation of each sample as compared to the standard deviation of X.
3. Write a sentence comparing and contrasting the shape of each histogram with the shape of X.
4. Overall, is there anything you notice about the samples as the size of the sample increases?

#### Conclusion

By examining the three sets of simulations for the three different distributions, comment on the following:

1. When comparing samples with the parent distribution, what can be observed?
2. When comparing samples with other samples, what can be observed?
3. As the size of the sample increases, what do you observe about the sample as compared with the parent distribution?

## Part 2: Sample proportions and repeated sampling

In the next set of simulations, we will be looking at a few different characteristics in the population and we'll simulate taking samples of different sizes from the population. We'll then examine the proportion in each sample which exhibits the characteristic we're observing.

Simulation one

Consider all the integers between 1 and 12 inclusive. We'll simulate taking n numbers from this population (with replacement), and we'll also observe the number of even numbers in each sample or simulation. Just like last time, we'll do this along with you.

Sample 1:

Your sample of 20 integers (be sure to store them as a list):

Our sample of 20 integers: 9, 1, 10, 4, 2, 7, 12, 2, 9, 10, 9, 6, 7, 6, 8, 7, 2, 4, 1, 3

| | Our sample |
|---|---|
| Number of even numbers | 11 |
| Proportion of even numbers | 0.55 |
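The count and proportion for the listed sample can be checked with a few lines of Python. The second half of the sketch draws a fresh sample of 20 integers with replacement; the variable names and the seed are assumptions made for this example.

```python
import random

# The 20 integers listed above as "our sample".
our_sample = [9, 1, 10, 4, 2, 7, 12, 2, 9, 10, 9, 6, 7, 6, 8, 7, 2, 4, 1, 3]

even_count = sum(1 for value in our_sample if value % 2 == 0)
proportion = even_count / len(our_sample)
print(even_count, proportion)  # 11 0.55

# Drawing your own sample of 20 integers from 1 to 12, with replacement.
random.seed(5)  # assumed fixed seed so the run is repeatable
my_sample = [random.randint(1, 12) for _ in range(20)]
my_proportion = sum(1 for value in my_sample if value % 2 == 0) / len(my_sample)
print(my_sample, my_proportion)
```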

Sample 2:

Now we'll sample 100 integers. Let's not list these, but just tabulate our results.

| | Our sample |
|---|---|
| Number of even numbers | 52 |
| Proportion of even numbers | 0.52 |

### Comparing shape and mean of sets of sample proportions

So far, taking one sample at a time and observing the proportion in each sample has been a little slow going. We could instead simulate this entire situation by modelling it with a binomial distribution.

Let's do the following:

• For each value of n in the table, simulate 100 values from a binomial distribution with that n and p=0.5 for this scenario. Each value is the number of successes in one sample of size n
• Divide each simulated value by n to obtain the proportion for each sample
• Graph the distribution of the sample proportions
• Calculate the mean of the sample proportions
• Obtain a graph of the simulation which clearly displays the shape of the graph

Hint: For nice histograms, use an increment step of $\frac{1}{n}$.
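The repeated-sampling steps above can be sketched in Python as follows. This is an assumed alternative to the ClassPad workflow: `binomial` and `sample_proportions` are hypothetical helper names, and 100 repeats per value of n mirrors the number of simulated samples used in this investigation.

```python
import random
import statistics

random.seed(6)  # assumed fixed seed so the run is repeatable


def binomial(n, p):
    """Number of successes in n independent trials, each with success probability p."""
    return sum(1 for _ in range(n) if random.random() < p)


def sample_proportions(n, p, repeats=100):
    """Simulate `repeats` samples of size n and return each sample proportion."""
    return [binomial(n, p) / n for _ in range(repeats)]


P = 0.5  # probability that a randomly chosen integer from 1 to 12 is even
means = {}
for n in (20, 100, 200):
    proportions = sample_proportions(n, P)
    means[n] = statistics.mean(proportions)
    print(f"n = {n:>3}: mean of sample proportions = {means[n]:.3f}")
```

Changing `P` allows the same sketch to be reused for other characteristics of the population.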

| | n=20 | n=100 | n=200 |
|---|---|---|---|
| Our graph | (histogram) | (histogram) | (histogram) |
| Our mean | 0.496 | 0.503 | 0.501 |

#### Reflection

1. As we increase the size of each sample, what do you notice about the shape of the distribution of the sample proportions?
2. As we increase the size of each sample, what do you notice about the average sample proportion value?

Simulation two

Our reflections are incomplete unless we also observe what happens if p changes. So this time, let's repeat what we've been doing above (sampling with replacement from the integers 1 to 12 inclusive), this time observing the number in each sample less than 4.

We'll skip straight to modelling this with a binomial distribution, where p=0.25.

Let's do the following:

• For each value of n in the table, simulate 100 values from a binomial distribution with that n and p=0.25 for this scenario
• Divide each simulated value by n to obtain the proportion for each sample
• Graph the distribution of the sample proportions
• Calculate the mean of the sample proportions

| | n=20 | n=100 | n=200 |
|---|---|---|---|
| Our graph | (histogram) | (histogram) | (histogram) |
| Our mean | 0.244 | 0.246 | 0.249 |

Before doing a final reflection, let's try this process one more time.

Simulation three

So this time, let's repeat what we've been doing above (sampling with replacement from the integers 1 to 12 inclusive), this time observing the number in each sample less than 12.

We'll skip straight to modelling this with a binomial distribution, where $p=\frac{11}{12}\approx 0.9167$.

Let's do the following:

• For each value of n in the table, simulate 100 values from a binomial distribution with that n and $p=\frac{11}{12}\approx 0.9167$ for this scenario
• Divide each simulated value by n to obtain the proportion for each sample
• Graph the distribution of the sample proportions
• Calculate the mean of the sample proportions

| | n=20 | n=100 | n=200 |
|---|---|---|---|
| Our graph | (histogram) | (histogram) | (histogram) |
| Our mean | 0.907 | 0.914 | 0.919 |

#### Reflection

1. For various values of p, as we increase n, the graph of the sample proportions appears to approach what type of graph?
2. For various values of p, as we increase n, the mean of the sample proportions appears to approach what value?

### Comparing variability in sets of sample proportions

In the above experiments we focused on comparing the shape and mean of the distribution of sample proportions. Let's look one more time at the graphs from our three experiments above, but this time focus on the impact of increasing the sample size on the standard deviation of the sample proportions. To make the difference easy to see, the graphs have been placed on the same scale.

| | n=20 | n=100 | n=200 |
|---|---|---|---|
| Simulation 1 standard deviation | 0.1170 | 0.0528 | 0.0325 |
| Simulation 2 standard deviation | 0.1028 | 0.0461 | 0.0295 |
| Simulation 3 standard deviation | 0.0625 | 0.0259 | 0.0194 |

#### Reflection

1. For various values of p, as we increase n, what happens to the standard deviation? Can you justify why this occurs?
2. For the values above, the closer p was to 0 or 1 the smaller the standard deviation was. Is this always the case?
3. For each simulation above, in the case where n=1 we would have the standard deviation $\sigma=\sqrt{p\left(1-p\right)}$. Calculate this for each simulation and try to find a rule for how the standard deviation is impacted by the sample size, n. You may need to run each simulation with more varied values of n to discover a pattern.
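To help with the last question, the sketch below runs the simulation for a wider range of values of n and prints the simulated standard deviation alongside its ratio to $\sqrt{p(1-p)}$, leaving the pattern for you to identify. The helper name `proportion_sd`, the chosen values of n and the use of 1000 repeats (more than the 100 used earlier, to reduce noise) are all assumptions made for this example.

```python
import math
import random
import statistics

random.seed(7)  # assumed fixed seed so the run is repeatable


def proportion_sd(n, p, repeats=1000):
    """Standard deviation of `repeats` simulated sample proportions for samples of size n."""
    proportions = [
        sum(random.random() < p for _ in range(n)) / n
        for _ in range(repeats)
    ]
    return statistics.stdev(proportions)


for p in (0.5, 0.25, 11 / 12):
    sigma_one = math.sqrt(p * (1 - p))  # standard deviation for the case n = 1
    print(f"p = {p:.4f}, sqrt(p(1-p)) = {sigma_one:.4f}")
    for n in (1, 4, 16, 64, 256):
        sd = proportion_sd(n, p)
        print(f"  n = {n:>3}: sd = {sd:.4f}, sd / sqrt(p(1-p)) = {sd / sigma_one:.3f}")
```

Looking down the ratio column as n grows by a factor of 4 each time is one way to spot the relationship.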

### Outcomes

#### 4.5.1.3

investigate the variability of random samples from various types of distributions, including uniform, normal and Bernoulli, using graphical displays of real and simulated data

#### 4.5.2.3

simulate repeated random sampling, for a variety of values of $p$ and a range of sample sizes, to illustrate the distribution of $\hat{p}$ and the approximate standard normality of $\frac{\hat{p}-p}{\sqrt{\hat{p}(1-\hat{p})/n}}$, where the closeness of the approximation depends on both $n$ and $p$

#### 4.5.3.4

use simulation to illustrate variations in confidence intervals between samples and to show that most but not all confidence intervals contain p