
12.015 Sampling methods

Lesson

 

Types of sampling

Four common methods used to sample a population are:

  • Simple random sampling
  • Systematic sampling
  • Self-selected sampling
  • Stratified sampling

Each method has advantages and disadvantages, depending on the situation. 

Simple random sampling

In this method, a sample is formed by selecting members from the population at random, where each member of the population has an equally likely chance of being selected. In simple cases, a sample could be created by drawing names from a hat. For most samples though, it is more common to use a random number generator. 

 

Generating random numbers

Calculators and spreadsheet applications usually have a random number generator that can generate random decimals between $0$ and $1$.

These can be used to create random numbers within any range of values. For example, if we require a random number between:

  • $1$ and $50$, we would multiply the randomly-generated decimal by $50$, then round the answer up to the next whole number.
  • $20$ and $30$, we would multiply the randomly-generated decimal by $10$, add $20$, then round the answer up to the next whole number.
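The scaling steps above can be sketched in Python. This is a minimal illustration, not a description of how any particular calculator works. The function name `scale_and_round_up` and its parameters are hypothetical, and `random.random()` stands in for the calculator's random decimal.

```python
import math
import random


def scale_and_round_up(decimal, multiplier, offset=0):
    """Multiply a decimal between 0 and 1, add an offset, then round up."""
    return math.ceil(decimal * multiplier + offset)


# A random decimal in the interval [0, 1), like a calculator's random key.
decimal = random.random()

# Multiply by 50, then round up: a whole number from 1 to 50.
print(scale_and_round_up(decimal, 50))

# Multiply by 10, add 20, then round up. Because we round up,
# the result is a whole number from 21 to 30.
print(scale_and_round_up(decimal, 10, 20))
```

A decimal of exactly $0$ would return the offset itself, which is one below the intended range, so a more careful version would generate a new decimal in that rare case.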

Creating a simple random sample from a list is quite straightforward as the name suggests. However, disadvantages of this method can include the time or expense needed to gather the full list of a specific population and the bias that could occur when the sample set is not large enough to adequately represent the full population.

Worked example

Example 1

Sam wants to randomly select five people to use in a sample from a population of twenty. He begins by assigning each person in the population a number from $1$ to $20$. Using the random number generator on his calculator, he then generates five random numbers between $0$ and $1$:

$0.532$, $0.805$, $0.686$, $0.774$, $0.272$

Help Sam by converting these values to numbers between $1$ and $20$, so he can use them to select members for his sample.

Think: To get random numbers between $1$ and $20$, we multiply each value by $20$ and round up to the next whole number.

Do:

First random number $= 0.532\times20$
  $= 10.64$
  $= 11$   (Rounded up to the next whole number)

Repeating this for the remaining values gives us five random numbers between $1$ and $20$:

$11$, $17$, $14$, $16$, $6$

Anyone who was assigned these numbers in the population would be selected for the sample.

Reflect: In this case, we wanted five unique random numbers. If our calculator generated two numbers that were similar, we could end up with the same value when converted to a number between $1$ and $20$. In that situation, we would keep generating additional random numbers until we had five that were unique.
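Sam's conversion, and the idea of generating more numbers until five are unique, can be sketched in Python. This is an illustrative sketch only. The helper `draw_unique` is a hypothetical name, and `random.random()` is assumed to play the role of the calculator's random number generator.

```python
import math
import random

# The five decimals from Sam's calculator in Example 1.
decimals = [0.532, 0.805, 0.686, 0.774, 0.272]

# Multiply each by 20 and round up to the next whole number.
numbers = [math.ceil(d * 20) for d in decimals]
print(numbers)  # [11, 17, 14, 16, 6]


def draw_unique(population_size, sample_size):
    """Keep generating numbers until sample_size unique values are found."""
    chosen = []
    while len(chosen) < sample_size:
        number = math.ceil(random.random() * population_size)
        # Skip repeats (and the rare 0 produced by a decimal of exactly 0).
        if number >= 1 and number not in chosen:
            chosen.append(number)
    return chosen


print(draw_unique(20, 5))
```

In practice, Python's standard library call `random.sample(range(1, 21), 5)` selects five unique numbers from $1$ to $20$ in one step.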

Websites can also produce random numbers. Try the example above again, this time using an online random number generator.

 

Systematic sampling

In this method, a sample is formed by choosing a random starting point, then selecting members from the population at regular intervals (e.g. every $5$th member). In other words, it uses a 'system' for selection. For example, we may choose every fifth name from a list or call every tenth business in the phone book. This method is often favoured by manufacturers for sampling products on a production line.

The image to the left shows every $3$rd person being picked.

Systematic sampling provides a useful mechanism to select the sample in an efficient, organised manner. A possible issue with this technique arises when the sample interval coincides with a pattern in the population, so that the sample is no longer random. For example, suppose every $5$th chip packet on a conveyor belt is selected to check its weight and, by coincidence, a machine fault causes every $5$th packet to be underfilled. The sample would then not represent the packets as a whole.

 

Worked example

Example 2

A machine produces $400$ items a day. At what interval should an item be selected in order to obtain a systematic sample of $25$ items?

Think: To obtain $25$ samples at even intervals, what size will each interval be?

Do:

Interval required $= \frac{\text{population size}}{\text{sample size}}$
  $= \frac{400}{25}$
  $= 16$

Therefore, every $16$th item should be selected.
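The interval calculation, together with the random starting point described earlier, can be sketched in Python. This is a minimal sketch under stated assumptions: the function name `systematic_sample` is hypothetical, the items are simply numbered $1$ to $400$, and the population size is assumed to divide evenly by the sample size.

```python
import random


def systematic_sample(population, sample_size):
    """Select items at a regular interval from a random starting point."""
    interval = len(population) // sample_size  # 400 // 25 = 16
    start = random.randrange(interval)         # random start within the first interval
    return population[start::interval][:sample_size]


items = list(range(1, 401))  # items numbered 1 to 400
sample = systematic_sample(items, 25)
print(sample)
```

Each run starts at a different item among the first $16$, then takes every $16$th item after it.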

Self-selected sampling

In this method, a sample is formed by members of the population who volunteer themselves for selection. This is a common sampling method in the field of medicine where volunteers may be asked to take part in a medical trial. 

This method tends to be used in situations where it may be difficult to randomly select people, perhaps due to ethical or logistical reasons. Because people who volunteer may differ from those who do not, a self-selected sample may not be truly representative of the wider population.

 

Stratified sampling

In this method, a sample is formed by dividing the population into subgroups (or strata) and then selecting a random sample proportionally from each subgroup. That is, if a subgroup makes up $25\%$ of the population it should also make up $25\%$ of the sample. This is particularly appropriate when there are clearly defined subgroups that are likely to have different opinions or traits, and we want to ensure each subgroup is fairly represented in the sample.

While stratified sampling can be more complex to perform, it can help ensure the sample is representative of the wider population. It can also provide useful statistics on the subgroups and highlight differences between them. A disadvantage is that researchers must ensure every member of the population being studied can be classified into one, and only one, subgroup, so this method cannot be applied in all situations.

 

Stratified sample

The number surveyed from a particular subgroup in a stratified sample can be calculated as follows:

$\text{Number from subgroup to survey} = \text{Proportion of population in subgroup}\times\text{Sample size}$
  $= \frac{\text{Number with subgroup trait}}{\text{Population size}}\times\text{Sample size}$

This calculation should be rounded to the nearest integer, since we cannot survey part of a member of the population.

 

Worked example

Example 3

Martha wants to use a stratified sample of $50$ members to survey her school's population. There are $70$ teachers and $1100$ students at her school.

(a) How many teachers should be in the sample?

Think: The total school population is $1170$ people. If teachers make up $70$ out of $1170$, we need to find the same proportion of teachers in a sample of $50$ people.

Do:

Number of teachers in sample $= \frac{70}{1170}\times50$
  $= 2.991\ldots$
  $= 3$   (Rounded to the nearest whole number)

 

(b) How many students should be in the sample?

Think: We use the same approach to find the number of students in the sample.

Do:

Number of students in sample $= \frac{1100}{1170}\times50$
  $= 47.008\ldots$
  $= 47$   (Rounded to the nearest whole number)

 

Reflect: We need to check that the number of teachers and students add to the required sample size of $50$.

 i.e. $3$ teachers $+$ $47$ students $= 50$.
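The calculation in Example 3 can be checked with a short Python sketch. The function names `round_half_up` and `stratified_allocation` are hypothetical, chosen for illustration. A separate rounding helper is used because Python's built-in `round` rounds halves to the nearest even number, which differs from the usual classroom rule.

```python
import math


def round_half_up(x):
    """Round to the nearest whole number, with halves rounded up."""
    return math.floor(x + 0.5)


def stratified_allocation(strata_sizes, sample_size):
    """Number to survey from each subgroup, in proportion to its size."""
    population = sum(strata_sizes.values())
    return {
        name: round_half_up(size / population * sample_size)
        for name, size in strata_sizes.items()
    }


school = {"teachers": 70, "students": 1100}
allocation = stratified_allocation(school, 50)
print(allocation)                # {'teachers': 3, 'students': 47}
print(sum(allocation.values()))  # 50
```

With other subgroup sizes, rounding can make the total differ slightly from the required sample size, which is why the final check matters.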

 

Practice questions

QUESTION 1

Choosing every $5$th person on the class roll to take part in a survey is an example of:

  A. Stratified sampling
  B. Random sampling
  C. Systematic sampling

QUESTION 2

Users of a particular streaming service can be in one of four categories - Standard, Family, Premium or Business. The table shows the number of people in each category:

Category    Number of people
Standard    $3500$
Family      $1500$
Premium     $2000$
Business    $3000$

  1. How many customers are there across all the categories?

  2. If a stratified sample of $400$ is to be taken from the group, what proportion of people will be chosen?

  3. For the sample to be stratified, how many Standard customers should be chosen?

  4. For the sample to be stratified, how many Family customers should be chosen?

  5. For the sample to be stratified, how many Premium customers should be chosen?

  6. For the sample to be stratified, how many Business customers should be chosen?

 

Capture-recapture method

In ecological studies, a sampling technique, called capture-recapture, is used to estimate the number of individuals in a population. It involves tagging, releasing and then recapturing after a certain time has elapsed in order to estimate the size of a population.

Suppose a region is home to an unknown number of animals of a particular species. A researcher might capture some of the animals, tag them and then release them back into the environment. Some time later, when the released animals can be assumed to have become well mixed with the rest of the population, another sample of the animals is captured. Some of these are likely to be the previously tagged individuals.

The proportion of tagged individuals in the second sample is likely to be approximately the same as the proportion of the whole population that was tagged in the first sample. This method assumes that the population is "closed". In other words, the two visits to the study area are close enough in time that no individuals die, are born, or move into or out of the study area between visits. The model also assumes that no tags fall off animals between visits, and that the researcher takes a random sample both times. Randomness of a wild sample may be difficult to guarantee in practice. Perhaps tagged animals were slower and more likely to be caught in both samples. Can you think of any other factors that may cause the population estimate to be inaccurate?

Worked example

Example 4

A sample of $50$ fish was caught in a lake. These were tagged and released. Some time later another $48$ fish were caught. Of these, four were found to be tagged. Estimate the number of fish in the lake.

Think: The proportion of tagged fish in the second sample should be equal to the proportion of $50$ fish out of the total population.

Do: The fraction of tagged fish in the second sample is $\frac{4}{48}=\frac{1}{12}$ and it is known that there are $50$ tagged fish in the lake.

$\frac{\text{Number tagged sample 1}}{\text{Total population}} = \frac{\text{Number tagged sample 2}}{\text{Total number sample 2}}$

$\frac{50}{\text{Total population}} = \frac{1}{12}$

$\text{Total population} = 50\times12$   (Cross multiply to remove fractions)
  $= 600$

 

 

Therefore, there are estimated to be $600$ fish in the lake.
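The same estimate can be reproduced with a short Python sketch. The function name `capture_recapture_estimate` and its parameter names are hypothetical. The sketch simply rearranges the proportion used in Example 4 and relies on the same assumptions of a closed population and random samples.

```python
def capture_recapture_estimate(tagged_first, caught_second, tagged_second):
    """Estimate the population size from a capture-recapture study.

    Rearranges: tagged_first / population = tagged_second / caught_second
    """
    if tagged_second == 0:
        raise ValueError("No tagged individuals recaptured, so no estimate is possible.")
    return tagged_first * caught_second / tagged_second


# Example 4: 50 fish tagged, 48 caught later, 4 of those were tagged.
estimate = capture_recapture_estimate(50, 48, 4)
print(estimate)  # 600.0
```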

 

Capture recapture

$\frac{\text{Number tagged sample 1}}{\text{Total population}} = \frac{\text{Number tagged sample 2}}{\text{Total number sample 2}}$

Practice questions

QUESTION 3

An oil spill has spread over an area of $1650$ square kilometres. A team of biologists scan an area of $150$ square kilometres, and find $272$ dead marine animals. Find $y$, the estimated number of dead marine animals over the entire area of the oil spill.

QUESTION 4

A local council wanted to monitor the number of rabbits in the area. They used the capture-recapture technique to estimate the population of rabbits. $219$ rabbits were caught, tagged and released. Later, $42$ rabbits were caught at random. $15$ of these $42$ rabbits had been tagged.

  1. Find $k$, the estimated population of rabbits. Round your answer to the nearest whole number if necessary.

  2. Local council B conducted a similar study and found they had $15\%$ fewer rabbits. What was the estimated population of rabbits in council area B? Round to the nearest whole number if necessary.

Outcomes

2.3.2.2

investigate the different kinds of samples [complex]
