
1.01 Random sampling

Lesson

Collecting data from every member of a population is the most accurate way of gathering information, but it is not always the most practical and can be very expensive. For these reasons, data is often gathered from a smaller group, or sample, that can be used to estimate the characteristics of the wider population. 

The size of the sample is an important consideration. If the sample is too large, it may be too expensive or time-consuming to collect the data. If it is too small, the sample may not be representative of the population.

A biased or fair sample

Bias in a sample means that certain characteristics of the population are over- or under-represented. One way to reduce bias is to use some form of randomisation in the sampling method.

A random sample is formed by selecting members from the population at random, where each member of the population has an equal chance of being selected. In simple cases, a sample could be created by drawing names from a hat. In most cases, though, a random number generator is used.
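Drawing names from a hat corresponds to sampling without replacement, where every member is equally likely to be chosen. A minimal Python sketch using the standard library's `random.sample`; the class list of 30 student IDs is a hypothetical population, and `simple_random_sample` is an illustrative name.

```python
import random

# Hypothetical population: a class list of 30 student IDs
population = [f"student_{i}" for i in range(1, 31)]


def simple_random_sample(members, k, seed=None):
    """Pick k distinct members, each member equally likely to be chosen."""
    rng = random.Random(seed)  # the seed is only for reproducible results
    return rng.sample(members, k)  # sampling without replacement, like names from a hat


sample = simple_random_sample(population, 5, seed=1)
print(sample)
```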

Generating random numbers

Calculators and spreadsheet applications usually have a random number generator that can generate random decimals between $0$ and $1$.

These can be used to create random numbers within any range of values. For example, if we require a random number between:

  • $1$ and $50$, we would multiply the randomly-generated decimal by $50$, then round the answer up to the next whole number.
  • $20$ and $30$, we would multiply the randomly-generated decimal by $11$ (the number of whole numbers from $20$ to $30$), round the answer up to the next whole number, then add $19$.
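The same rule can be written once for any range of whole numbers from $a$ to $b$: multiply the random decimal by $b - a + 1$ (the number of whole numbers in the range), round up, then add $a - 1$. A minimal Python sketch, assuming the decimal comes from `random.random()`, which returns values in $[0, 1)$; the function name `random_integer` is just illustrative.

```python
import math
import random


def random_integer(low, high, u=None):
    """Turn a random decimal u, with 0 <= u < 1, into a whole number from low to high inclusive."""
    if u is None:
        u = random.random()  # like a calculator or spreadsheet random decimal: 0 <= u < 1
    count = high - low + 1            # how many whole numbers are in the range
    k = max(math.ceil(u * count), 1)  # round up; guard the rare case u == 0
    return k + low - 1                # shift 1..count onto low..high


print(random_integer(1, 50))   # a whole number from 1 to 50
print(random_integer(20, 30))  # a whole number from 20 to 30
```

In everyday Python code, `random.randint(low, high)` from the standard library does the same job directly; the sketch simply mirrors the calculator method step by step.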

 

Practice questions

Question 1

"Do you prefer this rad shirt or the ordinary one on the shelves at the moment?"

  1. Is the question biased or fair?

    Biased

    A

    Fair

    B

  2. Identify the type of bias.

    It is an emotive question.

    A

    It uses specific terminology.

    B

    It is a leading question.

    C

Question 2

A school principal wants to estimate the number of students who ride a bicycle to school. Which sample(s) should be used to avoid introducing bias?

    All students who are in the school band.

    A

    $8$ students in the hallway.

    B

    Ten students from each grade, chosen at random.

    C

    $130$ students during the lunch periods.

    D

 

Data collection methods

There are several different methods for collecting sample information. The design of the data collection method and the way it is implemented can have a big impact on the quality of the data obtained.

The most common methods include the following:

  • Questionnaires and surveys
  • Experiments and simulations
  • Observational studies
  • Data logging (most websites do this automatically)

 

Data collection methods

Survey - a method for data collection where some members of a population give their responses on certain characteristics, behaviors, or opinions.

Experiment - a method for data collection where a sample is divided into two groups: an experimental group, which receives a treatment, and a control group, which does not.

Observational study - a method for data collection where members of a sample are measured or observed without being altered by the study.
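Randomisation also plays a role in experiments: members of the sample are usually assigned to the experimental and control groups at random, so that the groups are comparable before the treatment is applied. A minimal Python sketch, assuming a hypothetical list of 10 volunteers split into two equal halves; `assign_groups` is an illustrative name.

```python
import random


def assign_groups(participants, seed=None):
    """Randomly split participants into an experimental group and a control group."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the original list is left untouched
    rng.shuffle(shuffled)          # random order removes any pattern in how people signed up
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical sample of 10 volunteers
volunteers = [f"volunteer_{i}" for i in range(1, 11)]
experimental, control = assign_groups(volunteers, seed=42)
print("Experimental:", experimental)
print("Control:", control)
```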

 

Sampling and measurement error

In any measurement procedure, there is inevitably some error. This is true when an experiment is being done to measure some physical quantity, and it is true in the case of surveys designed to quantify people's voting intentions or brand preferences.

We distinguish two kinds of error: systematic error and random error.

Systematic error

Systematic error occurs when there is something wrong with the measuring apparatus or with the procedure being used. An instrument may not have been correctly calibrated, for example, or a scale may have been consistently misread. In a survey, bias may have been introduced by poor design of the survey questions or by the manner in which the sample of the population to be surveyed was selected.

Once detected, systematic error can usually be reduced by improvements in the experimental procedure and by taking steps to eliminate bias due to the survey design and to the sampling method.
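A short simulation can show why repeating measurements does not remove systematic error. The sketch below assumes a hypothetical scale that reads $0.5$ g too high, plus small random fluctuations; all the numbers are made up for illustration. The mean of many readings settles close to $50.5$ g rather than the true $50$ g, so the offset has to be removed by fixing the calibration, not by taking more readings.

```python
import random
import statistics

TRUE_MASS = 50.0  # hypothetical true mass, in grams
OFFSET = 0.5      # hypothetical calibration error: the scale reads 0.5 g too high
NOISE_SD = 0.2    # hypothetical size of the random fluctuations, in grams


def read_scale(rng):
    """One reading = true value + systematic offset + random noise."""
    return TRUE_MASS + OFFSET + rng.gauss(0, NOISE_SD)


def mean_of_readings(n, seed=0):
    """Average of n simulated readings from the miscalibrated scale."""
    rng = random.Random(seed)
    return statistics.fmean([read_scale(rng) for _ in range(n)])


for n in (10, 1_000, 100_000):
    print(n, round(mean_of_readings(n), 3))  # approaches 50.5, not 50.0
```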

Random error

Random error can be thought of as the remaining unexplained error after known sources of systematic error have been removed.

When several careful measurements of the same physical quantity are made, slightly different results may be recorded at each trial. This is due to small, uncontrolled fluctuations in the experimental conditions and to the impossibility of reading a measurement scale beyond some level of precision. The result of such a series of measurements is reported as the mean of all the observed values, together with an indication of how far the observations vary about the mean. This variation is usually given by a statistic called the variance or by its square root, the standard deviation.
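As a worked illustration, here is a minimal Python sketch that reports a set of repeated readings as a mean together with a standard deviation, using the standard library's `statistics` module. The five measurements are hypothetical. Note that `statistics.stdev` gives the sample standard deviation (dividing by $n - 1$); `statistics.pstdev` would treat the data as a whole population.

```python
import statistics

# Hypothetical repeated measurements of the same length, in centimetres
measurements = [12.3, 12.5, 12.4, 12.6, 12.2]

mean = statistics.mean(measurements)
variance = statistics.variance(measurements)  # sample variance (divides by n - 1)
sd = statistics.stdev(measurements)           # square root of the variance

print(f"Result: {mean:.2f} cm, standard deviation {sd:.2f} cm")
```

Here the deviations from the mean of $12.4$ are $-0.1, 0.1, 0, 0.2, -0.2$, giving a variance of $0.10 / 4 = 0.025$ and a standard deviation of about $0.16$ cm.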

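Increasing the number of trials improves the precision of the reported mean. The spread of the individual measurements (their standard deviation) stays roughly the same, but the mean itself becomes less variable: its standard error, $\sigma/\sqrt{n}$ for $n$ independent measurements with standard deviation $\sigma$, shrinks as $n$ grows.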
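A minimal Python sketch of the standard error of the mean, assuming independent measurements and a hypothetical spread of $0.2$ units for individual readings: each fourfold increase in the number of trials halves the standard error. `standard_error` is an illustrative name.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean of n independent measurements with standard deviation sd."""
    return sd / math.sqrt(n)


# Hypothetical spread of individual readings: 0.2 units
for n in (4, 16, 64):
    print(n, "trials -> standard error", standard_error(0.2, n))
```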

 

In a sample survey, repeated sampling can be expected to give a somewhat different result for each sample, and results from small samples vary more from one sample to the next than results from large ones. Techniques exist for determining how large a sample should be in order to be confident that the true value of the quantity being estimated is reasonably close to the value observed in the sample.
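A quick simulation makes the effect of sample size visible. The sketch below assumes a hypothetical population in which 30% of people have some characteristic, draws many random samples of size 20 and of size 500, and measures how much the sample proportion varies from sample to sample. Each person is simulated independently, which approximates sampling from a very large population; the function names are illustrative.

```python
import random
import statistics

TRUE_PROPORTION = 0.3  # hypothetical: 30% of the population has the characteristic


def sample_proportion(n, rng):
    """Proportion of 'yes' answers in one simulated random sample of size n."""
    yes = sum(rng.random() < TRUE_PROPORTION for _ in range(n))
    return yes / n


def spread_of_estimates(n, repeats=2000, seed=1):
    """Standard deviation of the sample proportion across many repeated samples."""
    rng = random.Random(seed)
    estimates = [sample_proportion(n, rng) for _ in range(repeats)]
    return statistics.stdev(estimates)


for n in (20, 500):
    print(n, round(spread_of_estimates(n), 3))
```

The smaller samples should show a spread close to $0.10$, compared with close to $0.02$ for the larger ones, in line with $\sqrt{p(1-p)/n}$.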

Polling conducted before an election, for example, often reports the percentage of voters expected to vote for a particular candidate together with a margin of error. The margin of error reflects the precision of the sampling process and depends, in particular, on the size of the sample surveyed.
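For a poll based on a simple random sample, a commonly used approximate 95% margin of error for a sample proportion $\hat{p}$ is $1.96\sqrt{\hat{p}(1-\hat{p})/n}$. A minimal Python sketch, assuming this normal approximation and a hypothetical poll in which 52% of 1000 respondents support a candidate. Real polls may apply further adjustments such as weighting, and the margin of error covers only random sampling error, not bias.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical poll: 52% of 1000 surveyed voters back a candidate
moe = margin_of_error(0.52, 1000)
print(f"52% ± {moe * 100:.1f} percentage points")
```

With these numbers the margin is about $3.1$ percentage points; surveying four times as many people would halve it.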

Practice questions

Question 3

Tina wants to know which new smartphone she should buy and decides to base her decision on other people’s opinions. One Sunday afternoon, she interviews some people at the retirement home where her grandparents live.

  1. Is this an example of sampling error or measurement error?

    Sampling error

    A

    Measurement error

    B

  2. What type of sampling error is this?

    The sample is too small

    A

    The sample is not random

    B

    The sample does not adequately represent the population.

    C

Question 4

Adults attending a local cinema were asked the following question:

“How many times did you see a movie at this cinema last year?”

  1. Is this an example of sampling error or measurement error?

    Sampling error

    A

    Measurement error

    B

  2. What type of measurement error is this?

    The scale provided is inadequate.

    A

    This is an assumption-based question.

    B

    Poor and/or leading question wording.

    C

  3. What is the main reason why this question is poor?

    Poor and/or leading question wording

    A

    The people being surveyed need more information.

    B

    Relies too heavily on respondent memory.

    C

Outcomes

S.IC.A.1

Understand statistics as a process for making inferences about population parameters based on a random sample from that population.

S.IC.B.3

Recognize the purposes of and differences among sample surveys, experiments, and observational studies; explain how randomization relates to each.
