6.06 Statistical studies and sampling

Lesson

Concept summary

Collecting data from every member of a population, called a census, is the most accurate way of gathering information, but it is not always practical and can be very expensive. Instead, a sample survey is typically conducted on a subset of the population, which is quicker and less expensive.

When summarizing the data, we use different terms depending on whether the data came from a sample or the whole population.

Parameter

A number that summarizes data from a population

Statistic

A number that summarizes data from a sample, which may be applied to the population if the sampling was unbiased
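
To make the distinction concrete, here is a minimal Python sketch. It assumes a hypothetical school of 3000 students in which 60% would answer "yes" to a question; that rate, the sample size of 100, and all variable names are invented for illustration. The proportion computed from everyone is a parameter, while the proportion computed from a random sample is a statistic.

```python
import random

# Hypothetical school of 3000 students: 1 means "yes", 0 means "no".
# The 60% "yes" rate is invented for this sketch, not taken from the lesson.
rng = random.Random(1)  # fixed seed so the run is repeatable
population = [1] * 1800 + [0] * 1200

# Parameter: summarizes data from the whole population.
parameter = sum(population) / len(population)

# Statistic: summarizes data from a simple random sample of 100 students.
sample = rng.sample(population, 100)
statistic = sum(sample) / len(sample)

print(f"parameter (all {len(population)} students): {parameter:.2f}")
print(f"statistic (sample of {len(sample)}): {statistic:.2f}")
```

The statistic will usually be close to the parameter but not exactly equal to it, and it changes from one random sample to the next.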

If a survey is not representative of the entire population, we say that the survey has bias. There are a number of potential sources of bias that we should avoid:

  • Poor sampling techniques: If the people being surveyed do not resemble the population, the survey is likely to be biased.

    • For example, surveying train travellers about their opinions on public transport will likely give very different results than a census of the entire population.
  • Too small of a sample: In general, the bigger the number of people being surveyed, the closer the results will be to a census.

    • For example, asking only one person's opinion will tell you very little about the population's opinion.
  • Poor question wording: The question asked should answer the purpose of the study.

    • For example, asking, "Do you approve of the current governing party?" does not give the same results as asking, "Will you vote for the current governing party in the next election?"
  • Using loaded or leading questions: Avoid questions which use emotive language or might otherwise influence the results of the survey.

    • For example, asking, "Do you watch the most popular sport, soccer?" is likely to bias responses, whereas asking, "Do you watch soccer?" is not. Questions like the first are referred to as "leading questions" because they lead the person being surveyed toward a particular answer.
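
The effect of a poor sampling technique can be sketched with a small simulation based on the train traveller example above. The town size, the group sizes, and the approval rates below are all assumptions made up for illustration; the point is only to compare a convenience sample of train travellers with a simple random sample of the same size.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical town: 1000 train travellers and 9000 other residents.
# Assumed approval rates of public transport (invented for illustration):
# 80% among train travellers, 30% among everyone else.
def make_group(size, approval_rate, rides_train):
    return [
        {"rides_train": rides_train, "approves": rng.random() < approval_rate}
        for _ in range(size)
    ]

population = make_group(1000, 0.80, True) + make_group(9000, 0.30, False)

def approval(people):
    """Proportion of the given people who approve of public transport."""
    return sum(p["approves"] for p in people) / len(people)

# Census: everyone in the town is asked.
census_result = approval(population)

# Poor technique: survey 200 people, all of them train travellers.
travellers = [p for p in population if p["rides_train"]]
biased_result = approval(rng.sample(travellers, 200))

# Better technique: survey 200 people chosen at random from the whole town.
random_result = approval(rng.sample(population, 200))

print(f"census:            {census_result:.2f}")
print(f"train travellers:  {biased_result:.2f}")
print(f"random sample:     {random_result:.2f}")
```

Under these assumptions, the train traveller sample overstates approval by a wide margin, while the random sample of the same size lands much closer to the census result.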

When we want to draw conclusions based on a collection of data, there are two methods that can be used: an observational study and an experiment.

Observational study

A hands-off study where a group is watched and monitored with no outside intervention.

Experiment

A planned method that is randomly applied to a group with the intent of finding cause and effect relationships.

When conclusions are drawn, we can assess the validity by considering the Law of Large Numbers.

Law of Large Numbers

The larger the sample that conclusions are drawn from, the more representative of the population the conclusions are, and therefore the more valid they are.
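
The Law of Large Numbers can be explored with a short simulation. This is a sketch under stated assumptions: the true proportion of 0.7, the sample sizes, the number of trials, and the function names are all hypothetical, and every sample is drawn at random. For each sample size, the code measures how far the sample proportion typically lands from the true proportion.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Assumed true proportion of "yes" answers in the population (hypothetical).
TRUE_PROPORTION = 0.7

def sample_proportion(n):
    """Proportion of 'yes' answers in one random sample of size n."""
    return sum(rng.random() < TRUE_PROPORTION for _ in range(n)) / n

def typical_error(n, trials=500):
    """Average gap between the sample proportion and the true proportion."""
    gaps = (abs(sample_proportion(n) - TRUE_PROPORTION) for _ in range(trials))
    return sum(gaps) / trials

for n in (20, 200, 2000):
    print(f"sample size {n:>4}: typical error {typical_error(n):.3f}")
```

The typical error shrinks as the sample size grows, which is why conclusions drawn from larger random samples tend to be more trustworthy.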

Worked examples

Example 1

Mario surveyed 20 students from his math class to find out whether students at his school think statistics should be a mandatory class. 70\% of students said yes, and 30\% of students said no. The school has 3000 students.

a

State if 70\% is a statistic or parameter. Explain how you know.

Approach

We know that a statistic is a number that summarizes data from a sample, which may be applied to the population if the sampling was unbiased.

We also know that a parameter is a number that summarizes data from a population.

Therefore we need to determine whether 70\% summarizes data from a sample or a population.

Solution

Mario has interviewed 20 students from his math class in order to determine what students from his entire school think. Because the school has 3000 students, the 20 students he interviewed are a sample rather than the whole population. This means that 70\% summarizes data from a sample.

Therefore 70\% is a statistic.

b

State the method that Mario used to gather data. Explain your reasoning.

Approach

From part (a) we know that Mario has interviewed a sample of students from his school, as opposed to the entire school population.

Solution

Because Mario interviewed a sample, we know that he used a sample survey to collect data.

c

State some potential sources of bias in this survey.

Approach

We know that a survey has bias if it is not representative of the entire population. Some ways this can happen are:

  • Sample size is too small
  • Poor sampling technique - such as using a convenience sample or other non-random methods
  • Poor questioning - such as using leading questions

Solution

Mario has interviewed students in a math class. It's possible that they enjoy math more than other students at the school and are therefore more inclined to support making a statistics course mandatory. This could be a potential source of bias.

We know that Mario sampled 20 students, but the school population is 3000. Compared with the total number of students, this sample might be too small to represent the population. This is another potential source of bias.

We don't know the question Mario asked when surveying his sample, so we can't say whether the question was leading, emotive or poorly worded. Depending on the question asked, this could be another potential source of bias.

d

Write an invalid conclusion based on Mario's survey results. Explain why the claim is invalid.

Approach

We can create an invalid conclusion by claiming something that the data does not necessarily support - one way to do this is by making a conclusion based on the data when there is a source of potential bias in the data.

Solution

An example of an invalid claim could be:

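70\% of all students at Mario's school want to make statistics a mandatory course.

This claim is invalid because Mario surveyed only 20 students, all from his own math class, so the 70\% statistic cannot be assumed to describe all 3000 students at the school.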

Reflection

There are many invalid conclusions that we could make based on Mario's survey results. It's important to always consider the conclusion and look to the data for confirmation.

e

Mario's school has 3000 students. Using the Law of Large Numbers in your reasoning, explain whether you think Mario's sample is representative of the entire population.

Approach

The Law of Large Numbers states that the larger the sample that conclusions are being drawn from, the more representative of a population the conclusions are. Therefore we need to compare Mario's sample size to the population size.

Solution

Mario has interviewed only 20 students out of a total of 3000, which is a very small sample. By the Law of Large Numbers, conclusions drawn from such a small sample are unlikely to be representative of the entire population.

Outcomes

M3.S.IC.A.1

Recognize the purposes of and differences among sample surveys, experiments, and observational studies.*

M3.S.IC.A.2

Identify potential sources of bias in statistical studies.*

M3.S.IC.A.3

Distinguish between a statistic and a parameter. Evaluate reports based on data and recognize when poor conclusions are drawn from well-collected data.*

M3.S.CP.B.4

Use the Law of Large Numbers to assess the validity of a statistical claim.*

M3.MP2

Reason abstractly and quantitatively.

M3.MP3

Construct viable arguments and critique the reasoning of others.

M3.MP4

Model with mathematics.

M3.MP5

Use appropriate tools strategically.

M3.MP6

Attend to precision.
