Australia
Year 9

10.01 Data collection

Lesson

Introduction

Statistical data can be divided into two types: categorical and numerical.

Categorical data

Data that is collected as a set of words is called categorical data.

Imagine asking someone for their favourite colour, country of birth, or gender. Their answer would always be a word. We can also think of categorical data as values which can be sorted into groups or categories.

Examples

Example 1

A class was surveyed about where they went on their most recent holiday. What kind of data are the survey results?

A
Categorical
B
Discrete numerical
C
Continuous numerical
Worked Solution
Create a strategy

Determine whether the answers would be words or numbers. If they would answer with a word, the data is categorical. If they would respond with a number, the data is numerical.

Apply the idea

Names of places are words. So the correct answer is option A.

Idea summary

Categorical data is made up of words.

  • Examples include: favourite country, types of pet owned, mode of transport, and preferred film genre.

Numerical data

When the data is a set of numbers, it is called numerical data.

Imagine asking someone for their height, their age, how many pets they own, or how long they spend on social media each day. Their answers would always be a number.

Numerical data is divided into two types, continuous and discrete.

Discrete numerical data can only take separate, distinct values, usually because it is counted. If you asked someone their shoe size, they might say "10", or even "10 and a half", but they would not say "10 and seven sixteenths". If you have to count to find the answer, the data is discrete.

Image: a large ruler measuring the height of an elephant.

Continuous numerical data is measured, so it can take any value within a range; there are infinitely many possible values. If we measure an animal's height, we might get any reasonable value, limited only by the precision of our ruler. If you have to measure to find the answer, the data is continuous.

Examples

Example 2

All the students in your school take a survey with the four questions below. Which two will have discrete numerical data as their results?

A
How many pets do you own?
B
How many times have you broken your arm?
C
How long does it take for you to get to school every day?
D
What kinds of pet do you own?
Worked Solution
Create a strategy

Consider the answers to these questions. If the answers are numbers, would they be found by counting (discrete), or would they be found by measuring (continuous)?

Apply the idea

Option A requires counting of pets.

Option B requires counting of times they broke their arm.

Option C requires measuring time.

Option D would be word answers.

So options A and B would have discrete numerical data as their results.

Idea summary

Numerical data is made up of two types:

  • Discrete numerical data is counted.
    • Examples include: number of pets owned, number of books read, population of a city, and goals scored in a season.
  • Continuous numerical data is measured.
    • Examples include: height of a person, weight of an animal, length of a movie, and time taken to run a race.
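The strategy used in the worked solutions can be written as a small decision rule: words give categorical data, numbers found by counting give discrete numerical data, and numbers found by measuring give continuous numerical data. The sketch below uses a hypothetical helper named `classify_data`. Its two yes/no inputs are an assumption about how you would describe each survey question and are not part of any standard tool.

```python
def classify_data(answer_is_number: bool, found_by_counting: bool = False) -> str:
    """Classify survey data using the counting/measuring rule from this lesson.

    answer_is_number: True if the answer is a number, False if it is a word.
    found_by_counting: only used for numbers; True if the answer is found by
    counting, False if it is found by measuring.
    """
    if not answer_is_number:
        return "categorical"
    if found_by_counting:
        return "discrete numerical"
    return "continuous numerical"


# The four survey questions from Example 2, described as
# (is the answer a number?, is it found by counting?).
questions = {
    "How many pets do you own?": (True, True),
    "How many times have you broken your arm?": (True, True),
    "How long does it take for you to get to school every day?": (True, False),
    "What kinds of pet do you own?": (False, False),
}

for question, (is_number, by_counting) in questions.items():
    print(f"{question} -> {classify_data(is_number, by_counting)}")
```

The hard part is still deciding the two inputs for each question. The function only records the rule once that decision has been made.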

Data collection

We get the best data from a census because it includes the entire population. However, it's not always possible to conduct a census, so we often get our data from surveys instead.

When we take a survey, it is important that the results are representative of the population. This means that the answer to any question we ask in the survey should be about the same as if we had asked it in a census. In particular, the mean, median, mode and range of the survey results should be very close to those of the census (although getting exactly the same results is almost impossible).
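To illustrate what "very close to the census" looks like, the sketch below builds a made-up census of how many pets each of 1000 students owns. It then takes a random sample of 100 of them and compares the mean, median, mode and range of each. The population values, the seed and the helper name `summarise` are all invented for illustration. The code assumes Python 3.8 or later, where `statistics.mode` returns the first mode instead of raising an error on ties.

```python
import random
import statistics

# Hypothetical census: the number of pets owned by each of 1000 students.
# These values are generated for illustration only; they are not real data.
rng = random.Random(9)
census = [rng.choice([0, 0, 1, 1, 1, 2, 2, 3, 4]) for _ in range(1000)]

# A simple random sample of 100 students drawn from the same population.
sample = rng.sample(census, 100)


def summarise(data):
    """Return the mean, median, mode and range of a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
    }


census_stats = summarise(census)
sample_stats = summarise(sample)

for name in census_stats:
    print(f"{name:>6}: census = {census_stats[name]:.2f}, sample = {sample_stats[name]:.2f}")
```

The exact figures depend on the seed. A randomly chosen sample will usually give values near the census values, but they will rarely match exactly.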

If a survey is not representative, we call it biased. There are a number of potential sources of bias that we should avoid:

  • Consider who is being surveyed. If the people being surveyed do not resemble the population, the survey is likely to be biased. For example, surveying train travellers about their opinions on public transport will likely give very different results from a census of the entire population.

  • Also consider how many people are being surveyed. Asking one person's opinion will not tell you anything about anyone else's opinion. In general, the more people surveyed (provided they are chosen fairly), the closer the results are likely to be to those of a census.

  • Make sure that the questions being asked actually address the question at hand. For example, asking, "Do you approve of the current governing party?" does not give the same results as asking, "Will you vote for the current governing party in the next election?"

  • Avoid questions which use emotive language or might otherwise influence the results of the survey. For example, asking, "Do you watch the most popular sport, soccer?" is biased, whereas asking, "Do you watch soccer?" is not. These are called "leading questions" because they lead the person being surveyed towards a particular answer.
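The first two points can be seen in a simulation. The sketch below uses a made-up population in which train travellers feel differently about public transport funding from everyone else. It compares a census, a train-travellers-only survey and random surveys of different sizes. Everything in it is an invented assumption for illustration, not real survey data: the 20% train share, the 80% and 40% support rates, the seed, and the function names `percent_supporting` and `average_error`.

```python
import random

# Hypothetical population of 10 000 people. Each person is a pair
# (uses_train, supports_more_public_transport_funding). All numbers are
# invented for illustration.
rng = random.Random(2024)
population = []
for _ in range(10_000):
    uses_train = rng.random() < 0.2
    # Assumption: train travellers are more likely to support funding.
    support_chance = 0.8 if uses_train else 0.4
    population.append((uses_train, rng.random() < support_chance))


def percent_supporting(people):
    """Percentage of the given people who support more funding."""
    return 100 * sum(supports for _, supports in people) / len(people)


census_result = percent_supporting(population)

# Source of bias 1: only surveying train travellers.
train_travellers = [person for person in population if person[0]]
biased_sample = rng.sample(train_travellers, 200)

# A random sample of the same size from the whole population.
random_sample = rng.sample(population, 200)

print(f"Census:                {census_result:.1f}%")
print(f"Train travellers only: {percent_supporting(biased_sample):.1f}%")
print(f"Random sample of 200:  {percent_supporting(random_sample):.1f}%")


# Source of bias 2: too few people. Average how far a random sample's
# result lands from the census result over many repeated surveys.
def average_error(sample_size, trials=500):
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(population, sample_size)
        total += abs(percent_supporting(sample) - census_result)
    return total / trials


for size in (5, 50, 500):
    print(f"Sample of {size:>3}: off by {average_error(size):.1f} percentage points on average")
```

The exact numbers depend on the seed. Even so, the train-only survey should land far from the census figure, and the average error of a random survey should shrink as the sample grows. A larger survey of train travellers alone would still be biased, because sample size only helps when the people are chosen fairly.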

Once we have collected data, we need to find a way to organise and display it.

Examples

Example 3

Consider the survey question and the sample and determine whether the outcomes are likely to be biased or not.

a

Yvonne is asking people on her soccer team, "What's your favourite sport?"

Worked Solution
Create a strategy

Consider the following:

  • Who is being surveyed.

  • How many people are being surveyed.

  • Whether the question being asked actually addresses the issue at hand.

  • Whether the question is leading.

Apply the idea

The question asks about favourite sport, but everyone surveyed plays soccer, so they are likely to answer soccer. The sample does not represent the wider population, so the outcomes are likely to be biased.

b

Lachlan randomly selected people from his school to find out about school sports. He asked, "What's your favourite school sport?"

Worked Solution
Apply the idea

Lachlan selected students randomly, so the sample should represent the school, and the question is not leading. So the outcomes are not likely to be biased.

c

Tricia randomly selected people from her school and asked, "The local AFL team is donating money to our school this term. What's your favourite sport?"

Worked Solution
Apply the idea

The question is leading: mentioning the local AFL team's donation may make people feel pressured to choose AFL. So the outcomes are likely to be biased.

Example 4

Which one of the following data types is discrete?

A
Your height
B
The time it takes to swim 200 metres
C
Daily temperature
D
The number of pets in your family
Worked Solution
Create a strategy

Choose the option whose values are counted, so that they are distinct and separate from each other.

Apply the idea

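Height, swimming time and daily temperature are all measured, so they are continuous. The number of pets is counted, so it is discrete. The correct answer is option D: The number of pets in your family.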

Example 5

Classify this data into its correct category: Weights of kittens

A
Quantitative Discrete
B
Qualitative Nominal
C
Quantitative Continuous
D
Qualitative Ordinal
Worked Solution
Create a strategy

Determine whether the data is numerical (quantitative) or categorical (qualitative).

Apply the idea

The weight of a kitten is measured and recorded as a number, so the data is numerical (quantitative).

Weight is a measurement that can have any number of decimal places, so it is continuous. The correct answer is Option C.

Example 6

What type of data is each of the following:

a

The time spent watching TV each day.

A
Categorical
B
Numerical
Worked Solution
Create a strategy

Determine if the type of data will be words or numbers.

Apply the idea

Time spent watching TV is measured and recorded as a number.

So the data is numerical, and the correct answer is B.

b

Favourite TV show.

A
Categorical
B
Numerical
Worked Solution
Apply the idea

The answer is the name of a TV show, which is given in words.

So the type of data is categorical. The correct answer is A.

Idea summary

There are a number of potential sources of bias that we should avoid:

  • Consider who is being surveyed. If the people being surveyed do not resemble the population, the survey is likely to be biased.

  • Also consider how many people are being surveyed. Asking one person's opinion will not tell you anything about anyone else's opinion.

  • Make sure that the questions being asked actually address the question at hand.

  • Avoid questions which use emotive language or might otherwise influence the results of the survey.

Outcomes

AC9M9ST01

analyse reports of surveys in digital media and elsewhere for information on how data was obtained to estimate population means and medians

AC9M9ST02

analyse how different sampling methods can affect the results of surveys and how choice of representation can be used to support a particular point of view

AC9M9ST04

choose appropriate forms of display or visualisation for a given type of data; justify selections and interpret displays for a given context
