
13.01 Types of data


Introduction

Statistical data can be divided into two types: categorical and numerical.

Categorical data

Data that is collected as a set of words is called categorical data.

Imagine asking someone for their favourite colour, country of birth, or gender. Their answer would always be a word. We can also think of categorical data as values which can be sorted into groups or categories.

Examples

Example 1

A class was surveyed about where they went on their most recent holiday. What kind of data are the survey results?

A
Categorical
B
Discrete numerical
C
Continuous numerical
Worked Solution
Create a strategy

Determine whether the answers would be words or numbers. If they would answer with a word, the data is categorical. If they would respond with a number, the data is numerical.

Apply the idea

Names of places are words. So the correct answer is option A.

Idea summary

Categorical data is made up of words.

  • Examples include: favourite country, types of pet owned, mode of transport, and preferred film genre.
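Because categorical values are words, a natural way to work with them is to sort them into groups and count how many fall into each one. The Python sketch below does this with `collections.Counter`, using a hypothetical list of answers to the holiday question from Example 1 (the responses are made up for illustration).

```python
from collections import Counter

# Hypothetical answers to "Where did you go on your most recent holiday?"
# Each answer is a word (a place name), so the data is categorical.
responses = ["Bali", "Sydney", "Bali", "Melbourne", "Sydney", "Bali"]

# Sort the answers into categories and count how many are in each.
groups = Counter(responses)

# Print each category with its count, largest group first.
for place, count in groups.most_common():
    print(place, count)
# Bali 3
# Sydney 2
# Melbourne 1
```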

Numerical data

When the data is a set of numbers, it is called numerical data.

Imagine asking someone for their height, their age, how many pets they own, or how long they spend on social media each day. Their answers would always be a number.

Numerical data is divided into two types, continuous and discrete.

Discrete numerical data is counted, so it can only take separate, distinct values. If you asked someone their shoe size, they might say "10" or even "10 and a half", but they would not say "10 and seven sixteenths". If you have to count to find the answer, the data is discrete.

A large ruler measuring the height of an elephant.

Continuous numerical data is measured, so it can take any value within a range, which means there are infinitely many possible values. If we measure an animal's height, we might find any reasonable value, limited only by the precision of our ruler. If you have to measure to find the answer, the data is continuous.

Examples

Example 2

All the students in your school take a survey with the four questions below. Which two will have discrete numerical data as their results?

A
How many pets do you own?
B
How many times have you broken your arm?
C
How long does it take for you to get to school every day?
D
What kinds of pet do you own?
Worked Solution
Create a strategy

Consider the answers to these questions. If the answers are numbers, would they be found by counting (discrete) or by measuring (continuous)?

Apply the idea

Option A requires counting pets.

Option B requires counting the number of times they have broken their arm.

Option C requires measuring time.

Option D would have word answers.

So options A and B would have discrete numerical data as their results.

Idea summary

Numerical data is made up of two types:

  • Discrete numerical data is counted.
    • Examples include: number of pets owned, number of books read, population of a city, and goals scored in a season.
  • Continuous numerical data is measured.
    • Examples include: height of a person, weight of an animal, length of a movie, and time taken to run a race.
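The strategy used in Examples 1 and 2 can be written as a small decision rule: a word answer gives categorical data, a counted number gives discrete numerical data, and a measured number gives continuous numerical data. The Python sketch below encodes that rule. The function name `classify_data` and its inputs are hypothetical, and you still have to decide how each answer would be found, because the values alone cannot tell you whether they were counted or measured.

```python
def classify_data(answer_is_word: bool, found_by: str = "") -> str:
    """Classify survey data using the word / counted / measured rule.

    answer_is_word: True if respondents would answer with a word.
    found_by: "counting" or "measuring" when the answer is a number.
    """
    if answer_is_word:
        return "categorical"
    if found_by == "counting":
        return "discrete numerical"
    if found_by == "measuring":
        return "continuous numerical"
    raise ValueError("number answers must be found by 'counting' or 'measuring'")


# The four questions from Example 2, with how each answer would be found.
questions = {
    "How many pets do you own?": (False, "counting"),
    "How many times have you broken your arm?": (False, "counting"),
    "How long does it take for you to get to school every day?": (False, "measuring"),
    "What kinds of pet do you own?": (True, ""),
}

for question, (is_word, found_by) in questions.items():
    print(question, "->", classify_data(is_word, found_by))
```

Running it labels the first two questions as discrete numerical, matching the worked solution to Example 2.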

Question biases

When we conduct surveys, the responses together form a set of data that we can analyse. It is essential that survey questions are clear, direct, and use neutral language. If our survey is biased, the respondents may feel pressured to give certain answers, or end up confused, and our analysis could be very misleading.

There are three broad biases to watch out for:

Emotive or leading language is language which is not neutral and evokes an emotional reaction from the responder.

Imagine playing a song for someone and asking them: "Don't you think this song is totally amazing?". They may feel pressured to say "Yes".

If you asked them "Do you actually like this terrible song?", they may feel pressured to say "No".

Instead, asking them the simple, neutral question "Do you like this song?" is the best way to find out their true opinion.

Questions may make false assumptions about the responder which can make them difficult or impossible to answer.

Consider asking someone to write down a response to this question: "Does your dog like going for walks?". How would someone answer this question if they don't have a dog?

Neither "Yes" nor "No" would make sense, so they may leave the answer blank. If you look at their response later, how would you know why they left it blank?

To solve this problem, we can break the question into two parts. The first question could be, "Do you own a dog?". The second question could be, "If you own a dog, does your dog like going for walks?". This removes the confusion.

On the other hand, sometimes there might be more than one question combined into a single question, leaving the respondent confused or unsure of how to answer.

Consider this question: "Do you like cats and dogs?".

We could improve the question by offering four options: "no", "yes, both", "yes, but only cats" and "yes, but only dogs".

But an even better way would be to split it into two questions, "Do you like cats?" and "Do you like dogs?", each with a yes or no answer. This is the clearest and most direct way of obtaining the same data.
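To check that two yes/no questions really capture the same information as a single multi-option question, the Python sketch below combines a respondent's two answers back into one of four outcomes. The function name `combine` and the outcome labels are hypothetical, chosen only for illustration.

```python
def combine(likes_cats: bool, likes_dogs: bool) -> str:
    """Turn answers to "Do you like cats?" and "Do you like dogs?" into one outcome."""
    if likes_cats and likes_dogs:
        return "both"
    if likes_cats:
        return "only cats"
    if likes_dogs:
        return "only dogs"
    return "neither"


# Every pair of yes/no answers maps to exactly one of the four outcomes.
for cats in (True, False):
    for dogs in (True, False):
        print(f"cats={cats}, dogs={dogs} -> {combine(cats, dogs)}")
```

The reverse does not work: a single "yes" to "Do you like cats and dogs?" does not tell you which of these outcomes the respondent meant.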

Generally speaking, each question in a survey should ask only one thing.

By avoiding these biases, we can be more confident in the data we collect to analyse later.

Examples

Example 3

A survey asks the question below.

The Prime Minister believes that taxes are too high, do you think taxes are too high?

What makes this a poor survey question?

A
The question asks more than one question.
B
The question makes a false assumption.
C
The question uses emotive or leading language.
Worked Solution
Create a strategy

Think about how you would feel if someone asked you this question. Would you feel confused? Would you feel pressured to give one answer more than the other?

Apply the idea

The question uses leading language by stating what the Prime Minister believes. So the correct answer is option C.

Reflect and check

Here is a better version of the question:

"What do you think about current tax levels?"

This version is more neutral because it does not tell the respondent what anyone else believes, so it does not pressure them towards a particular answer.

Idea summary

There are three broad biases to watch out for:

  • Emotive or leading language is language which is not neutral and evokes an emotional reaction from the responder.
  • Questions may make false assumptions about the responder which can make them difficult or impossible to answer.
  • There might be more than one question combined into a single question, leaving the respondent confused or unsure of how to answer.

Outcomes

VCMSP268

Identify and investigate issues involving numerical data collected from primary and secondary sources
