Statistical data can be divided into two types: categorical and numerical.
Data that is collected as a set of words is called categorical data.
Imagine asking someone for their favourite colour, country of birth, or gender. Their answer would always be a word. We can also think of categorical data as values which can be sorted into groups or categories.
A class was surveyed about where they went on their most recent holiday. What kind of data are the survey results?
Categorical data is made up of words.
When the data is a set of numbers, it is called numerical data.
Imagine asking someone for their height, their age, how many pets they own, or how long they spend on social media each day. Their answer would always be a number.
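Numerical data is divided into two types, continuous and discrete. Continuous numerical data is measured, so it can take any value within a range, such as a person's height.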
Discrete numerical data is counted, so its values are separated. If you asked someone to tell you their shoe size, they might say "10" or even "10 and a half", but they would not say "10 and seven sixteenths". If you have to count to find the answer, the data is discrete.
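The distinctions above can be sketched as a small decision rule. The Python below is only an illustration: the function `classify` and its two yes or no inputs are hypothetical names, and it assumes that numerical data which is not counted is measured, and is therefore continuous.

```python
from enum import Enum


class DataType(Enum):
    CATEGORICAL = "categorical"
    DISCRETE = "discrete numerical"
    CONTINUOUS = "continuous numerical"


def classify(answer_is_word: bool, found_by_counting: bool = False) -> DataType:
    """Classify a survey question by the kind of answer it produces."""
    if answer_is_word:
        # Answers that are words fall into groups or categories.
        return DataType.CATEGORICAL
    if found_by_counting:
        # Counted answers have separated values.
        return DataType.DISCRETE
    # Assumption: a number that is not counted is measured.
    return DataType.CONTINUOUS


# Example questions taken from the text above.
favourite_colour = classify(answer_is_word=True)
number_of_pets = classify(answer_is_word=False, found_by_counting=True)
height = classify(answer_is_word=False, found_by_counting=False)

for name, kind in [("favourite colour", favourite_colour),
                   ("number of pets", number_of_pets),
                   ("height", height)]:
    print(name, "is", kind.value)
```

The type depends on how the answer is found (named, counted, or measured), not on how the number happens to be written down.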
All the students in your school take a survey with the four questions below. Which two will have discrete numerical data as their results?
Numerical data is made up of two types: continuous and discrete.
When we conduct surveys, the responses together form a set of data that we can analyse. It is essential that survey questions are clear, direct, and use neutral language. If our survey is biased, the respondents may feel pressured to give certain answers, or end up confused, and our analysis could be very misleading.
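There are three broad biases to watch out for: emotive or leading language, false assumptions about the respondent, and more than one question combined into a single question.
Emotive or leading language is language that is not neutral and evokes an emotional reaction from the respondent.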
Imagine playing a song for someone and asking them: "Don't you think this song is totally amazing?". They may feel pressured to say "Yes".
If you asked them "Do you actually like this terrible song?", they may feel pressured to say "No".
Instead, asking them the simple, neutral question "Do you like this song?" is the best way to find out their true opinion.
A question may make false assumptions about the respondent, which can make it difficult or impossible to answer.
Consider asking someone to write down a response to this question: "Does your dog like going for walks?". How would someone answer this question if they don't have a dog?
Neither "Yes" nor "No" would make sense, so they may leave the answer blank. If you look at their response later, how would you know why they left it blank?
To solve this problem, we can break the question into two parts. The first question could be, "Do you own a dog?". The second question could be, "If you own a dog, does your dog like going for walks?". This removes the confusion.
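One way to see why two questions remove the confusion is to look at how the answers might be recorded. In this minimal Python sketch, `describe` and its parameters are hypothetical names, and `None` is assumed to mean that the second question was left blank.

```python
from typing import Optional


def describe(owns_dog: bool, dog_likes_walks: Optional[bool]) -> str:
    """Interpret a pair of answers to the two dog questions."""
    if not owns_dog:
        # The first answer explains why the second may be blank.
        return "no dog, so the second question does not apply"
    if dog_likes_walks is None:
        # A dog owner who left the second question blank.
        return "owns a dog but skipped the second question"
    if dog_likes_walks:
        return "owns a dog that likes walks"
    return "owns a dog that does not like walks"


print(describe(owns_dog=False, dog_likes_walks=None))
print(describe(owns_dog=True, dog_likes_walks=True))
```

With the single original question, a blank answer could not be told apart from "no dog". Here the first answer carries that information.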
On the other hand, sometimes there might be more than one question combined into a single question, leaving the respondent confused or unsure of how to answer.
Consider this question: "Do you like cats and dogs?". Someone who likes only one of the two cannot answer accurately with a simple "Yes" or "No".
We could improve the question by offering four options: "no", "yes both", "yes but only cats" and "yes but only dogs".
But an even better way would be to split the question into two questions, "Do you like cats?" and "Do you like dogs?", both with yes or no answers. This method is the clearest and most direct way of obtaining the same data.
Generally speaking, each question in a survey should ask only one thing.
By avoiding these biases, we can be more confident in the data we collect to analyse later.
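To illustrate why two separate questions capture the same data, the sketch below combines a pair of yes or no answers back into one of four groups. The function name, the group labels, and the sample responses are all made up for illustration.

```python
from collections import Counter


def combine(likes_cats: bool, likes_dogs: bool) -> str:
    """Map two yes/no answers onto one of four combined groups."""
    if likes_cats and likes_dogs:
        return "both"
    if likes_cats:
        return "only cats"
    if likes_dogs:
        return "only dogs"
    return "neither"


# Hypothetical responses: (answer to "Do you like cats?",
#                          answer to "Do you like dogs?")
responses = [(True, True), (True, False), (False, False), (True, True)]

# Count how many respondents fall into each combined group.
tally = Counter(combine(cats, dogs) for cats, dogs in responses)
print(tally)
```

Each respondent only ever answers a simple yes or no, yet every combination can still be recovered during analysis.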
A survey asks the question below.
The Prime Minister believes that taxes are too high, do you think taxes are too high?
What makes this a poor survey question?