# 11.01 Summarising data

Statistical data can be divided into two types: categorical and numerical. Four common ways of summarising numerical data are the mode, mean, median and range.

### Categorical data

Data that is collected as a set of words is called categorical data.

Imagine asking someone for their favourite colour, country of birth, or gender. Their answer would always be a word. We can also think of categorical data as values which can be sorted into groups or categories.

### Numerical data

When the data is a set of numbers, it is called numerical data.

Imagine asking someone for their height, their age, or how long they spend on social media each day. Their answers would always be a number.

Numerical data is divided into two types: continuous and discrete.

Discrete numerical data is counted, so its values are separated. If you asked someone how many pets they have, they might say "$4$", but they would not say "$4$ and seven sixteenths".

Continuous data is measured, so it can take any value within a range; there are infinitely many possible values. If we measure an animal's height, we might find any reasonable value, limited only by the precision of our ruler.

Data types

Categorical data is made up of words.

Numerical data is made up of numbers.

• Discrete numerical data is counted.
• Continuous numerical data is measured.

### The mode

The mode of a data set is the most commonly occurring score.

#### Exploration

A class quiz is marked out of $10$, and ten students receive the following marks:

$10,7,6,9,7,8,6,7,7,8$

To find the mode we can count how many times each score occurred (the frequency). It can help to order the scores first:

$6,6,7,7,7,7,8,8,9,10$

The score $6$ appears twice, which means it has a frequency of $2$. Similarly, $7$ has a frequency of $4$, $8$ has a frequency of $2$, and $9$ and $10$ each have a frequency of $1$.

This means that $7$ has the highest frequency, therefore the mode is $7$.

Mode

The mode of a data set is the result with the highest frequency.

The frequency is the number of times that a score occurs.

If there are multiple results that share the highest frequency then there will be more than one mode.
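The counting procedure for the mode can be sketched in Python. This is only an illustration, not part of the lesson: the function name `modes` and the second data set are assumptions made here, and `collections.Counter` from the standard library does the frequency counting.

```python
from collections import Counter

# Quiz marks from the exploration above.
marks = [10, 7, 6, 9, 7, 8, 6, 7, 7, 8]


def modes(scores):
    """Return every score that shares the highest frequency, in ascending order.

    `modes` is a hypothetical helper name chosen for this sketch.
    """
    frequency = Counter(scores)        # maps each score to how often it occurs
    highest = max(frequency.values())  # the highest frequency in the data set
    return sorted(score for score, count in frequency.items() if count == highest)


# Frequency of each score, listed in ascending order of score.
print(dict(sorted(Counter(marks).items())))  # {6: 2, 7: 4, 8: 2, 9: 1, 10: 1}

print(modes(marks))            # [7]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (made-up data with two modes)
```

The helper returns a list rather than a single number so that a data set with more than one mode is handled in the same way as a data set with exactly one.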

### The mean

The mean of a data set is an average score.

#### Exploration

Three friends are planning a trip to Alice Springs. They plan to fly there, and discover that the airline imposes a weight limit on their luggage of $20$ kg per person. On the night before the flight they weigh their luggage and find that their luggage weights form this data set:

$17,18,22$

One of them has packed too much. They decide to share their luggage around so that they all carry the same amount. How much does each person carry now?

Thinking about it using more mathematical language, we are sharing the total luggage equally among three groups. As a mathematical expression, we find:

$\frac{17+18+22}{3}=\frac{57}{3}=19$

Each person carries $19$ kg. This amount is the mean of the data set.

Summary

If we replace every number in a numerical data set with the mean, the sum of the numbers in the data set will be the same.

To calculate the mean, use the formula:

$\text{mean}=\frac{\text{sum of scores}}{\text{number of scores}}$
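As a quick check of the formula, here is a minimal Python sketch using the luggage weights from the exploration. The function name `mean` and the variable names are assumptions chosen for this illustration.

```python
def mean(scores):
    """Sum of scores divided by the number of scores (hypothetical helper)."""
    return sum(scores) / len(scores)


# Luggage weights in kg from the exploration above.
luggage = [17, 18, 22]

average = mean(luggage)
print(average)  # 19.0

# Replacing every weight with the mean leaves the total unchanged.
shared = [average] * len(luggage)
print(sum(luggage), sum(shared))  # 57 57.0
```

The last two lines illustrate the property in the summary: three bags of $19$ kg weigh the same in total as the original three bags.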

#### Worked example

Find the mean of this data set:

$4,7,1,2,3$

Think: There are $5$ scores, so we should add these numbers all together and divide by $5$.

Do: $4+7+1+2+3=17$, and $17\div5=3.4$.

Reflect: Even though all the numbers in the data set are whole numbers, the mean is a decimal. If the data set was produced from a survey question "How many siblings do you have?", we would say the mean number of siblings was $3.4$, even though it isn't possible to have $0.4$ siblings! The mean is a way to summarise data - it is not part of the data set itself.

### The median

The median of a data set is another kind of average.

#### Exploration

Seven people were asked about their weekly income, and their responses form this data set:

