One common way of collecting data is through a survey. Conducting a survey involves choosing a question to ask and then recording the answer. This is great for collecting information, but at the end we are left with a long list of answers that can be difficult to interpret.

This is where tables come in. We can use tables to organise our data so that we can interpret it at a glance.

Tables, frequencies and modes

When conducting a survey, the three main steps are:

Gathering the data
Organising the data
Interpreting the data

We looked at what questions we should ask when gathering different types of data. Now we are going to look at how tables can be used to help us organise and interpret those data.

Exploration

Suppose that Melanie wanted to find the least common colour of car in her neighbourhood. To help her find an answer to this, she conducted a survey by observing the colours of the cars passing through her street.

By sitting in front of her house and recording the colour of the first $20$ cars that drove past, Melanie obtained the following data:

white, black, white, black, black, blue, blue, white, red, white,

white, blue, orange, blue, white, white, orange, red, blue, red

In order to better interpret her data, Melanie converted this list of colours into a frequency table, counting the number of cars corresponding to a particular colour and writing that number in the frequency column next to that colour.

Car colour	Frequency
Red	$3$
Black	$3$
White	$7$
Orange	$2$
Blue	$5$

Looking at her table, Melanie found that orange was the least common car colour in her neighbourhood as it had the lowest frequency.

Frequency

The frequency of a result is the number of times that it appears in the list of data.
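
As a rough sketch of the tallying Melanie did by hand, the Python snippet below counts each colour in her list. The variable names (`observations`, `frequency`, `least_common`) are illustrative and not part of the lesson.

```python
from collections import Counter

# Melanie's 20 observations, in the order she recorded them.
observations = [
    "white", "black", "white", "black", "black",
    "blue", "blue", "white", "red", "white",
    "white", "blue", "orange", "blue", "white",
    "white", "orange", "red", "blue", "red",
]

# Counter maps each result to the number of times it appears (its frequency).
frequency = Counter(observations)

# The least common colour is the result with the lowest frequency.
least_common = min(frequency, key=frequency.get)

for colour, count in frequency.items():
    print(colour, count)
print("Least common:", least_common)
```

The counts it produces should match the frequency column in Melanie's table.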

Melanie has answered her initial question, but she realises she can use the same data to answer other questions about the colours of cars in her neighbourhood.

a) What percentage of the cars were black?

When comparing quantities within a large population, it is often easier to compare percentages than exact counts.

We can read from the table that $3$ cars were black. Since Melanie recorded the colour of $20$ cars, this means that $3$ out of $20$ of the cars were black. We can express this as the fraction $\frac{3}{20}$. To convert this to a percentage we can multiply the numerator and denominator by $5$:

$\frac{3}{20}\times\frac{5}{5}$	$=$	$\frac{15}{100}$
	$=$	$15\%$
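
The same conversion can be sketched in Python. The helper `percentage` is a made-up name for this example; it multiplies by $100$ before dividing, which is equivalent to rewriting the fraction with a denominator of $100$.

```python
def percentage(count, total):
    """Return count as a percentage of total."""
    return 100 * count / total


black_cars = 3       # frequency of "Black" in Melanie's table
cars_observed = 20   # total number of cars she recorded

print(percentage(black_cars, cars_observed))  # 15.0
```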

b) What was the most common colour of car?

Looking at the table, we can see that the result with the highest frequency is the colour white, so this was the most common colour. This means that the mode of the data is "white".

Mode

The mode of a data set is the result with the highest frequency.

If there are multiple results that share the highest frequency then there will be more than one mode.
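
A minimal sketch of this definition, assuming the frequency table is stored as a Python dictionary. The function name `modes` and the second table (`pets`) are hypothetical and only there to show what happens when two results tie.

```python
def modes(frequency_table):
    """Return every result that shares the highest frequency."""
    highest = max(frequency_table.values())
    return [result for result, count in frequency_table.items() if count == highest]


# Melanie's frequency table from the exploration.
car_colours = {"Red": 3, "Black": 3, "White": 7, "Orange": 2, "Blue": 5}
print(modes(car_colours))  # ['White']

# Hypothetical data in which two results tie for the highest frequency.
pets = {"cat": 4, "dog": 4, "fish": 1}
print(modes(pets))  # ['cat', 'dog']
```

Returning a list rather than a single value keeps the sketch consistent with the possibility of more than one mode.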

Frequency tables

When representing the frequency of different results in our data, we often choose to use a frequency table.

Frequency table

A frequency table communicates the frequency of each result from a set of data. This is often represented as a column table with the far-left column describing the result and any columns to the right recording frequencies of different result types.

As seen in the exploration, frequency tables can help us find the least or most common results among categorical data. They can also allow us to calculate what fraction of the data a certain result represents.

When working with numerical data, frequency tables can also help us to answer other questions that we might have about how the data are distributed.

Practice question

Question 1

Thomas conducted a survey on the average number of hours his classmates exercised per day and displayed his data in the table below.

Number of exercise hours	Frequency
$0$	$2$
$1$	$12$
$2$	$7$
$3$	$5$
$4$	$0$
$5$	$3$

How many classmates did Thomas survey?
What is the mode of the data?
How many classmates exercised for less than three hours?
How many classmates exercised for at least three hours?

In the practice question above, Thomas found that there were no classmates who exercised for $4$ hours. Instead of leaving the frequency blank, Thomas put $0$ as the frequency. If he had left this information out of the table then we would not know how many classmates fit this category.

To calculate the total number of data points we add up all the frequencies. Then to calculate the total for "less than" some number, we add up the frequencies for the results that are less than that number. Similarly, when calculating the total for "at least" some number, we add up the frequencies that are more than or equal to that number.
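
These three sums can be sketched in Python using Thomas's table. The dictionary layout and the variable names are assumptions made for this illustration.

```python
# Thomas's table: number of exercise hours -> frequency.
exercise_hours = {0: 2, 1: 12, 2: 7, 3: 5, 4: 0, 5: 3}

# Total number of data points: add up all the frequencies.
total = sum(exercise_hours.values())

# "Less than 3": add the frequencies of results below 3.
less_than_three = sum(count for hours, count in exercise_hours.items() if hours < 3)

# "At least 3": add the frequencies of results equal to or above 3.
at_least_three = sum(count for hours, count in exercise_hours.items() if hours >= 3)

print(total, less_than_three, at_least_three)
```

Since every result is either less than $3$ or at least $3$, the last two sums should add up to the total, which gives a quick way to check the working.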

Grouped frequency tables

When the data are more spread out, sometimes it doesn't make sense to record the frequency for each separate result and instead we group results together to get a grouped frequency table.

Grouped frequency table

A grouped frequency table combines multiple results into a single group. We can find the frequency of a group by adding all the frequencies of the results contained in that group.

Exploration

A teacher wants to express the heights (in cm) of her students in a table using the following data points:

$189,154,146,162,165,156,192,175,167,174$

$161,153,184,177,155,192,169,166,148,170$

$168,151,186,152,195,169,143,164,170,177$

She realised that if each result had its own row then the table would be too long, so instead she grouped the results into intervals of $10$ cm. As a result, her grouped frequency table looked like this:

Height (cm)	Frequency
$140-149$
$150-159$
$160-169$
$170-179$
$180-189$
$190-199$

To fill in the frequency for each group, the teacher counted the number of results that fell into the range of each group.

For example, the group $150-159$ would include the results:

$154,156,153,155,151,152$

Since there are $6$ results that fall into the range of this group, this group has a frequency of $6$.

Using this method, the teacher filled in the grouped frequency table to get:

Height (cm)	Frequency
$140-149$	$3$
$150-159$	$6$
$160-169$	$9$
$170-179$	$6$
$180-189$	$3$
$190-199$	$3$
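
The counting step can also be sketched in Python. This is only an illustration: the helper `group_label` and the constant `GROUP_WIDTH` are made up for this example. Integer division by the group width finds the lower bound of the group that each height belongs to.

```python
from collections import Counter

# The teacher's 30 height measurements, in cm.
heights = [
    189, 154, 146, 162, 165, 156, 192, 175, 167, 174,
    161, 153, 184, 177, 155, 192, 169, 166, 148, 170,
    168, 151, 186, 152, 195, 169, 143, 164, 170, 177,
]

GROUP_WIDTH = 10  # each group spans 10 cm


def group_label(height):
    """Return the label of the group containing height, e.g. 154 -> '150-159'."""
    lower = height // GROUP_WIDTH * GROUP_WIDTH
    return f"{lower}-{lower + GROUP_WIDTH - 1}"


# Count how many heights fall into each group.
grouped = Counter(group_label(h) for h in heights)

for label in sorted(grouped):
    print(label, grouped[label])
```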

Looking at the table, she can see that the modal class is the group $160-169$, since it has the highest frequency.

By adding the frequencies in the bottom two rows she could also see that $6$ students were at least $180$ cm tall. There are $30$ students in the class in total, so she now knows that $\frac{6}{30}$ of her students, or $20\%$ of the class, are at least $180$ cm tall.

Modal class

The modal class in a grouped frequency table is the group that has the highest frequency.

If there are multiple groups that share the highest frequency then there will be more than one modal class.

As we can see, grouped frequency tables are useful when the data are more spread out. While the teacher could have obtained the same information from a normal frequency table, grouping the results condensed the data into a form that is easier to interpret.

However, the drawback of a grouped frequency table is that the data becomes less precise, since we have grouped multiple data points together rather than looking at them individually.
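
To illustrate this loss of precision, here is a small sketch that works only from the teacher's grouped table. The function `count_at_least` is hypothetical. It can answer "how many students are at least $180$ cm tall?" because $180$ is the start of a group, but it refuses to answer for $175$ because the table no longer records which heights inside $170-179$ reach that value.

```python
# The teacher's grouped table: (lowest height, highest height) -> frequency.
grouped_heights = {
    (140, 149): 3,
    (150, 159): 6,
    (160, 169): 9,
    (170, 179): 6,
    (180, 189): 3,
    (190, 199): 3,
}


def count_at_least(table, threshold):
    """Count results >= threshold, if the grouped table alone can tell us."""
    total = 0
    for (lower, upper), frequency in table.items():
        if lower >= threshold:
            # Every result in this group is at least the threshold.
            total += frequency
        elif upper >= threshold:
            # The threshold falls inside this group, so the count is unknown.
            raise ValueError(f"cannot split the group {lower}-{upper} at {threshold}")
    return total


tall = count_at_least(grouped_heights, 180)
print(tall, 100 * tall / sum(grouped_heights.values()))  # 6 20.0
```

Calling `count_at_least(grouped_heights, 175)` raises a `ValueError`, whereas the original list of heights would still let us count those students directly.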

Practice questions

Question 2

Yvonne asks $15$ of her friends what their favourite colour is and writes down their answers. Here is what she wrote down:

blue, pink, blue, yellow, green, pink, pink, yellow, green, blue, yellow, pink, yellow, pink, pink

Count the number of each colour and fill in the table.

Colour	Number of Friends
pink
green
blue
yellow

Which colour is the mode?

A) pink
B) yellow
C) green
D) blue

Question 3

A survey of $30$ people asked them how many video games they had played in the past month. Select true or false for each of the following statements:

Number of video games played	Frequency
$0-4$	$5$
$5-9$	$12$
$10-14$	$9$
$15-19$	$4$

"We know that $25$ people played $10$ or more video games."
A) True
B) False

"We know that $17$ people played $7$ or fewer video games."
A) True
B) False

"$21$ people played more than $4$ but less than $15$ video games."
A) True
B) False

"The modal class was $5-9$ video games."
A) True
B) False

Representing data can be tricky because we want to make it easy to interpret without losing any information. Most of the time we are forced to compromise: either we simplify the data so that it is easier to read, or we use a more complex display that keeps more of the information.

The line plot

The line plot is a useful way to express discrete data in a visually simple manner. The main advantages of the line plot are that we can find the most common score and range very easily, as well as quickly see how the data is distributed.

Range of a data set

The range of a data set is the difference between the highest score and the lowest score:

$\text{Range}=\text{Highest score}-\text{Lowest score}$

The line plot is particularly suited to discrete data where the frequencies of the results are often greater than one.

Line plot

In a line plot, each dot represents one data point belonging to the result that it is placed above.

The most common score of a line plot will be the score with the most dots. Since a line plot stacks vertically, the highest column(s) will belong to the most common score.
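
As a rough sketch of how the dots stack up, the Python function below draws a text version of a line plot and then computes the range. The function name `line_plot` and the list `scores` are invented for this illustration, and the alignment only works for single-digit whole-number scores.

```python
from collections import Counter


def line_plot(scores):
    """Return a text line plot, with one '*' stacked above a score per data point."""
    counts = Counter(scores)
    values = range(min(scores), max(scores) + 1)
    tallest = max(counts.values())
    rows = []
    # Build the plot from the top level of the tallest column down to the line.
    for level in range(tallest, 0, -1):
        row = " ".join("*" if counts[v] >= level else " " for v in values)
        rows.append(row.rstrip())
    # The bottom row is the number line itself.
    rows.append(" ".join(str(v) for v in values))
    return "\n".join(rows)


# Hypothetical scores, made up for this sketch.
scores = [0, 1, 1, 1, 2, 2, 3, 5, 6]

print(line_plot(scores))
print("Range:", max(scores) - min(scores))
```

For these made-up scores the tallest column sits above $1$, so $1$ is the most common score, and the range is $6-0=6$.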

Let's have a look at an example of a line plot.

Worked Example

example 1

Consider the line plot below.

a) What is the most common score for this line plot?

b) What is the highest score?

c) What is the lowest score?

d) How many scores are there all together?

Think: The most common score will have the highest column of dots, and the lowest and highest scores will be at the left and right end of the line plot.

Do: Since the stack of dots above "$1$" is the highest, we know that the most common score is "$1$".

The highest score is $6$, since that is the score furthest to the right on the line, and the lowest score is $0$, since that is the score furthest to the left on the line.

By counting the total number of dots in the line plot, we find that there are $19$ dots. So there are $19$ scores.

Reflect: The useful thing about a line plot is that we can very easily plot data onto it by adding a dot into the relevant column for each score. This is surprisingly easy since it does not involve ordering the data or counting each specific result.

Have a go at this in the practice question below.

The stem-and-leaf plot

The stem-and-leaf plot is an example of a way to express data in a more complicated way so that we can express more information visually. In particular, the stem-and-leaf plot is used when we have lots of numerical data points.

A stem-and-leaf plot is made up of two components, the stem and the leaf. The stem is usually used to represent the tens part of a score while the leaf is used to represent the ones part of the score.

For example, the score $52$ would be expressed on a stem-and-leaf plot like so:

Stem	Leaf
$5$	$2$

Key: $5\mid 2=52$

As we can see, we expressed the score by writing the ones digit in the row corresponding to its tens digit. In other words, we attached the leaf, $2$, to its stem, $5$, to make the score $52$.

What is useful about the stem-and-leaf plot is that we can record as many scores as we like by writing the leaves in the appropriate rows. As such, we could express the data set:

$52,46,31,57,49,51,52,30$

with this stem-and-leaf plot:

Stem	Leaf
$3$	$0$ $1$
$4$	$6$ $9$
$5$	$1$ $2$ $2$ $7$

Key: $5\mid 2=52$

As we can see, each score has been expressed on the plot as a ones digit written in its tens row.

Notice that the leaves have been arranged in ascending order from left to right. We need to do this so that we can find the median without jumping back and forth across our rows.

It is also worth noting that if a score appears more than once (in this case $52$ appears twice), each occurrence should have its own leaf.
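
The construction described above can be sketched in Python for two-digit scores. The function name `stem_and_leaf` is an assumption for this example. Sorting first means the leaves in each row come out in ascending order, and repeated scores keep one leaf each.

```python
from collections import defaultdict


def stem_and_leaf(scores):
    """Group two-digit scores by tens digit (stem), listing ones digits (leaves)."""
    plot = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)  # e.g. 52 -> stem 5, leaf 2
        plot[stem].append(leaf)
    return dict(plot)


data = [52, 46, 31, 57, 49, 51, 52, 30]

for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Note that this sketch simply skips any stem that has no leaves, which is a simplification.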

Worked Example

example 2

Consider the stem-and-leaf plot below.

Stem	Leaf
$1$	$1$ $3$ $7$
$2$	$0$ $2$ $2$ $5$ $8$ $8$ $8$ $8$ $9$
$3$	$0$ $1$ $2$ $7$ $7$ $9$ $9$
$4$	$1$ $2$ $6$
$5$	$0$ $3$ $7$
$6$	$1$ $6$
$7$	$2$ $6$
$8$	$1$

Key: $5\mid 2=52$

a) What is the smallest score?

b) What is the biggest score?

c) How many scores are there?

Think: The smallest score is represented by the first number in the leaves and the stem in the first row. The largest score is represented by the last number in the leaves and the stem in the last row.

Do: The greatest score is represented by the rightmost leaf in the bottom row. Attaching this leaf to its stem gives us $81$. Similarly, the least score is represented by the leftmost leaf in the top row which gives us $11$. By counting the total number of leaves in the plot, we find that there are $30$ scores.
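
As a check on this working, the plot can be turned back into a list of scores in Python. Storing the plot as a dictionary named `plot` is an assumption made for this sketch.

```python
# The stem-and-leaf plot from this example: stem -> leaves.
plot = {
    1: [1, 3, 7],
    2: [0, 2, 2, 5, 8, 8, 8, 8, 9],
    3: [0, 1, 2, 7, 7, 9, 9],
    4: [1, 2, 6],
    5: [0, 3, 7],
    6: [1, 6],
    7: [2, 6],
    8: [1],
}

# Attach each leaf to its stem: the stem counts tens and the leaf counts ones.
scores = [stem * 10 + leaf for stem, leaves in plot.items() for leaf in leaves]

print(min(scores), max(scores), len(scores))  # 11 81 30
```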

Practice questions

Question 4

A city council selected a number of houses at random. They determined the fastest travel time from each house to the nearest hospital, and produced these results (in minutes):
$25,37,16,27,27,35,21,18,19,49,14,19,31,42,18$

Represent this data in an ordered stem-and-leaf plot, with one leaf for each score and commas between each leaf:

Stem	Leaf
$1$
$2$
$3$
$4$

Key: $2\mid 5=25$

Using the key for stem-and-leaf plots

While stem-and-leaf plots are used primarily to display two-digit numbers, there are some cases where the stem and leaf might mean something different. It is for this reason that we should always check the key before translating the leaves into scores.

For example, in the stem-and-leaf plot below the stem represents the whole number of kilometres while the leaf represents tenths of a kilometre.

Stem	Leaf
$1$	$3$ $6$ $7$
$2$	$0$ $2$ $2$ $7$
$3$	$8$ $9$

Key: $2\mid 7=2.7$ km

There are also cases where the stem represents the number of tens as usual, but two-digit stems are used to express three-digit numbers. In the plot below, the score $128$ is represented by the leaf "$8$" attached to the "$12$" stem.

Stem	Leaf
$9$	$2$ $5$
$10$	$0$ $3$ $3$ $9$
$11$	$7$ $8$
$12$	$3$ $4$ $6$ $8$
$13$	$1$ $1$ $4$

Key: $12\mid 8=128$

In both cases, we need the key to tell us how to interpret the stem-and-leaf plot since the data is different from our usual two digit scores.
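
One way to picture what the key does, as a sketch only: join the stem and leaf into a single number, then divide according to the key. The function `decode` and its `divisor` parameter are hypothetical names; a divisor of $10$ matches a key such as $2\mid 7=2.7$, while a divisor of $1$ matches a key such as $12\mid 8=128$.

```python
def decode(stem, leaf, divisor=1):
    """Turn a stem and leaf into a score, scaled down by divisor as the key requires."""
    return (stem * 10 + leaf) / divisor


# Key: 2 | 7 = 2.7 km, so each leaf is a tenth of a kilometre.
print(decode(2, 7, divisor=10))  # 2.7

# Key: 12 | 8 = 128, so the two-digit stem counts tens and the leaf counts ones.
print(decode(12, 8))  # 128.0
```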

Outcomes

7.D1.1

Explain why percentages are used to represent the distribution of a variable for a population or sample in large sets of data, and provide examples.

7.D1.2

Collect qualitative data and discrete and continuous quantitative data to answer questions of interest, and organize the sets of data as appropriate, including using percentages.

13.02 Displaying data
