7.025 Organise and display data

Organising data

Before displaying data with a graph, we first need to organise it into meaningful groups. This is usually accomplished by organising the data in tables, including frequency tables. Continuous numerical data is usually best organised in grouped frequency tables.

Frequency and grouped frequency tables

Frequency tables are the best choice to organise categorical data and discrete numerical data when there is a small number of possible values.

Grouped frequency tables are best for continuous numerical data and discrete numerical data when the data can take a large number of possible values. The frequency recorded for a group is the sum of the frequencies for all data values contained in the group.
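Both kinds of table can be sketched in a few lines of Python. This is a minimal illustration, not part of the lesson. The function names and the sample data are hypothetical, and each group is assumed to be a half-open interval (lower limit included, upper limit excluded), so neighbouring groups neither overlap nor leave gaps.

```python
from collections import Counter


def frequency_table(values):
    """Count how often each distinct value occurs, in ascending order of value."""
    return dict(sorted(Counter(values).items()))


def grouped_frequency_table(values, start, width, n_groups):
    """Grouped frequency table with equal-width groups [lo, lo + width).

    The frequency of each group is the sum of the frequencies of the
    data values it contains.
    """
    counts = frequency_table(values)
    table = {}
    for i in range(n_groups):
        lo = start + i * width
        hi = lo + width
        table[(lo, hi)] = sum(f for v, f in counts.items() if lo <= v < hi)
    return table


# Hypothetical survey: number of languages spoken (few possible values).
print(frequency_table([1, 2, 1, 3, 1, 2, 1]))  # {1: 4, 2: 2, 3: 1}
# Hypothetical test scores (many possible values), grouped in tens.
print(grouped_frequency_table([3, 7, 12, 15, 18, 21, 24, 9], 0, 10, 3))
# {(0, 10): 3, (10, 20): 3, (20, 30): 2}
```

The grouped version is built from the ungrouped counts, which mirrors the rule that a group's frequency is the sum of the frequencies of the values inside it.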

The tables below show examples of a frequency table used for categorical data, and a grouped frequency table used for continuous numerical data.

Frequency table

Colour of cars in the school carpark
Colour | Frequency
white | $14$
red | $2$
blue | $3$
black | $1$
yellow | $1$

Grouped frequency table

Height of year 9 students
Height (cm) | Frequency
$145-150$ | $3$
$150-155$ | $10$
$155-160$ | $8$
$160-165$ | $13$
$165-170$ | $1$

 

Grouped data

When we group data, we create class intervals, which tell us the range of scores in a particular group. Class intervals should all be equal in size, and there should be no gaps between them.

For example, if our class interval is $1-5$, we know that this class contains any values from $1$ to $5$, inclusive. If the class interval is expressed as $1-<5$, it includes any score that is greater than or equal to $1$ and less than $5$.
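The difference between the two notations comes down to whether the upper limit is included. A minimal sketch, with hypothetical function names:

```python
def in_inclusive(x, lo, hi):
    """Class interval written 'lo-hi': both endpoints are included."""
    return lo <= x <= hi


def in_half_open(x, lo, hi):
    """Class interval written 'lo-<hi': lo is included, hi is excluded."""
    return lo <= x < hi


# Only x = 5 gives different answers: it belongs to 1-5 but not to 1-<5.
for x in [1, 3, 4.9, 5]:
    print(x, in_inclusive(x, 1, 5), in_half_open(x, 1, 5))
```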

To make our data easier to work with, we usually find the class centre, which is taken as the representative value of the class interval when we analyse the data. The class centre is the middle value of each class interval: the average of its lower and upper limits. For the interval $1-5$, the class centre is $\frac{1+5}{2}=3$.
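The class centre formula can be written as a one-line function and applied to every interval in a table at once. This sketch uses a hypothetical function name and reuses the height intervals from the grouped frequency table above.

```python
def class_centre(lo, hi):
    """Class centre = (lower limit + upper limit) / 2."""
    return (lo + hi) / 2


print(class_centre(1, 5))  # 3.0

# Class intervals (cm) from the "Height of year 9 students" table.
height_intervals = [(145, 150), (150, 155), (155, 160), (160, 165), (165, 170)]
print([class_centre(lo, hi) for lo, hi in height_intervals])
# [147.5, 152.5, 157.5, 162.5, 167.5]
```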

Selecting the interval width is important. If the intervals are too narrow, many of them will have small or zero frequencies (gaps), so the shape of the distribution will not be visible. If the intervals are too wide, there will be too few of them and the shape of the distribution will be hidden. As a guide, $6$ to $12$ intervals are typically most useful for moderately sized data sets.
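The $6$ to $12$ guide can be checked by counting how many intervals each candidate width would need. This is a sketch under the assumption that the first interval starts at the smallest data value and intervals are half-open; in practice the first lower limit is often rounded down to a convenient number. The function names are hypothetical.

```python
import math


def number_of_intervals(min_value, max_value, width):
    """Intervals [lo, lo + width), starting at min_value, needed to cover the data."""
    return math.floor((max_value - min_value) / width) + 1


def widths_in_guide(min_value, max_value, candidate_widths, low=6, high=12):
    """Keep only the candidate widths that give between `low` and `high` intervals."""
    return [
        w for w in candidate_widths
        if low <= number_of_intervals(min_value, max_value, w) <= high
    ]


# Data running from 10 to 58 (the stem and leaf example later in this lesson).
for w in [2, 5, 10, 20]:
    print(w, number_of_intervals(10, 58, w))  # 25, 10, 5 and 3 intervals
print(widths_in_guide(10, 58, [2, 5, 10, 20]))  # [5]
```

A width of $2$ gives too many narrow intervals and a width of $20$ too few wide ones, while a width of $5$ lands inside the guide.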

 

Practice questions

Question 1

Find the class centre for the class interval $17-22$.

Question 2

What would be the most appropriate way of representing data from:

  1. A survey of $1000$ people, asking them how many languages they speak.

    A. Leaving the data ungrouped and constructing a frequency table

    B. Grouping the responses and constructing a frequency table

  2. A survey of $1000$ people, asking them how many different countries they know the names of.

    A. Grouping the responses and constructing a frequency table

    B. Leaving the data ungrouped and constructing a frequency table

Question 3

As part of a fuel watch initiative, the price of petrol at a service station was recorded each day for $21$ days. The frequency table shows the findings.

Price (cents per litre) | Class centre | Frequency
$130.9-135.9$ | $133.4$ | $6$
$135.9-140.9$ | $138.4$ | $5$
$140.9-145.9$ | $143.4$ | $5$
$145.9-150.9$ | $148.4$ | $5$
  1. What was the highest price that could have been recorded?

  2. On how many days was the price above $140.9$ cents?

 

Displaying data

Once we have organised the data, we need to present the data in a form that will be easy to read, understand and analyse.

Some common ways of displaying statistical data are listed below.

  • histograms
  • bar charts and column graphs
  • dot plots
  • stem and leaf plots

The best type of display to be used will depend on the type of data and purpose of the investigation.

Another type of statistical graph, the box and whisker plot, is used to display summary statistics and will be described in a later section.

 

Histograms, bar charts and column graphs

These graphs represent the frequency of data values as the length of horizontal bars or vertical columns.

Column graphs (also known as bar graphs) are usually used to display categorical data.
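A column graph or bar chart draws each category's frequency as the length of a bar. As a rough, text-only sketch (a hypothetical helper, not a plotting library), here is a horizontal bar chart of the car-colour frequency table from earlier in this lesson:

```python
def text_bar_chart(freqs, symbol="#"):
    """One row per category; the bar length equals the category's frequency."""
    label_width = max(len(label) for label in freqs)
    rows = []
    for label, count in freqs.items():
        rows.append(f"{label:<{label_width}} | {symbol * count} {count}")
    return "\n".join(rows)


cars = {"white": 14, "red": 2, "blue": 3, "black": 1, "yellow": 1}
print(text_bar_chart(cars))
# white  | ############## 14
# red    | ## 2
# ...
```

Each category is a separate bar, which is why gaps between the bars of a column graph are appropriate for categorical data.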

 

Histograms are similar to column graphs, with vertical columns used to display numerical data. The main difference between a column graph and a histogram is that a histogram has no spaces between its columns.

The reason that histograms do not have gaps between columns is that the class intervals are not separate categories. Instead, the columns represent the frequency of values observed in the class intervals for continuous data. The width of the columns indicates the range of values in the class intervals.
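One way to see why the columns touch is to compute where each column sits. In this sketch (hypothetical function names), each column's left edge is the interval's lower limit, its width is the interval width and its height is the frequency; the height intervals from the grouped frequency table earlier in this lesson are assumed to be $145-<150$, $150-<155$ and so on.

```python
def histogram_columns(intervals, freqs):
    """Return (left edge, width, height) for each histogram column."""
    return [(lo, hi - lo, f) for (lo, hi), f in zip(intervals, freqs)]


def columns_touch(columns):
    """True if each column's right edge is exactly the next column's left edge."""
    return all(
        left + width == next_left
        for (left, width, _), (next_left, _, _) in zip(columns, columns[1:])
    )


# Height of year 9 students: intervals (cm) and frequencies.
intervals = [(145, 150), (150, 155), (155, 160), (160, 165), (165, 170)]
freqs = [3, 10, 8, 13, 1]
cols = histogram_columns(intervals, freqs)
print(cols)
print(columns_touch(cols))  # True: no gaps between columns
```

If matplotlib is installed, these triples could be passed to `matplotlib.pyplot.bar` (left edges as `x`, frequencies as `height`, widths as `width`) with `align="edge"`, so that each column starts at its interval's lower limit and meets the next one.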

Worked examples

Example 1

Each student in a class was surveyed and asked about the colour of their eyes. The data is categorical and the results are displayed in a column graph (left) and horizontal bar chart (right) below:

 

Example 2

Each student in a class was surveyed and asked their heights. The data is numerical and the results are displayed in a histogram below:

The data collected in this survey is continuous because height can take any value within a range. The height of each column represents the frequency: the number of students whose heights fall within that interval.

 

Practice questions

Question 4

Data is represented in a column graph as shown:

[Column graph titled "Data": Score on the horizontal axis (28 to 38), Frequency on the vertical axis]

  1. Complete the following frequency table:

    Score | Frequency
    $28$ | $\editable{}$
    $30$ | $\editable{}$
    $32$ | $\editable{}$
    $34$ | $\editable{}$
    $36$ | $\editable{}$
    $38$ | $\editable{}$

Question 5

In product testing, the times at which faults are detected while producing a certain machine are recorded over the course of a day. The frequency table shows the results.

Working hours | Frequency
$0-3$ | $10$
$4-7$ | $14$
$8-11$ | $20$
$12-15$ | $16$
  1. Construct a histogram to represent the data.

    [Histogram titled "Faulty Machinery": Working hours on the horizontal axis (class centres 1.5, 5.5, 9.5, 13.5), Frequency on the vertical axis]

 

Dot plots

Dot plots are a graphical way of displaying the distribution of numerical or categorical data on a simple scale with dots representing the frequency of data values. They are best used for small to medium size sets of data and are good for visually highlighting how the data is spread and whether there are any gaps in the data or outliers. We will look at identifying outliers in more detail in our next lesson.

In a dot plot, each individual value is represented by a single dot displayed above a horizontal line. When data values are identical, their dots are stacked vertically. The graph looks similar to a pictograph or column graph, with the number of dots in each stack representing the frequency of that value.

  • To correctly display the distribution of the data, the dots must be evenly spaced in columns above the line
  • The scale or categories on the horizontal line should be evenly spaced
  • A dot plot does not have a vertical axis
  • The dot plot should be appropriately labelled
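The stacking rule can be sketched as a text dot plot. The function name and the goals data below are hypothetical, and the sketch assumes single-digit scale values so that each dot lines up above its label.

```python
from collections import Counter


def text_dot_plot(values, scale):
    """Stack one 'o' per data value above its position on an evenly spaced scale.

    There is no vertical axis: the height of each stack is the frequency.
    """
    counts = Counter(values)
    tallest = max(counts.values(), default=0)
    rows = []
    for level in range(tallest, 0, -1):
        row = " ".join("o" if counts[x] >= level else " " for x in scale)
        rows.append(row.rstrip())
    rows.append(" ".join(str(x) for x in scale))  # the labelled horizontal line
    return "\n".join(rows)


# Hypothetical data: goals scored in eight games.
print(text_dot_plot([0, 1, 1, 2, 2, 2, 3, 5], range(6)))
#     o
#   o o
# o o o o   o
# 0 1 2 3 4 5
```

The scale includes $4$ even though no game had four goals, so the gap in the data stays visible.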

 

Practice questions

Question 6

Here is a dot plot of the number of goals scored in each of Bob’s soccer games.

  1. How many times were five goals scored?

  2. Which numbers of goals were scored equally often, and more often than any other number?

    A. $1$

    B. $0$

    C. $4$

    D. $3$

    E. $2$

    F. $5$
  3. How many games were played in total?

Question 7

The goals scored by a football team in their matches are represented in the following dot plot.

  1. Complete the following frequency distribution table.

    Goals scored | Frequency
    $0$ | $\editable{}$
    $1$ | $\editable{}$
    $2$ | $\editable{}$
    $3$ | $\editable{}$
    $4$ | $\editable{}$
    $5$ | $\editable{}$

 

Stem and leaf plot

A stem and leaf plot, or stem plot, is used for organising and displaying numerical data. It is appropriate for small to moderately sized data sets. The graph is similar to a column graph turned on its side. An advantage of a stem and leaf plot over a column graph is that the individual scores are retained, so further calculations can be made accurately.

In a stem and leaf plot, the right-most digit in each data value is split from the other digits, to become the 'leaf'. The remaining digits become the 'stem'.

The leaves in a stem and leaf plot should be arranged in ascending order (from lowest to highest) from the stem outwards. To emphasise this, it is often called an ordered stem and leaf plot.

The data values $10,13,16,21,26,27,28,35,35,36,41,41,45,46,49,50,53,56,58$ are displayed in the stem and leaf plot below.

  • The stems are arranged in ascending order, to form a column, with the lowest value at the top
  • The leaf values are arranged in ascending order from the stem out, in rows, next to their corresponding stem
  • A single vertical line separates the stem and leaf values
  • There are no commas or other symbols between the leaves, only a space between them
  • In order to correctly display the distribution of the data, the leaves must line up in imaginary columns, with each data value directly below the one above
  • A stem and leaf plot includes a key that describes the way in which the stem and the leaf combine to form the data value
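The rules above can be sketched in Python for the data values listed earlier. This is an illustrative helper with a hypothetical name; it assumes non-negative whole numbers, so the leaf is the last digit and the stem is the remaining digits (`divmod(v, 10)`). Because each value can be rebuilt as $10 \times \text{stem} + \text{leaf}$, the individual scores are retained.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Ordered stem and leaf plot for non-negative whole numbers, with a key."""
    leaves = defaultdict(list)
    for v in sorted(values):  # sorting first puts each row's leaves in ascending order
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    stem_width = len(str(max(leaves)))
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):  # keep empty stems
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>{stem_width}} | {row}".rstrip())
    first_stem, first_leaf = divmod(min(values), 10)
    rows.append(f"Key: {first_stem} | {first_leaf} = {min(values)}")
    return "\n".join(rows)


data = [10, 13, 16, 21, 26, 27, 28, 35, 35, 36, 41, 41, 45, 46, 49, 50, 53, 56, 58]
print(stem_and_leaf(data))
# 1 | 0 3 6
# 2 | 1 6 7 8
# 3 | 5 5 6
# 4 | 1 1 5 6 9
# 5 | 0 3 6 8
# Key: 1 | 0 = 10
```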

 

Practice questions

Question 8

Which of the following is true of a stem-and-leaf plot?

Stem | Leaf
$0$ | $7$
$1$ |
$2$ |
$3$ | $1$ $3$ $3$ $3$
$4$ | $1$ $2$ $3$ $4$ $9$
$5$ | $1$ $2$ $4$ $5$ $5$
$6$ | $0$

Key: $1 \mid 2 = 12$
    A. The scores are ordered.

    B. A stem-and-leaf plot does not give an idea of outliers and clusters.

    C. It is only appropriate for data where scores have high frequencies.

    D. The individual scores cannot be read on a stem-and-leaf plot.

Question 9

The stem-and-leaf plot below shows the ages of the people who entered through the gates of a concert in the first $5$ seconds.

Stem | Leaf
$1$ | $1$ $2$ $4$ $5$ $6$ $6$ $7$ $9$ $9$
$2$ | $2$ $3$ $5$ $5$ $7$
$3$ | $1$ $3$ $8$ $9$
$4$ |
$5$ | $8$

Key: $1 \mid 2 = 12$ years old
  1. How many people passed through the gates in the first $5$ seconds?

  2. What was the age of the youngest person?

    The youngest person was $\editable{}$ years old.

  3. What was the age of the oldest person?

    The oldest person was $\editable{}$ years old.

  4. What proportion of the concert-goers were under $20$ years old?

Outcomes

ACMGM029

with the aid of an appropriate graphical display (chosen from dot plot, stem plot, bar chart or histogram), describe the distribution of a numerical dataset in terms of modality (uni or multimodal), shape (symmetric versus positively or negatively skewed), location and spread and outliers, and interpret this information in the context of the data
