Before displaying data with a graph, we first need to organise it into meaningful groups. This is usually accomplished by organising the data in tables, including frequency tables. Continuous numerical data is usually best organised in grouped frequency tables.
Frequency tables are the best choice to organise categorical data and discrete numerical data when there is a small number of possible values.
Grouped frequency tables are best for continuous numerical data and discrete numerical data when the data can take a large number of possible values. The frequency recorded for a group is the sum of the frequencies for all data values contained in the group.
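As a minimal sketch of the two kinds of table, the Python below counts each distinct value (a frequency table) and then adds up those counts within equal-width groups (a grouped frequency table). The scores are hypothetical. The sketch also assumes half-open groups of the form "lower ≤ x < upper", so each value lands in exactly one group. The function names are illustrative only.

```python
from collections import Counter

def frequency_table(values):
    """Count how many times each distinct value occurs, listed in order of value."""
    return dict(sorted(Counter(values).items()))

def grouped_frequency_table(values, lower, width, n_groups):
    """Group values into n_groups equal-width, half-open intervals
    [lower + k*width, lower + (k+1)*width). A group's frequency is the sum of
    the frequencies of all data values it contains."""
    freq = frequency_table(values)
    table = {}
    for k in range(n_groups):
        start = lower + k * width
        end = start + width
        table[(start, end)] = sum(f for v, f in freq.items() if start <= v < end)
    return table

# Hypothetical scores, for illustration only
scores = [3, 7, 7, 12, 15, 15, 15, 18, 21, 24]
print(frequency_table(scores))
print(grouped_frequency_table(scores, lower=0, width=5, n_groups=5))
```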
The tables below show examples of a frequency table used for categorical data, and a grouped frequency table used for continuous numerical data.
Frequency table
Colour | Frequency |
---|---|
white | $14$ |
red | $2$ |
blue | $3$ |
black | $1$ |
yellow | $1$ |
Grouped frequency table
Height (cm) | Frequency |
---|---|
$145-150$ | $3$ |
$150-155$ | $10$ |
$155-160$ | $8$ |
$160-165$ | $13$ |
$165-170$ | $1$ |
When we group data, we create class intervals, which tell us the range of scores in a particular group. Class intervals should all be of equal size, and there should be no gaps between intervals.
For example, if our class interval is $1-5$, we know that this class contains any values from $1$ to $5$, inclusive. If the class interval is expressed as $1-<5$, it includes any score that is greater than or equal to $1$ and less than $5$.
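The two ways of writing an interval differ only at the upper endpoint. The short sketch below uses hypothetical helper names to express each convention as a comparison.

```python
def in_inclusive(x, lower, upper):
    """Interval written 'lower-upper' (e.g. 1-5): both endpoints are included."""
    return lower <= x <= upper

def in_half_open(x, lower, upper):
    """Interval written 'lower-<upper' (e.g. 1-<5): lower included, upper excluded."""
    return lower <= x < upper

# 5 belongs to 1-5 but not to 1-<5; 4.99 belongs to both.
print(in_inclusive(5, 1, 5), in_half_open(5, 1, 5))        # True False
print(in_inclusive(4.99, 1, 5), in_half_open(4.99, 1, 5))  # True True
```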
To make our data easier to work with, we usually find the class centre, which is taken as the representative value of the class interval when we analyse the data. The class centre is the midpoint of each class interval. For the interval $1-5$, the class centre would be $\frac{1+5}{2}=3$.
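The class centre is the average of the two endpoints. The sketch below applies it to the interval $1-5$ and to the height intervals from the grouped frequency table earlier in this section. The function name is illustrative.

```python
def class_centre(lower, upper):
    """Midpoint of a class interval: (lower + upper) / 2."""
    return (lower + upper) / 2

print(class_centre(1, 5))  # 3.0

# Height intervals 145-150, 150-155, ..., 165-170 from the grouped table
centres = [class_centre(a, a + 5) for a in range(145, 170, 5)]
print(centres)  # [147.5, 152.5, 157.5, 162.5, 167.5]
```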
Selecting the interval width is important. If the intervals are too narrow, there will be many gaps and the shape of the distribution will not be visible. If the intervals are too wide, the data will be lumped into only a few groups and the shape of the distribution will be hidden. As a guide, $6$ to $12$ intervals will typically be most useful for moderately sized data sets.
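One way to turn the "$6$ to $12$ intervals" guide into a width is to divide the range of the data by the number of intervals you want. The sketch below is a rule of thumb, not a fixed method. It assumes whole-number widths and half-open intervals that start at the smallest value. The data and both helper names are hypothetical. It also shows how a width that is too narrow or too wide changes the number of intervals.

```python
import math

def suggest_width(values, n_intervals):
    """Whole-number width so that n_intervals half-open intervals, starting at
    min(values), cover every value. floor(range / n) + 1 guarantees that
    n * width > range, so the largest value still falls inside the last interval."""
    data_range = max(values) - min(values)
    return math.floor(data_range / n_intervals) + 1

def interval_count(values, width):
    """Number of half-open intervals of this width needed, starting at min(values)."""
    return math.floor((max(values) - min(values)) / width) + 1

data = [12, 25, 40, 63, 87]  # hypothetical values, range 75
print(suggest_width(data, 8))    # 10
print(interval_count(data, 1))   # 76 intervals: too narrow
print(interval_count(data, 40))  # 2 intervals: too wide
print(interval_count(data, 10))  # 8 intervals: within the 6-12 guide
```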
Find the class centre for the class interval $17-22$.
What would be the most appropriate way of representing data from:
A survey conducted of $1000$ people, asking them how many languages they speak?
Leaving the data ungrouped and constructing a frequency table
Grouping the responses and constructing a frequency table
A survey conducted of $1000$ people, asking them how many different countries they know the names of?
Grouping the responses and constructing a frequency table
Leaving the data ungrouped and constructing a frequency table
As part of a fuel watch initiative, the price of petrol at a service station was recorded each day for $21$ days. The frequency table shows the findings.
Price (in cents per litre) | Class Centre | Frequency |
---|---|---|
$130.9-135.9$ | $133.4$ | $6$ |
$135.9-140.9$ | $138.4$ | $5$ |
$140.9-145.9$ | $143.4$ | $5$ |
$145.9-150.9$ | $148.4$ | $5$ |
What was the highest price that could have been recorded?
How many days was the price above $140.9$ cents?
Once we have organised the data, we need to present the data in a form that will be easy to read, understand and analyse.
Some common ways of displaying statistical data are listed below.
The best type of display to be used will depend on the type of data and purpose of the investigation.
Another type of statistical graph, the box-and-whisker plot, is used to display summary statistics and will be described in a later section.
These graphs represent the frequency of data values as the length of horizontal bars or vertical columns.
Column graphs (also known as bar graphs) are usually used to display categorical data.
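In place of an interactive tool, the sketch below draws a plain-text column graph of the colour frequency table from the start of this section. Each category gets its own vertical column, and blank gaps separate the columns because the categories are separate. The function name and the use of `#` characters are illustrative choices only.

```python
def column_graph(freqs, col_width=3, gap=2):
    """Text column graph: one vertical column of '#' per category,
    separated by blank gaps because the categories are separate."""
    labels = list(freqs)
    width = max(col_width, max(len(label) for label in labels))
    sep = " " * gap
    lines = []
    # Draw from the tallest level down to 1
    for level in range(max(freqs.values()), 0, -1):
        cells = [("#" * col_width).center(width) if freqs[label] >= level else " " * width
                 for label in labels]
        lines.append(sep.join(cells).rstrip())
    # Category labels along the horizontal axis
    lines.append(sep.join(label.center(width) for label in labels).rstrip())
    return "\n".join(lines)

colours = {"white": 14, "red": 2, "blue": 3, "black": 1, "yellow": 1}
print(column_graph(colours))
```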
Histograms are similar to column graphs, with vertical columns used to display numerical data. The main difference between a column graph and histogram is that histograms do not have spaces between the columns.
The reason that histograms do not have gaps between columns is that the class intervals are not separate categories. Instead, the columns represent the frequency of values observed in the class intervals for continuous data. The width of the columns indicates the range of values in the class intervals.
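The sketch below draws a plain-text histogram of the grouped height table from earlier in this section. Unlike the column graph, the columns touch. The horizontal axis labels the shared boundaries between intervals, not separate categories. The function name is hypothetical. It rejects intervals that leave a gap, following the "no gaps between intervals" guideline.

```python
def histogram(intervals, freqs, col_width=4):
    """Text histogram: one column per class interval, drawn side by side with no gaps.
    Intervals are (lower, upper) pairs and must be contiguous."""
    for (_, upper), (lower, _) in zip(intervals, intervals[1:]):
        if upper != lower:
            raise ValueError("class intervals must be contiguous (no gaps)")
    lines = []
    for level in range(max(freqs), 0, -1):
        row = "".join("#" * col_width if f >= level else " " * col_width for f in freqs)
        lines.append(row.rstrip())
    # Label each interval boundary; neighbouring columns share an endpoint
    edges = [lower for lower, _ in intervals] + [intervals[-1][1]]
    lines.append("".join(str(e).ljust(col_width) for e in edges).rstrip())
    return "\n".join(lines)

heights = [(145, 150), (150, 155), (155, 160), (160, 165), (165, 170)]
print(histogram(heights, [3, 10, 8, 13, 1]))
```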
Each student in a class was surveyed and asked about the colour of their eyes. The data is categorical, and the results are displayed in a column graph (left) and a horizontal bar chart (right) below:
Each student in a class was surveyed and asked for their height. The data is numerical, and the results are displayed in a histogram below:
The data collected in this survey is continuous because it can take any value within a range. The height of each column represents the number of data values that fall within that interval.
Data is represented in a column graph as shown:
Complete the following frequency table:
Score | Frequency |
---|---|
$28$ | $\editable{}$ |
$30$ | $\editable{}$ |
$32$ | $\editable{}$ |
$34$ | $\editable{}$ |
$36$ | $\editable{}$ |
$38$ | $\editable{}$ |
In product testing, the times at which faults were detected during the production of a certain machine were recorded over the course of a day. The frequency table shows the results.
Working hours | Frequency |
---|---|
$0-3$ | $10$ |
$4-7$ | $14$ |
$8-11$ | $20$ |
$12-15$ | $16$ |
Construct a histogram to represent the data.
Dot plots are a graphical way of displaying the distribution of numerical or categorical data on a simple scale with dots representing the frequency of data values. They are best used for small to medium size sets of data and are good for visually highlighting how the data is spread and whether there are any gaps in the data or outliers. We will look at identifying outliers in more detail in our next lesson.
In a dot plot, each individual value is represented by a single dot, displayed above a horizontal line. When data values are identical, the dots are stacked vertically. The graph appears similar to a pictograph or column graph with the number of dots representing the total count.
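The sketch below builds a plain-text dot plot. Each value is drawn as one `o` above its position on a number line, and identical values stack upward. It assumes whole-number data with single-digit labels so the columns line up. The goals data and the function name are hypothetical.

```python
from collections import Counter

def dot_plot(values, lo, hi):
    """Text dot plot for whole numbers from lo to hi (single-digit labels assumed).
    Each value gets one 'o'; identical values stack vertically."""
    counts = Counter(values)
    positions = range(lo, hi + 1)
    lines = []
    for level in range(max(counts.values()), 0, -1):
        row = " ".join("o" if counts[v] >= level else " " for v in positions)
        lines.append(row.rstrip())
    lines.append("-" * (2 * len(positions) - 1))  # the horizontal number line
    lines.append(" ".join(str(v) for v in positions))
    return "\n".join(lines)

goals = [0, 1, 1, 2, 2, 2, 3, 5]  # hypothetical data
print(dot_plot(goals, 0, 5))
```

Counting the dots in each column gives back the frequency table, and the total number of dots equals the number of data values.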
Here is a dot plot of the number of goals scored in each of Bob’s soccer games.
How many times were five goals scored?
Which number of goals were scored equally and most often?
$1$
$0$
$4$
$3$
$2$
$5$
How many games were played in total?
The goals scored by a football team in their matches are represented in the following dot plot.
Complete the following frequency distribution table.
Goals scored | Frequency |
---|---|
$0$ | $\editable{}$ |
$1$ | $\editable{}$ |
$2$ | $\editable{}$ |
$3$ | $\editable{}$ |
$4$ | $\editable{}$ |
$5$ | $\editable{}$ |
A stem and leaf plot, or stem plot, is used for organising and displaying numerical data. It is appropriate for small to moderately sized data sets. The graph is similar to a column graph on its side. An advantage of a stem and leaf plot over a column graph is that the individual scores are retained, so further calculations can be made accurately.
In a stem and leaf plot, the right-most digit in each data value is split from the other digits, to become the 'leaf'. The remaining digits become the 'stem'.
The values in a stem and leaf plot should be arranged in ascending order (from lowest to highest) from the centre out. To emphasise this, it is often called an ordered stem and leaf plot.
The data values $10, 13, 16, 21, 26, 27, 28, 35, 35, 36, 41, 41, 45, 46, 49, 50, 53, 56, 58$ are displayed in the stem and leaf plot below.
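An ordered stem and leaf plot can be built by sorting the data and splitting each value into its last digit (the leaf) and the remaining digits (the stem). The sketch below assumes non-negative whole numbers. It keeps empty stems inside the range so that gaps in the data stay visible. The function name is illustrative. Applied to the data values above, it prints the plot as text.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Ordered stem-and-leaf plot for non-negative integers:
    leaf = last digit, stem = remaining digits."""
    groups = defaultdict(list)
    for v in sorted(values):           # sorting makes the leaves ascend
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    rows = []
    for stem in range(min(groups), max(groups) + 1):  # keep empty stems
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(rows)

data = [10, 13, 16, 21, 26, 27, 28, 35, 35, 36, 41, 41, 45, 46, 49, 50, 53, 56, 58]
print(stem_and_leaf(data))
```

Because every leaf is kept, each original value can be recovered as `stem * 10 + leaf`. This is the advantage over a column graph noted above.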
Which of the following is true of a stem-and-leaf plot?
Stem | Leaf |
---|---|
$0$ | $7$ |
$1$ | |
$2$ | |
$3$ | $1$ $3$ $3$ $3$ |
$4$ | $1$ $2$ $3$ $4$ $9$ |
$5$ | $1$ $2$ $4$ $5$ $5$ |
$6$ | $0$ |
The scores are ordered.
A stem-and-leaf plot does not give an idea of outliers and clusters.
It is only appropriate for data where scores have high frequencies.
The individual scores cannot be read on a stem-and-leaf plot.
The stem-and-leaf plot below shows the ages of the people who entered through the gates of a concert in the first $5$ seconds.
Stem | Leaf |
---|---|
$1$ | $1$ $2$ $4$ $5$ $6$ $6$ $7$ $9$ $9$ |
$2$ | $2$ $3$ $5$ $5$ $7$ |
$3$ | $1$ $3$ $8$ $9$ |
$4$ | |
$5$ | $8$ |
How many people passed through the gates in the first $5$ seconds?
What was the age of the youngest person?
The youngest person was $\editable{}$ years old.
What was the age of the oldest person?
The oldest person was $\editable{}$ years old.
What proportion of the concert-goers were under $20$ years old?