Some frequency tables have an extra column for cumulative frequency, which is a running total of the frequencies. In other words, the cumulative frequency for a row is that row's frequency plus the frequencies of all the previous scores in the data set.
Sam recorded the number of pets owned by $10$ people in his class. Here is the regular frequency table.
Number of Pets | Frequency |
---|---|
$0$ | $2$ |
$1$ | $5$ |
$2$ | $2$ |
$3$ | $1$ |
Now let's look at how we would calculate and add in a cumulative frequency column. Remember, we add each frequency to the previous frequency total. The first value in the cumulative frequency table will be the same as the value in the frequency column (since there's no previous value to add it to).
Number of Pets | Frequency | Cumulative Frequency |
---|---|---|
$0$ | $2$ | $2$ |
$1$ | $5$ | $2+5=7$ |
$2$ | $2$ | $7+2=9$ |
$3$ | $1$ | $9+1=10$ |
Notice that the final value in the cumulative frequency column is the same as the total number of people surveyed. This is a useful check that the frequencies have been added correctly.
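The running total in the table above can also be sketched in a few lines of Python. This is only an illustration using Sam's data; the variable names are our own, and `itertools.accumulate` is one of several ways to build a running total.

```python
from itertools import accumulate

# Sam's frequency table: number of pets and how many people own that many
pets = [0, 1, 2, 3]
frequencies = [2, 5, 2, 1]

# Each cumulative frequency is the running total up to and including that row
cumulative = list(accumulate(frequencies))

for score, f, cf in zip(pets, frequencies, cumulative):
    print(score, f, cf)

# Check: the last cumulative frequency equals the number of people surveyed
assert cumulative[-1] == sum(frequencies) == 10
```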
The number of sightings of the Northern Lights was recorded at various Canadian locations over a period of $1$ month. The numbers below represent the number of sightings at each location.
$12,8,9,8,11,7,7,11,10,9,9,11,7,10,11,7,8,9,11,9$
Complete the table.
Number of Sightings | Number of Locations ($f$) | Cumulative Frequency ($cf$) |
---|---|---|
$7$ | $\editable{}$ | $\editable{}$ |
$8$ | $\editable{}$ | $\editable{}$ |
$9$ | $\editable{}$ | $\editable{}$ |
$10$ | $\editable{}$ | $\editable{}$ |
$11$ | $\editable{}$ | $\editable{}$ |
$12$ | $\editable{}$ | $\editable{}$ |
In how many locations were there at least $11$ sightings?
In how many locations were there fewer than $11$ sightings?
What was the median number of sightings across all $20$ locations?
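Once you have completed the table by hand, a short script is one way to check your answers. The sketch below is not part of the exercise; the variable names are our own, and it relies on the idea that the cumulative frequency at $10$ counts every location with fewer than $11$ sightings.

```python
from collections import Counter
from itertools import accumulate
from statistics import median

sightings = [12, 8, 9, 8, 11, 7, 7, 11, 10, 9,
             9, 11, 7, 10, 11, 7, 8, 9, 11, 9]

counts = Counter(sightings)
scores = sorted(counts)                  # the distinct scores, smallest first
freqs = [counts[s] for s in scores]      # the f column
cum_freqs = list(accumulate(freqs))      # the cf column

cf_by_score = dict(zip(scores, cum_freqs))
fewer_than_11 = cf_by_score[10]          # cf of the last score below 11
at_least_11 = len(sightings) - fewer_than_11
median_sightings = median(sightings)

for row in zip(scores, freqs, cum_freqs):
    print(*row)
print(at_least_11, fewer_than_11, median_sightings)
```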
Consider the following cumulative frequency table:
Class | Frequency | Cumulative frequency |
---|---|---|
$0-10$ | $5$ | $5$ |
$11-20$ | $16$ | $21$ |
$21-30$ | $10$ | $31$ |
$31-40$ | $7$ | $38$ |
$41-50$ | $4$ | $42$ |
The cumulative frequency graph would be:
We can find the median of the data set using the cumulative frequency graph by:
- Finding the middle point on the cumulative frequency axis (half the total number of scores)
- Drawing a horizontal line to the polygon and then a vertical line down to the horizontal axis
We can find the percentiles and quartiles of the data set using the cumulative frequency graph by:
- Finding the corresponding percentage of the total number of scores and locating that number on the cumulative frequency axis. For example, for the $20$th percentile, find $20\%$ of the total number of scores.
- Drawing a horizontal line to the polygon and then a vertical line down to the horizontal axis
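The two steps above can be sketched in code using the cumulative frequency table from this section. This is a minimal illustration under one assumption: it treats the graph as a set of columns and reports only the class that the horizontal line first meets, not an exact value. The function name `class_at_percentile` is our own.

```python
from bisect import bisect_left

# Classes and cumulative frequencies from the table above
classes = ["0-10", "11-20", "21-30", "31-40", "41-50"]
cum_freqs = [5, 21, 31, 38, 42]

def class_at_percentile(percent, classes, cum_freqs):
    """Return the class in which the given percentile falls."""
    # Step 1: find the position on the cumulative frequency axis
    target = percent * cum_freqs[-1] / 100
    # Step 2: move across to the first class whose cumulative
    # frequency reaches that position
    index = bisect_left(cum_freqs, target)
    return classes[index]

median_class = class_at_percentile(50, classes, cum_freqs)
lower_quartile_class = class_at_percentile(25, classes, cum_freqs)
upper_quartile_class = class_at_percentile(75, classes, cum_freqs)
print(median_class, lower_quartile_class, upper_quartile_class)
```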
Worked example
Consider the cumulative frequency graph given below:
Use the graph to estimate:
a) The median.
b) The $90$th percentile.
c) The lower quartile.
d) The upper quartile.
e) The interquartile range.
a)
Think: The median represents the $50\%$ mark of the data, because $50\%$ of the data lies below it and $50\%$ lies above it.
Do: The graph shows a total of $50$ scores, so we need to find $50\%$ of $50$.
$50\%\text{ of }50=50\%\times50=25$
Now to find the median we draw a horizontal line from $25$ on the vertical axis until it hits the cumulative frequency graph, then draw a line vertically down:
We can see that the dashed line hits the horizontal axis in the column bounded by $20$ and $25$. So we take the average of these numbers to estimate the median. Therefore the median is $22.5$.
Note: If the column was just labelled with one number, rather than two numbers at the end points, then that one number would be the median.
b)
Think: The $90$th percentile is the $90\%$ mark of the data, because $90\%$ of the data lies below it.
Do: The graph shows a total of $50$ scores, so we need to find $90\%$ of $50$.
$90\%\text{ of }50=90\%\times50=45$
Now to find the percentile we draw a horizontal line from $45$ on the vertical axis until it hits the cumulative frequency graph, then draw a line vertically down:
We can see that the dashed line hits the horizontal axis in the column bounded by $30$ and $35$. So we take the average of these numbers to estimate the percentile. Therefore the $90$th percentile is $32.5$.
c)
Think: The lower quartile is the $25\%$ mark of the data, because $25\%$ of the data lies below it.
Do: The graph shows a total of $50$ scores, so we need to find $25\%$ of $50$.
$25\%\text{ of }50=25\%\times50=12.5$
Now to find the lower quartile we draw a horizontal line from $12.5$ on the vertical axis until it hits the cumulative frequency graph, then draw a line vertically down:
We can see that the dashed line hits the horizontal axis in the column bounded by $15$ and $20$. So we take the average of these numbers to estimate the lower quartile. Therefore the lower quartile is $17.5$.
d)
Think: The upper quartile is the $75\%$ mark of the data, because $75\%$ of the data lies below it.
Do: The graph shows a total of $50$ scores, so we need to find $75\%$ of $50$.
$75\%\text{ of }50=75\%\times50=37.5$
Now to find the upper quartile we draw a horizontal line from $37.5$ on the vertical axis until it hits the cumulative frequency graph, then draw a line vertically down:
We can see that the dashed line hits the horizontal axis in the column bounded by $25$ and $30$. So we take the average of these numbers to estimate the upper quartile. Therefore the upper quartile is $27.5$.
e)
Think: The interquartile range is the difference between the upper quartile and lower quartile.
Do:
$\text{Interquartile range}=27.5-17.5=10$
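The arithmetic in this worked example can be reproduced with a short sketch. The column bounds below are the values read off the graph in parts a) to d); the helper names `target_position` and `column_estimate` are our own, chosen only for illustration.

```python
total_scores = 50  # total number of scores shown on the graph

# Column bounds read off the graph for each percentage in the worked example
column_bounds = {50: (20, 25), 90: (30, 35), 25: (15, 20), 75: (25, 30)}

def target_position(percent, total):
    """Position on the cumulative frequency axis for a given percentage."""
    return percent * total / 100

def column_estimate(bounds):
    """Estimate the value as the average of the column's two bounds."""
    lower, upper = bounds
    return (lower + upper) / 2

for percent, bounds in column_bounds.items():
    print(percent, target_position(percent, total_scores), column_estimate(bounds))

estimates = {p: column_estimate(b) for p, b in column_bounds.items()}
interquartile_range = estimates[75] - estimates[25]
print(interquartile_range)
```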
For grouped data, cumulative frequencies are calculated in the same way, by adding a cumulative frequency column to the frequency distribution table. The difference with grouped data is that we can only estimate the median, because the individual scores within each class are not known.
The frequency distribution table below shows the heights, in centimetres, of a group of children aged $5$ to $11$.
Child's height in cm | class centre | frequency | cumulative frequency |
---|---|---|---|
$91-100$ | $95$ | $5$ | $5$ |
$101-110$ | $105$ | $22$ | $27$ |
$111-120$ | $115$ | $30$ | $57$ |
$121-130$ | $125$ | $31$ | $88$ |
$131-140$ | $135$ | $18$ | $106$ |
$141-150$ | $145$ | $6$ | $112$ |
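As a rough sketch of how a table like this can be used, the code below rebuilds the cumulative frequency column from the frequencies and then picks out the median class and the modal class. The variable names are our own. It assumes the median class is the first class whose cumulative frequency reaches half the total.

```python
from itertools import accumulate

# Height classes (cm) and frequencies from the table above
classes = ["91-100", "101-110", "111-120", "121-130", "131-140", "141-150"]
frequencies = [5, 22, 30, 31, 18, 6]

# Rebuild the cumulative frequency column as a running total
cum_freqs = list(accumulate(frequencies))
half_total = cum_freqs[-1] / 2

# Median class: first class whose cumulative frequency reaches half the total
median_class = next(c for c, cf in zip(classes, cum_freqs) if cf >= half_total)

# Modal class: the class with the largest frequency
modal_class = classes[frequencies.index(max(frequencies))]

print(cum_freqs, median_class, modal_class)
```

With grouped data this identifies only the class containing the median, which is why the median itself can only be estimated.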
Use the table to answer the following questions:
Do:
Complete the table and answer the following questions:
Complete the frequency distribution table:
Class | Class centre ($x$) | Frequency ($f$) | Cumulative frequency | Centre times frequency ($fx$) |
---|---|---|---|---|
$1-9$ | $\editable{}$ | $8$ | $\editable{}$ | $\editable{}$ |
$10-18$ | $\editable{}$ | $16$ | $\editable{}$ | $\editable{}$ |
$19-27$ | $\editable{}$ | $4$ | $\editable{}$ | $\editable{}$ |
$28-36$ | $\editable{}$ | $21$ | $\editable{}$ | $\editable{}$ |
$37-45$ | $\editable{}$ | $16$ | $\editable{}$ | $\editable{}$ |
Totals | | $\editable{}$ | | $\editable{}$ |
Using the class centres as 'scores', calculate the mean to 2 decimal places.
What is the median class?
- $28-36$
- $1-9$
- $10-18$
- $19-27$
- $37-45$
What is the modal class?
- $28-36$
- $37-45$
- $1-9$
- $10-18$
- $19-27$
We can construct a cumulative frequency graph for grouped data using the class interval on the horizontal axis.
a) The global life expectancy data from 2016 is shown in the frequency distribution table below. Construct a cumulative frequency graph for the data set.
class interval | frequency | cumulative frequency |
---|---|---|
$51-54$ | $5$ | $5$ |
$55-60$ | $10$ | $15$ |
$61-64$ | $25$ | $40$ |
$65-70$ | $26$ | $66$ |
$71-74$ | $40$ | $106$ |
$75-80$ | $49$ | $155$ |
$81-84$ | $28$ | $183$ |
Total | $183$ | |
Do: Plot the cumulative frequency of each class interval as the height of its column. Since cumulative frequency never decreases, each column should be at least as tall as the one before it.
The cumulative frequency graph is displayed below:
b) Estimate the median life expectancy age using the graph.
Think: The median age is in the middle of the data set when the data is in ascending order. The number of scores altogether is 183. To read the median from the graph we use half of the total number of scores, which is 91.5.
Do: Starting at 91.5 on the vertical axis, draw a horizontal line across to the cumulative frequency graph and then a perpendicular line down to the horizontal axis. Estimate the value of the median by its position on the horizontal axis.
The median is approximately 73 years.
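As a cross-check on the value read from the graph, the median can also be estimated numerically by linear interpolation within the median class. This is a different technique from reading the graph, shown here only as a sketch. It assumes that scores are spread evenly within each class and that class boundaries sit $0.5$ below the lower limit and $0.5$ above the upper limit; the function name `estimate_median` is our own.

```python
# Class limits and cumulative frequencies from the life expectancy table
lower_limits = [51, 55, 61, 65, 71, 75, 81]
upper_limits = [54, 60, 64, 70, 74, 80, 84]
cum_freqs = [5, 15, 40, 66, 106, 155, 183]

def estimate_median(lower_limits, upper_limits, cum_freqs):
    """Estimate the median by linear interpolation inside the median class."""
    target = cum_freqs[-1] / 2  # half the total number of scores
    previous_cf = 0
    for lower, upper, cf in zip(lower_limits, upper_limits, cum_freqs):
        if cf >= target:
            # Assumed class boundaries: half a unit beyond each class limit
            start, end = lower - 0.5, upper + 0.5
            # Fraction of the way through this class where the target falls
            fraction = (target - previous_cf) / (cf - previous_cf)
            return start + fraction * (end - start)
        previous_cf = cf

median_estimate = estimate_median(lower_limits, upper_limits, cum_freqs)
print(round(median_estimate))
```

Under these assumptions the estimate agrees with the graphical answer of about 73 years.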