Some frequency tables have an extra column for cumulative frequency, which is a running total of the frequencies. In other words, the cumulative frequency for a row is that row's frequency plus the frequencies of all the previous scores in the data set.
Sam recorded the number of pets owned by $10$ people in his class. Here is the regular frequency table.
Number of Pets | Frequency |
---|---|
$0$ | $2$ |
$1$ | $5$ |
$2$ | $2$ |
$3$ | $1$ |
Now let's look at how we would calculate and add in a cumulative frequency column. Remember, we add each frequency to the previous frequency total. The first value in the cumulative frequency table will be the same as the value in the frequency column (since there's no previous value to add it to).
Number of Pets | Frequency | Cumulative Frequency |
---|---|---|
$0$ | $2$ | $2$ |
$1$ | $5$ | $2+5=7$ |
$2$ | $2$ | $7+2=9$ |
$3$ | $1$ | $9+1=10$ |
Notice that the final value in the cumulative frequency column is the same as the total number of people surveyed. This is a useful check that the frequencies have been added up correctly.
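The running total described above can be sketched in a few lines of Python. This is only an illustration using Sam's pet data; the variable names are our own and not part of the lesson.

```python
from itertools import accumulate

# Sam's data from the table: number of pets and how many people own that many
pets = [0, 1, 2, 3]
frequency = [2, 5, 2, 1]

# Each cumulative value is the current frequency added to the previous total
cumulative = list(accumulate(frequency))

for score, f, cf in zip(pets, frequency, cumulative):
    print(score, f, cf)

# Check: the last cumulative value equals the number of people surveyed
assert cumulative[-1] == sum(frequency)
```

The final assertion mirrors the check in the text: the last entry of the cumulative column matches the total of the frequency column.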
The number of sightings of the Northern Lights was recorded across various Canadian locations over a period of $1$ month. The numbers below represent the number of sightings at each location.
$12,8,9,8,11,7,7,11,10,9,9,11,7,10,11,7,8,9,11,9$
Complete the table.
Number of Sightings | Number of Locations ($f$) | Cumulative Frequency ($cf$) |
---|---|---|
$7$ | $\editable{}$ | $\editable{}$ |
$8$ | $\editable{}$ | $\editable{}$ |
$9$ | $\editable{}$ | $\editable{}$ |
$10$ | $\editable{}$ | $\editable{}$ |
$11$ | $\editable{}$ | $\editable{}$ |
$12$ | $\editable{}$ | $\editable{}$ |
In how many locations were there at least $11$ sightings?
In how many locations were there fewer than $11$ sightings?
What was the median number of sightings across all $20$ locations?
We've looked at frequency histograms as graphical representations of the distribution of data by plotting the frequencies of each individual score. In cumulative frequency histograms, we plot the cumulative frequency scores. As such, the columns in a cumulative frequency histogram continue to increase, with the last scores having the tallest column.
This is a graph of a cumulative frequency histogram.
This is a line that:
So adding the ogive to the cumulative frequency histogram above we would get the following graph:
The ogive by itself is below:
We can find the median of the data set using the ogive by:
We can find the percentiles and quartiles of the data set using the ogive by:
Consider the cumulative frequency histogram and ogive given below:
Use the graph to estimate:
a) The median.
b) The $90$th percentile.
c) The lower quartile.
d) The upper quartile.
e) The interquartile range.
a)
Think: The median represents the $50\%$ mark of the data, because $50\%$ of the data lies below it and $50\%$ lies above it.
Do: We need to find $50\%$ of the total number of scores, which according to the graph is $50$.
$50\%$ of $50 = 50\% \times 50 = 25$
Now to find the median we draw a horizontal line from $25$ on the vertical axis until it hits the ogive, then draw a line vertically down:
We can see that the dashed line hits the horizontal axis in the column bounded by $20$ and $25$. So we take the average of these numbers to find the median. Therefore the median is $22.5$.
Note: If the column was just labelled with one number, rather than two numbers at the end points, then that one number would be the median.
b)
Think: The $90$th percentile is the $90\%$ mark of the data, because $90\%$ of the data lies below it.
Do: We need to find $90\%$ of the total number of scores, which according to the graph is $50$.
$90\%$ of $50 = 90\% \times 50 = 45$
Now to find the percentile we draw a horizontal line from $45$ on the vertical axis until it hits the ogive, then draw a line vertically down:
We can see that the dashed line hits the horizontal axis in the column bounded by $30$ and $35$. So we take the average of these numbers to find the percentile. Therefore the $90$th percentile is $32.5$.
c)
Think: The lower quartile is the $25\%$ mark of the data, because $25\%$ of the data lies below it.
Do: We need to find $25\%$ of the total number of scores, which according to the graph is $50$.
$25\%$ of $50 = 25\% \times 50 = 12.5$
Now to find the lower quartile we draw a horizontal line from $12.5$ on the vertical axis until it hits the ogive, then draw a line vertically down:
We can see that the dashed line hits the horizontal axis in the column bounded by $15$ and $20$. So we take the average of these numbers to find the lower quartile. Therefore the lower quartile is $17.5$.
d)
Think: The upper quartile is the $75\%$ mark of the data, because $75\%$ of the data lies below it.
Do: We need to find $75\%$ of the total number of scores, which according to the graph is $50$.
$75\%$ of $50 = 75\% \times 50 = 37.5$
Now to find the upper quartile we draw a horizontal line from $37.5$ on the vertical axis until it hits the ogive, then draw a line vertically down:
We can see that the dashed line hits the horizontal axis in the column bounded by $25$ and $30$. So we take the average of these numbers to find the upper quartile. Therefore the upper quartile is $27.5$.
e)
Think: The interquartile range is the difference between the upper quartile and lower quartile.
Do:
Interquartile range $= 27.5 - 17.5 = 10$
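The arithmetic in parts a) to e) can be checked with a short sketch. The helper names `target_height` and `column_midpoint` are our own, and the column boundaries are simply the values read off the graph in the worked example.

```python
TOTAL_SCORES = 50  # total number of scores, read from the top of the ogive


def target_height(percent, total):
    """Cumulative frequency at which the horizontal line is drawn."""
    return percent * total / 100


def column_midpoint(lower, upper):
    """Average of the two end points of the column the dashed line lands in."""
    return (lower + upper) / 2


# Heights on the vertical axis for each part of the question
heights = {p: target_height(p, TOTAL_SCORES) for p in (50, 90, 25, 75)}

# Columns read from the graph in the worked example
median = column_midpoint(20, 25)
percentile_90 = column_midpoint(30, 35)
lower_quartile = column_midpoint(15, 20)
upper_quartile = column_midpoint(25, 30)

interquartile_range = upper_quartile - lower_quartile
print(heights, median, percentile_90, interquartile_range)
```

Only the target heights are actually computed here; which column the dashed line lands in still has to be read from the graph.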
Consider the ogive given.
Use the ogive to determine the median score. Leave your answer to one decimal place if necessary.
What was the $14$th lowest score?
How many scores of $92$ or less were there?
How many scores less than $90$ were there?
For grouped data, cumulative frequencies are calculated in the same way, by adding a cumulative frequency column to the frequency distribution table. The difference with grouped data is that we can only estimate the median, not find its exact value.
The frequency distribution table below shows the heights, in centimetres, of a group of children aged $5$ to $11$.
Child's height (cm) | Class centre | Frequency | Cumulative frequency |
---|---|---|---|
$91-100$ | $95$ | $5$ | $5$ |
$101-110$ | $105$ | $22$ | $27$ |
$111-120$ | $115$ | $30$ | $57$ |
$121-130$ | $125$ | $31$ | $88$ |
$131-140$ | $135$ | $18$ | $106$ |
$141-150$ | $145$ | $6$ | $112$ |
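As a rough sketch of how the cumulative frequency column helps with grouped data, the snippet below rebuilds that column for the height table and picks out the class containing the middle of the data. The rule used (first class whose cumulative frequency reaches half the total) is a common convention and is an assumption here, not a method stated in the lesson.

```python
from itertools import accumulate

# Classes and frequencies from the height table
classes = ["91-100", "101-110", "111-120", "121-130", "131-140", "141-150"]
frequency = [5, 22, 30, 31, 18, 6]

# Running total of the frequencies, one value per class
cumulative = list(accumulate(frequency))
total = cumulative[-1]

# Assumed rule: the median class is the first class whose cumulative
# frequency reaches half of the total number of scores
half = total / 2
median_class = next(c for c, cf in zip(classes, cumulative) if cf >= half)

print(cumulative)
print(total, half, median_class)
```

Because the individual heights inside each class are unknown, this identifies only the class that contains the median, which is why the median itself can only be estimated.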
Use the table to answer the following questions:
Do:
Complete the table and answer the following questions:
Complete the frequency distribution table:
Class | Class centre ($x$) | Frequency ($f$) | Cumulative frequency | Centre times frequency ($fx$) |
---|---|---|---|---|
$1-9$ | $\editable{}$ | $8$ | $\editable{}$ | $\editable{}$ |
$10-18$ | $\editable{}$ | $16$ | $\editable{}$ | $\editable{}$ |
$19-27$ | $\editable{}$ | $4$ | $\editable{}$ | $\editable{}$ |
$28-36$ | $\editable{}$ | $21$ | $\editable{}$ | $\editable{}$ |
$37-45$ | $\editable{}$ | $16$ | $\editable{}$ | $\editable{}$ |
Totals | $\editable{}$ | $\editable{}$ |
Using the class centres as 'scores', calculate the mean to 2 decimal places.
What is the median class?
$28-36$
$1-9$
$10-18$
$19-27$
$37-45$
What is the modal class?
$28-36$
$37-45$
$1-9$
$10-18$
$19-27$
We can construct a cumulative frequency histogram and polygon for grouped data using the class interval on the horizontal axis.
a) The global life expectancy data from 2016 is shown in the frequency distribution table below. Construct a cumulative frequency histogram and ogive for the data set.
Class interval | Frequency | Cumulative frequency |
---|---|---|
$51-54$ | $5$ | $5$ |
$55-60$ | $10$ | $15$ |
$61-64$ | $25$ | $40$ |
$65-70$ | $26$ | $66$ |
$71-74$ | $40$ | $106$ |
$75-80$ | $49$ | $155$ |
$81-84$ | $28$ | $183$ |
Total | $183$ |
Do: Plot the cumulative frequency for each class interval to get the height of each column. The columns should be ascending each time.
The cumulative frequency histogram and ogive are displayed together below:
b) Estimate the median life expectancy age using the graph.
Think: The median age is the middle value when the data is in ascending order. There are $183$ scores altogether, so the median sits at the $50\%$ mark of the cumulative frequency, which is $50\% \times 183 = 91.5$.
Do: Starting at $91.5$ on the vertical axis, draw a horizontal line across to the ogive and then a vertical line down to the horizontal axis. Estimate the value of the median from its position on the horizontal axis.
The median is approximately 73 years.
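Reading a value off the ogive amounts to linear interpolation along its straight segments, which can be sketched as follows. Two assumptions are made here that the lesson does not state: each ogive point is placed at the upper limit of its class interval, and the ogive starts at height $0$ at the lower limit of the first class. The function name `read_from_ogive` is our own.

```python
# Assumed ogive points: (upper class limit, cumulative frequency),
# starting from height 0 at the lower limit of the first class
x_points = [51, 54, 60, 64, 70, 74, 80, 84]
cf_points = [0, 5, 15, 40, 66, 106, 155, 183]


def read_from_ogive(target, xs, cfs):
    """Return the x-value where the piecewise-linear ogive reaches `target`."""
    segments = zip(zip(xs, cfs), zip(xs[1:], cfs[1:]))
    for (x0, c0), (x1, c1) in segments:
        if c0 <= target <= c1 and c1 > c0:
            # Move along the segment in proportion to the height gained
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target is outside the range of the ogive")


total = cf_points[-1]

# The median is read at half of the total cumulative frequency
median_estimate = read_from_ogive(total / 2, x_points, cf_points)
print(round(median_estimate))
```

With these assumptions the estimate rounds to the same value as the graphical reading, though a different choice of class boundaries would shift it slightly.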
Complete the frequency distribution table below:
Class | Class Centre | Frequency | Cumulative Frequency |
---|---|---|---|
$1-9$ | $\editable{}$ | $3$ | $\editable{}$ |
$10-18$ | $\editable{}$ | $4$ | $\editable{}$ |
$19-27$ | $\editable{}$ | $3$ | $\editable{}$ |
$28-36$ | $\editable{}$ | $3$ | $\editable{}$ |
$37-45$ | $\editable{}$ | $8$ | $\editable{}$ |
Totals | $\editable{}$ |
Construct a cumulative frequency histogram to represent the data.
What is the modal class?
$\editable{}$ $-$ $\editable{}$
What is the median class?
$\editable{}$ $-$ $\editable{}$