Standard Level

12.03 Grouped data

Lesson

Grouped frequency tables

When the data are spread over a wide range of values, it often doesn't make sense to record the frequency of each separate result. Instead, we group results together to get a grouped frequency table.

 

Grouped frequency table

A grouped frequency table combines multiple results into a single group. We can find the frequency of a group by adding all the frequencies of the results contained in that group.

Exploration

A teacher wants to express the heights (in cm) of her students in a table using the following data points:

 

$189,154,146,162,165,156,192,175,167,174$

$161,153,184,177,155,192,169,166,148,170$

$168,151,186,152,195,169,143,164,170,177$

 

She realised that if each result had its own frequency, the table would have too many rows, so instead she grouped the results into intervals of $10$ cm. As a result, her grouped frequency table looked like this:

Height (cm) | Frequency
$140-149$ |
$150-159$ |
$160-169$ |
$170-179$ |
$180-189$ |
$190-199$ |

To fill in the frequency for each group, the teacher counted the number of results that fell into the range of each group.

For example, the group $150-159$ would include the results:

$154,156,153,155,151,152$

Since there are $6$ results that fall into the range of this group, this group has a frequency of $6$.

Using this method, the teacher filled in the grouped frequency table to get:

Height (cm) | Frequency
$140-149$ | $3$
$150-159$ | $6$
$160-169$ | $9$
$170-179$ | $6$
$180-189$ | $3$
$190-199$ | $3$

Looking at the table, she can see that the modal class is the group $160-169$, since it has the highest frequency.

By adding the frequencies in the bottom two rows, she could also see that $6$ students were at least $180$ cm tall. There are $30$ students in the class in total, so she now knows that $\frac{6}{30}$ of her students, or one fifth of the class, are at least $180$ cm tall.
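
The teacher's counting can be checked with a short script. The following is a minimal Python sketch, not part of the original lesson: the names `heights`, `group_label` and `table` are illustrative assumptions, and it assumes every height is a whole number of centimetres.

```python
from collections import Counter

# Heights (in cm) of the 30 students, copied from the data above.
heights = [
    189, 154, 146, 162, 165, 156, 192, 175, 167, 174,
    161, 153, 184, 177, 155, 192, 169, 166, 148, 170,
    168, 151, 186, 152, 195, 169, 143, 164, 170, 177,
]

WIDTH = 10  # each group spans 10 cm, for example 150-159


def group_label(height, width=WIDTH):
    """Return the label of the group that a whole-number height falls into."""
    lower = (height // width) * width
    return f"{lower}-{lower + width - 1}"


# Count how many heights fall into each group.
table = Counter(group_label(h) for h in heights)

# The modal class (or classes) is every group with the highest frequency.
highest = max(table.values())
modal_classes = [label for label, freq in table.items() if freq == highest]

# Students who are at least 180 cm tall.
at_least_180 = sum(1 for h in heights if h >= 180)

for label in sorted(table):
    print(label, table[label])
print("Modal class:", modal_classes)
print("At least 180 cm:", at_least_180, "of", len(heights))
```

Running it prints the six groups with frequencies 3, 6, 9, 6, 3 and 3, matching the completed table. Because `modal_classes` is a list, it also covers the case where several groups share the highest frequency.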

 

Modal class

The modal class in a grouped frequency table is the group that has the highest frequency.

If there are multiple groups that share the highest frequency then there will be more than one modal class.

 

As we can see, grouped frequency tables are useful when the data are more spread out. While the teacher could have obtained the same information from a normal frequency table, grouping the results condensed the data into a form that is easier to interpret.

However, the drawback of a grouped frequency table is that the data become less precise, since we have grouped multiple data points together rather than looking at them individually.

 

Practice questions

Question 1

Fill in the frequency table using the data set below.

$77,54,53,56,73,55,94,95,76,52,72,46,85,61,48,90,64,70,40,52,57,88,59,95,61$

  1. Class | Frequency | Cumulative frequency
    $40-49$ | ____ | ____
    $50-59$ | ____ | ____
    $60-69$ | ____ | ____
    $70-79$ | ____ | ____
    $80-89$ | ____ | ____
    $90-99$ | ____ | ____
Question 2

We want to find the median for this data set.

Score | Frequency | Cumulative frequency
$2$ | $3$ | $3$
$3$ | $5$ | $8$
$4$ | $3$ | $11$
$5$ | $4$ | $15$
$6$ | $8$ | $23$
$7$ | $2$ | $25$
  1. How many scores are there in total?

  2. Find the median score.

Question 3

We want to find the mean of the data set.

Score ($x$) | Frequency ($f$) | $xf$
$2$ | $7$ | $14$
$3$ | $2$ | $6$
$4$ | $8$ | $32$
$5$ | $5$ | $25$
$6$ | $4$ | $24$
$7$ | $7$ | $49$
  1. How many scores are there in the data set?

  2. What is the total sum of all the scores in the data set?

  3. Find the mean for this data set.

    Round your answer to one decimal place.

 

Histograms and grouped data

Continuous numerical data, such as times, heights, weights or temperatures, are based on measurements, so any data value is possible within a large range of values. For displaying this type of data, a special chart called a histogram is used.

As an example, the following frequency distribution table and histogram represent the times taken for $72$ runners to complete a ten kilometre race.

Class interval | Frequency
$45\le\text{time}<50$ | $9$
$50\le\text{time}<55$ | $7$
$55\le\text{time}<60$ | $20$
$60\le\text{time}<65$ | $30$
$65\le\text{time}<70$ | $6$

The histogram represents the distribution of the data. It allows us to see clearly where all of the recorded times fall along a continuous scale.  

 

Class intervals

What may surprise us at first is that the histogram above has only five columns, even though it represents $72$ different data values.

To produce the histogram, the data is first grouped into class intervals (also known as classes or bins), using the frequency distribution table.

In the table above,

  • The first class interval includes the running times for $9$ different runners. Each of their times falls within a range that is greater than or equal to $45$ minutes, but less than $50$ minutes. This class interval is represented by the first column in the histogram.

  • The second class interval includes the running times for $7$ different runners, each with a time falling within a range greater than or equal to $50$ minutes, but less than $55$ minutes. This class interval is represented by the second column in the histogram, and so on.

 

Important!

Every data value must go into exactly one class interval.

Class intervals should be of equal width.

 

There are several different ways that class intervals are defined. Here are some examples with two adjacent class intervals:

Class interval formats | Description
$45<\text{time}\le50$, $50<\text{time}\le55$ | Upper endpoint included, lower endpoint excluded.
$45\le\text{time}<50$, $50\le\text{time}<55$ | Lower endpoint included, upper endpoint excluded.
$45$ to $<50$, $50$ to $<55$ | Lower endpoint included, upper endpoint excluded.
$45$ - $49$, $50$ - $54$ | Suitable for data rounded to the nearest whole number, or discrete data.
$45$ → $50$, $50$ → $55$ | Not clear which endpoints are included or excluded. Assume upper endpoint is included.

Regardless of the format used, the same format should be applied consistently to every class interval for a given set of data.
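
To see why the choice of endpoint matters, consider a time of exactly $50$ minutes. The sketch below is a Python illustration only. The helper `find_class` and its `closed` parameter are hypothetical names, not part of the lesson. It places a value into a class interval under the two endpoint conventions described in the table.

```python
def find_class(value, edges, closed="lower"):
    """Return the (low, high) class interval that contains value.

    edges  -- increasing class boundaries, for example [45, 50, 55]
    closed -- "lower" means low <= value < high,
              "upper" means low < value <= high
    """
    for low, high in zip(edges, edges[1:]):
        if closed == "lower" and low <= value < high:
            return (low, high)
        if closed == "upper" and low < value <= high:
            return (low, high)
    return None  # the value lies outside every class interval


edges = [45, 50, 55]
print(find_class(50, edges, closed="lower"))  # (50, 55)
print(find_class(50, edges, closed="upper"))  # (45, 50)
```

Under either convention the value lands in exactly one class interval, but which one depends on the convention, so it needs to be stated and then kept fixed for the whole data set.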

Note: In this course, class intervals for any particular set of data will be the same width. There are situations in data representation when class intervals are different widths, but this is beyond the scope of this course.

 

 

Class centre

The class centre is the average of the endpoints of each interval.

For example, if the class interval is $45\le\text{time}<50$, or $45$ - $50$, the class centre is calculated as follows:

$\text{class centre}=\frac{45+50}{2}=47.5$

 

Because the class centre is an average of the endpoints, it is often used as a single value to represent the class interval. In some histograms, it may be used for the scale on the horizontal axis, with the class centre displayed directly below the middle of each vertical column.
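
The same calculation can be written as a small Python function. This is only an illustrative sketch: the names `class_centre` and `edges` are assumptions, and the boundaries are taken from the running-time table earlier in the lesson.

```python
def class_centre(lower, upper):
    """Return the average of the two endpoints of a class interval."""
    return (lower + upper) / 2


# Boundaries of the five running-time class intervals (in minutes).
edges = [45, 50, 55, 60, 65, 70]

# Pair each boundary with the next one to get (lower, upper) endpoints.
centres = [class_centre(low, high) for low, high in zip(edges, edges[1:])]
print(centres)  # [47.5, 52.5, 57.5, 62.5, 67.5]
```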

 

Practice question

Question 4

Find the class centre for the class interval $19\le t<23$ where $t$ represents time.

 

Frequency polygon

Using the example of running times, we can add a 'class centre' column to the frequency distribution table.

Class interval | Class centre | Frequency
$45\le\text{time}<50$ | $47.5$ | $9$
$50\le\text{time}<55$ | $52.5$ | $7$
$55\le\text{time}<60$ | $57.5$ | $20$
$60\le\text{time}<65$ | $62.5$ | $30$
$65\le\text{time}<70$ | $67.5$ | $6$

The class centre is used to create an alternative to the histogram, called a frequency polygon. 

A frequency polygon is a line graph that displays the frequency distribution of a set of data, and for that reason it is similar to a histogram. If we draw a frequency polygon and a histogram together, the polygon begins and ends on the horizontal axis and connects the midpoints at the top of each column, as shown below.

  • Notice that the class centres have been used as the scale on the horizontal axis. Each point on the frequency polygon is a coordinate pair made up of the class centre and the frequency: $\left(\text{class centre},\text{frequency}\right)$.
  • If we look at the way the frequency polygon 'cuts off' triangles from the columns of the histogram, we can see that the area under the frequency polygon is equal to the area of the columns.
  • A frequency polygon can be drawn together with a frequency histogram or it can be displayed on its own.
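
The equal-area claim can be checked numerically for the running-time data. The Python sketch below is an illustration under one stated assumption: the polygon meets the horizontal axis one class width beyond the first and last class centres (at $42.5$ and $72.5$), which is a common convention but is not spelled out in the lesson. All variable names are illustrative.

```python
class_width = 5
centres = [47.5, 52.5, 57.5, 62.5, 67.5]
frequencies = [9, 7, 20, 30, 6]

# Points of the frequency polygon as (class centre, frequency) pairs.
# Assumption: the polygon touches the axis one class width beyond the
# outer class centres, so it starts at (42.5, 0) and ends at (72.5, 0).
points = (
    [(centres[0] - class_width, 0)]
    + list(zip(centres, frequencies))
    + [(centres[-1] + class_width, 0)]
)

# Total area of the histogram columns: width times height for each column.
column_area = sum(class_width * f for f in frequencies)

# Area under the polygon: one trapezium between each pair of adjacent points.
polygon_area = sum(
    (x2 - x1) * (y1 + y2) / 2
    for (x1, y1), (x2, y2) in zip(points, points[1:])
)

print(column_area, polygon_area)  # 360 360.0
```

Both areas equal $5\times72=360$, since each frequency contributes half its height to the trapezium on either side of it.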

 

A histogram is not a bar chart

Although a histogram looks similar to a bar chart, there are a number of important differences between them:

  • Histograms show the distribution of data values, whereas a bar chart is used to compare data values.
  • Histograms are used for numerical data, whereas bar charts are used for categorical data.
  • A histogram has a numerical scale on both axes, while a bar chart only has a numerical scale on the vertical axis. 
  • The columns in a bar chart could be re-ordered, without affecting the representation of the data. In a histogram each column corresponds with a range of values on a continuous scale, so the columns cannot be re-ordered.
 
Histogram   Bar chart

 

Key features of a histogram

  • The horizontal axis is a continuous numerical scale (like a number line). It represents numerical data, such as time, height, mass or temperature, and may be divided into class intervals.
  • The vertical axis is the frequency of each data value or class interval.
  • There are no gaps between the columns, because the horizontal axis is a continuous scale. It is possible for a class interval to have a frequency of zero, but this is not the same as having gaps between each column.
  • The area of each column, rather than the height, is proportional to the frequency. This is because histograms can have columns of different widths. If all the columns are equal width, then the height of each column will be proportional to the frequency.
  • It is good practice, when creating a histogram, to leave a half-column-width gap between the vertical axis and the first column.

 

To better understand histograms, we will look at an example of how a histogram is created from a set of raw data.

 

Worked example

In 2016, the World Health Organisation (WHO) collected data on the average life expectancy at birth for $183$ countries around the world.

To appreciate what the raw data looks like, here is a reduced version of the data set. 

$62.7$ $76.4$ $76.4$ $62.6$ $75.0$ $76.9$ $74.8$
$82.9$ $81.9$ $73.1$ $75.7$ $79.1$ $72.7$ $75.6$
  $\vdots$   $\vdots$   $\vdots$  
$62.5$ $72.5$ $77.2$ $81.4$ $63.9$ $78.5$ $77.1$
$72.3$ $72.0$ $74.1$ $76.3$ $65.3$ $62.3$ $61.4$

Each value represents the average life expectancy at birth (in years) for a single country. 

Before organising the data into a frequency distribution table, we need to decide on the number of class intervals. Although there is no fixed rule, using between $5$ and $10$ class intervals usually produces good results for most data sets.

We know that the lowest life expectancy in the data is $52.9$ years (Lesotho in southern Africa), while the highest is $84.2$ (Japan). These values indicate that the scale on the horizontal axis of our histogram should be from at least $50$ to $85$ years. It seems appropriate to have class intervals of width $5$ years, which means we will have $7$ class intervals in total. We'll use the variable $t$ to represent average life expectancy in the table below:

Class interval | Frequency
$50\le t<55$ | $5$
$55\le t<60$ | $10$
$60\le t<65$ | $25$
$65\le t<70$ | $26$
$70\le t<75$ | $40$
$75\le t<80$ | $49$
$80\le t<85$ | $28$

This compact table now represents all $183$ life expectancy values.
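
The choice of class intervals can be reproduced in a few lines. This Python sketch is illustrative only: it assumes we round the lowest value down and the highest value up to the nearest multiple of the chosen width, and the variable names are not from the lesson. The frequencies are copied from the table, since the full raw data set is not reproduced here.

```python
import math

lowest, highest = 52.9, 84.2  # extreme life expectancies (years)
width = 5                     # chosen class interval width (years)

# Round outwards to multiples of the width to get the axis range.
start = math.floor(lowest / width) * width  # 50
stop = math.ceil(highest / width) * width   # 85

# Class boundaries and the (lower, upper) endpoints of each interval.
edges = list(range(start, stop + 1, width))
intervals = list(zip(edges, edges[1:]))

# Frequencies copied from the frequency distribution table.
frequencies = [5, 10, 25, 26, 40, 49, 28]

print(len(intervals))    # 7
print(sum(frequencies))  # 183
```

The check on the total is a useful habit: if the frequencies did not sum to the number of countries, some value would have been missed or counted twice.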

With our frequency distribution table complete, we are ready to create a histogram:

Use the histogram to answer the following questions:

  1. How many countries have an average life expectancy greater than or equal to $70$ but less than $75$ years?
  2. How many countries have an average life expectancy lower than $60$ years?

Solution

  1. The $70\le t<75$ year class interval contains $40$ countries.
  2. Both the $50\le t<55$ and $55\le t<60$ class intervals need to be considered.
    There are $5$ countries in the $50\le t<55$ year interval.
    There are $10$ countries in the $55\le t<60$ year interval.
    So there are $5+10=15$ countries with an average life expectancy lower than $60$ years.

 

Life expectancy at birth

Life expectancy at birth is a measure of how long, on average, a newborn can expect to live. It is one of the most common statistics for measuring the health status of a country. The higher the life expectancy at birth, the more likely it is the country will have a high standard of living and access to quality health services and education.

As a comparison, the average life expectancy at birth for all countries in the world is $71.8$ years and for Australia it is $82.9$ years ($6$th highest in the world).

 

Practice questions

Question 5

In product testing, the number of faults detected in producing a certain piece of machinery is recorded each day for several days. The frequency table shows the results.

Number of faults | Frequency
$0-3$ | $10$
$4-7$ | $14$
$8-11$ | $20$
$12-15$ | $16$
  1. Construct a histogram to represent the data.

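    [Histogram grid: title "Faulty Machinery"; horizontal axis "Number of Faults", marked at $1.5$, $5.5$, $9.5$ and $13.5$; vertical axis "Frequency", marked at $10$ and $20$]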

  2. What is the lowest possible number of faults that could have been recorded on any particular day?

    ____ faults

Question 6

As part of a fuel watch initiative, the price of petrol, $p$, at a service station was recorded each day for $21$ days. The frequency table shows the findings.

Price (in cents per litre) | Class centre | Frequency
$120.9<p\le125.9$ | $123.4$ | $4$
$125.9<p\le130.9$ | $128.4$ | $6$
$130.9<p\le135.9$ | $133.4$ | $5$
$135.9<p\le140.9$ | $138.4$ | $6$
  1. What was the highest price that could have been recorded?

  2. How many days was the price above $130.9$ cents?
