 8.03 Frequency tables and histograms

Lesson

Continuous numerical data, such as times, heights, weights or temperatures, are based on measurements, so any data value is possible within a large range of values. For displaying frequency information for this type of data, a special chart called a histogram is used.

As an example, the following frequency distribution table and histogram represent the times taken for $72$ runners to complete a ten kilometre race.

Class interval | Frequency
$45\le\text{time}<50$ | $9$
$50\le\text{time}<55$ | $7$
$55\le\text{time}<60$ | $20$
$60\le\text{time}<65$ | $30$
$65\le\text{time}<70$ | $6$

The histogram represents the distribution of the data. It allows us to see clearly where all of the recorded times fall along a continuous scale.

Class intervals

What may surprise us at first is that the histogram above has only five columns, even though it represents $72$ different data values.

To produce the histogram, the data is first grouped into class intervals or bins, using the frequency distribution table.

In the table above,

• The first class interval includes the running times for $9$ different runners. Each of their times falls within a range that is greater than or equal to $45$ minutes, but less than $50$ minutes. This class interval is represented by the first column in the histogram.

• The second class interval includes the running times for $7$ different runners, each with a time falling within a range greater than or equal to $50$ minutes, but less than $55$ minutes. This class interval is represented by the second column in the histogram, and so on.

Important!

Every data value must go into exactly one class interval.
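The rule above can be checked with a short sketch. The race times below are made up for illustration (the lesson does not list the raw times); only the class intervals come from the race table. The helper name `find_intervals` is also just an illustrative choice.

```python
# Hypothetical race times in minutes (made up; not the lesson's raw data).
times = [45.0, 47.2, 49.9, 50.0, 54.3, 59.9, 60.0, 64.8, 69.5]

# Class intervals from the race table, each read as lower <= time < upper.
intervals = [(45, 50), (50, 55), (55, 60), (60, 65), (65, 70)]


def find_intervals(value, intervals):
    """Return every interval (lower, upper) with lower <= value < upper."""
    return [(lower, upper) for lower, upper in intervals if lower <= value < upper]


# Tally the frequency of each class interval.
frequency = {interval: 0 for interval in intervals}
for t in times:
    matches = find_intervals(t, intervals)
    # Every value should land in exactly one class interval.
    assert len(matches) == 1, f"{t} fell into {len(matches)} intervals"
    frequency[matches[0]] += 1

for (lower, upper), count in frequency.items():
    print(f"{lower} <= time < {upper}: {count}")
```

Notice that a time of exactly $50.0$ minutes is counted once, in the second interval, because the first interval excludes its upper endpoint.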

There are several different ways that class intervals can be defined. Here are some examples with two adjacent class intervals:

Class interval formats (two adjacent intervals) | Description
$45<x\le50$, $50<x\le55$ | Upper endpoint included, lower endpoint excluded.
$45\le x<50$, $50\le x<55$ | Lower endpoint included, upper endpoint excluded.
$45$ to $<50$, $50$ to $<55$ | Lower endpoint included, upper endpoint excluded.
$45-49$, $50-54$ | Suitable for discrete data, both endpoints included.
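To see why the choice of format matters, here is a minimal sketch that tests the boundary value $50$ against the two adjacent intervals under each convention. The function names are illustrative only.

```python
def in_lower_inclusive(x, lower, upper):
    """True when lower <= x < upper (lower endpoint included)."""
    return lower <= x < upper


def in_upper_inclusive(x, lower, upper):
    """True when lower < x <= upper (upper endpoint included)."""
    return lower < x <= upper


x = 50  # a value sitting exactly on the shared boundary

# Lower endpoint included: 50 belongs to the second interval only.
print(in_lower_inclusive(x, 45, 50), in_lower_inclusive(x, 50, 55))  # False True

# Upper endpoint included: 50 belongs to the first interval only.
print(in_upper_inclusive(x, 45, 50), in_upper_inclusive(x, 50, 55))  # True False
```

Either convention works, as long as the same one is used for every class interval so that each boundary value is counted exactly once.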

In the last lesson we briefly looked at the differences between column graphs and histograms. Recall the key features of a histogram are:

• The horizontal axis is a continuous numerical scale (like a number line). It represents numerical data, such as time, height, mass or temperature.
• There are no gaps between the columns, because the horizontal axis is a continuous scale. A class interval can have a frequency of zero, but this is not the same as leaving gaps between the columns.
• The area of each column, rather than the height, is proportional to the frequency. This is because histograms can have columns of different widths. When all the columns have equal width, the height of each column is also proportional to the frequency. (Note: we will only look at cases with equal width in this course.)
• It is good practice, when creating a histogram, to leave a half-column-width gap between the vertical axis and the first column.
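The point about area can be illustrated with a small sketch. It assumes the column height is set to frequency divided by width, which is one common way to make the area equal to the frequency. The first table is the race table from earlier in the lesson; the second is a hypothetical regrouping (not part of the lesson) that merges the last two intervals to create unequal widths.

```python
# Race table from the lesson: (lower, upper, frequency). Every width is 5 minutes.
race = [(45, 50, 9), (50, 55, 7), (55, 60, 20), (60, 65, 30), (65, 70, 6)]

# Hypothetical regrouping (not in the lesson): the last two intervals merged
# into one interval of width 10, with frequency 30 + 6 = 36.
merged = [(45, 50, 9), (50, 55, 7), (55, 60, 20), (60, 70, 36)]


def column_heights(table):
    """Height = frequency / width, so that area = height * width = frequency."""
    return [freq / (upper - lower) for lower, upper, freq in table]


for name, table in [("equal widths", race), ("unequal widths", merged)]:
    print(name)
    for (lower, upper, freq), height in zip(table, column_heights(table)):
        area = height * (upper - lower)
        print(f"  {lower}-{upper}: frequency {freq}, height {height:.1f}, area {area:.0f}")
```

In the merged table, the $60$ to $70$ column holds $36$ runners, more than the $20$ in the $55$ to $60$ column, yet it is shorter (height $3.6$ compared with $4.0$) because it is twice as wide. With equal widths, as in this course, comparing heights and comparing frequencies give the same answer.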

To better understand histograms, we will look at an example of how a histogram is created from a set of raw data.

Worked example

Example 1

In 2016, the World Health Organisation (WHO) collected data on the average life expectancy at birth for $183$ countries around the world.

To appreciate what the raw data looks like, here is a reduced version of the data set.

$62.7$ $76.4$ $76.4$ $62.6$ $75.0$ $76.9$ $74.8$
$82.9$ $81.9$ $73.1$ $75.7$ $79.1$ $72.7$ $75.6$
$\vdots$
$62.5$ $72.5$ $77.2$ $81.4$ $63.9$ $78.5$ $77.1$
$72.3$ $72.0$ $74.1$ $76.3$ $65.3$ $62.3$ $61.4$

Each value represents the average life expectancy at birth (in years) for a single country.

Before organising the data into a frequency distribution table, we need to decide on the number of class intervals. Although there is no fixed rule, using between $5$ and $10$ class intervals usually produces a graph that gives a clear impression of the shape of the data for most data sets.

We know that the lowest life expectancy in the data is $52.9$ years (Lesotho in southern Africa), while the highest is $84.2$ years (Japan). These values indicate that the scale on the horizontal axis of our histogram should run from at least $50$ to $85$ years. It seems appropriate to have class intervals of width $5$ years, which means we will have $7$ class intervals in total. We'll use the variable $t$ to represent average life expectancy in the table below:
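The reasoning above can be sketched in a few lines. This assumes we round the lowest value down, and the highest value up, to a multiple of the chosen width; the variable names are illustrative only.

```python
import math

lowest, highest = 52.9, 84.2  # extreme values quoted in the lesson
width = 5                     # chosen class interval width, in years

# Round the lowest value down to a multiple of the width.
start = math.floor(lowest / width) * width

# The upper endpoint of each interval is excluded, so the final endpoint
# must be strictly greater than the highest value.
end = math.floor(highest / width) * width + width

number_of_intervals = (end - start) // width
intervals = [(lower, lower + width) for lower in range(start, end, width)]

print(start, end, number_of_intervals)  # 50 85 7
for lower, upper in intervals:
    print(f"{lower} <= t < {upper}")
```

Trying a different width, such as $10$ years, shows how quickly the number of class intervals changes, which is why the choice is a judgment call rather than a fixed rule.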

Class interval | Frequency
$50\le t<55$ | $5$
$55\le t<60$ | $10$
$60\le t<65$ | $25$
$65\le t<70$ | $26$
$70\le t<75$ | $40$
$75\le t<80$ | $49$
$80\le t<85$ | $28$

This compact table now represents all $183$ life expectancy values.
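As a quick check, the sketch below confirms that the frequencies add up to $183$ and prints a rough text version of the histogram, turned on its side, with one row per class interval. It is only a stand-in for a properly drawn histogram.

```python
# Frequency distribution table from the lesson: (lower, upper, frequency),
# where each class interval is lower <= t < upper (t in years).
table = [
    (50, 55, 5),
    (55, 60, 10),
    (60, 65, 25),
    (65, 70, 26),
    (70, 75, 40),
    (75, 80, 49),
    (80, 85, 28),
]

total = sum(freq for _, _, freq in table)
print(f"Total countries: {total}")  # Total countries: 183

# One row per class interval; the bar length equals the frequency.
for lower, upper, freq in table:
    print(f"{lower} <= t < {upper} | {'#' * freq} {freq}")
```

Because every class interval has the same width, the bar lengths can be compared directly, and the longest bar marks the interval containing the most countries.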

With our frequency distribution table complete, we are ready to create a histogram.

Use the histogram to answer the following questions:

(a) How many countries have an average life expectancy lower than $60$ years?

Think: Both the $50\le t<55$ and $55\le t<60$ class intervals need to be considered.
There are $5$ countries in the $50\le t<55$ year interval and there are $10$ countries in the $55\le t<60$ year interval.

Do: So there are $5+10=15$ countries with an average life expectancy lower than $60$ years.

(b) What proportion of countries have an average life expectancy equal to or greater than $70$ years but below $75$ years?

Think: How many countries are in the class interval $70\le t<75$? What is this as a percentage of the total $183$ countries?

Do: There are $40$ countries in the $70\le t<75$ class interval.

Percentage with life expectancy between $70$ and $75$ $=\frac{40}{183}\times100\%\approx21.9\%$
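Both parts can be verified with a short sketch that works directly from the frequency table. The dictionary layout and variable names are illustrative choices, not part of the lesson.

```python
# Frequency table from the worked example, keyed by (lower, upper),
# where each class interval is lower <= t < upper.
table = {
    (50, 55): 5,
    (55, 60): 10,
    (60, 65): 25,
    (65, 70): 26,
    (70, 75): 40,
    (75, 80): 49,
    (80, 85): 28,
}

# (a) Add the frequencies of every interval that lies entirely below 60 years.
below_60 = sum(freq for (lower, upper), freq in table.items() if upper <= 60)
print(below_60)  # 15

# (b) Frequency of the 70 <= t < 75 interval as a percentage of all countries.
total = sum(table.values())
percentage_70_to_75 = table[(70, 75)] / total * 100
print(round(percentage_70_to_75, 1))
```

Note that part (a) can only be answered exactly because $60$ is an endpoint of a class interval. A cut-off such as $62$ years would fall inside an interval, and the table alone could not tell us how many countries lie on each side of it.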

Did you know?

Life expectancy at birth is a measure of how long, on average, a newborn can expect to live. It is one of the most common statistics for measuring the health status of a country. The higher the life expectancy at birth, the more likely it is the country will have a high standard of living and access to quality health services and education.

As a comparison, the average life expectancy at birth for all countries in the world is $71.8$ years, and for Australia it is $82.9$ years ($6$th highest in the world).

Practice questions

Question 1

The following frequency table shows the data distribution for the length of leaves collected from a species of tree in the botanical gardens.

Leaf length, $x$ (mm) | Frequency
$0\le x<20$ | $5$
$20\le x<40$ | $11$
$40\le x<60$ | $19$
$60\le x<80$ | $49$
$80\le x<100$ | $43$
1. Which of the following statements are correct? Select all that apply.

A. $35$ leaves of less than $60$ mm were collected.

B. $11$ leaves of less than $40$ mm were collected.

C. The most common leaf length is between $60$ and $80$ mm.

D. Leaves are more likely to be at least $60$ mm in length.

E. There were no leaves collected with a length of $10$ mm.

Question 2

A group of year $9$ students were surveyed on their height, and the results are shown in the table below:

Height, $h$ (cm) | Frequency
$135\le h<145$ | $4$
$145\le h<155$ | $12$
$155\le h<165$ | $13$
$165\le h<175$ | $8$
$175\le h<185$ | $2$
$185\le h<195$ | $1$
1. How many students were surveyed?

2. Construct a histogram to represent this data.

3. What percentage of the students were $165$ cm or taller?

Question 3

A P.E. teacher asks her students how many hours they spent playing sports in the past month. The responses are in the table below.

Time (hours)
$11$ $14$ $4$ $6$ $12$ $9$ $11$ $9$ $16$ $3$ $13$ $23$ $13$ $10$ $10$ $11$ $20$ $11$ $15$ $14$ $13$ $18$ $8$ $16$ $11$ $12$ $12$ $16$ $9$ $12$
1. Complete the following frequency table for this set of data:

Time, $t$ (hours) | Frequency
$0\le t<4$ | ____
$4\le t<8$ | ____
$8\le t<12$ | ____
$12\le t<16$ | ____
$16\le t<20$ | ____
$20\le t<24$ | ____
2. Construct a histogram to represent this data.

3. How many students are in this P.E. class?

4. What percentage of students played sports for less than $8$ hours?

Outcomes

ACMEM046

display numerical data as frequency distributions, dot plots, stem and leaf plots, and histograms