7.08 Histograms

Lesson

Numerical data, such as times, heights, weights, or temperatures, are based on measurements, so any value within a large range is possible. A special chart called a histogram is used to display frequency information for this type of data.

Worked examples

Very simple examples of histograms may involve the consideration of data that does not need to be grouped into intervals (or bins) along the horizontal axis.

Question 1

A government agency records how long people wait on hold to speak to their representatives. The results are displayed in the histogram below:

1. Complete the corresponding frequency table:

Length of hold (minutes) | Frequency
$1$ | ____
$2$ | ____
$3$ | ____
$4$ | ____
$5$ | ____

2. How many phone calls were made?

3. How long in total did these people wait on hold?

4. What was the mean wait time? Give your answer as a decimal.

In the next example, the data needs to be grouped into class intervals (or bins) in order to construct the frequency distribution table and the histogram. The data represents the times taken by $72$ runners to complete a ten-kilometer race.

Question 2

Class interval (minutes) | Frequency
$45\le\text{time}<50$ | $9$
$50\le\text{time}<55$ | $7$
$55\le\text{time}<60$ | $20$
$60\le\text{time}<65$ | $30$
$65\le\text{time}<70$ | $6$

The histogram represents the distribution of the data. It allows us to see clearly where all of the recorded times fall along a numerical scale.

Class intervals or bins

What may surprise us at first is that the histogram above has only five columns, even though it represents $72$ different data values.

To produce the histogram, the data is first grouped into class intervals (which are also called bins), using the frequency distribution table.

In the table above,

• The first class interval includes the running times for $9$ different runners. Each of their times falls within a range that is greater than or equal to $45$ minutes, but less than $50$ minutes. This class interval is represented by the first column in the histogram.

• The second class interval includes the running times for $7$ different runners, each with a time falling within a range greater than or equal to $50$ minutes, but less than $55$ minutes. This class interval is represented by the second column in the histogram, and so on.

Important!

Every data value must go into exactly one class interval (or bin).
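The rule above can be sketched in Python. This is a minimal illustration, assuming bins of width $5$ that start at $45$, include their lower endpoint, and exclude their upper endpoint, as in the running-times table. The function name `bin_index` and the sample times are hypothetical and are not the lesson's raw data.

```python
def bin_index(value, start=45, width=5, n_bins=5):
    """Return the 0-based index of the bin that contains `value`.

    Bin i covers start + i*width <= value < start + (i+1)*width,
    so a value on a boundary always belongs to the bin on its right.
    """
    if not (start <= value < start + width * n_bins):
        raise ValueError(f"{value} is outside the range covered by the bins")
    return int((value - start) // width)

# Hypothetical race times in minutes (assumed for illustration only).
sample_times = [45.0, 49.9, 50.0, 57.3, 64.99, 69.5]
indices = [bin_index(t) for t in sample_times]
print(indices)  # [0, 0, 1, 2, 3, 4]
```

Because the calculation returns a single index for each value, a time of exactly $50$ minutes lands in the second bin only, never in both the first and the second.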

There are some general guidelines to use when creating class intervals (or bins):

• Bins should be all the same size.
• Bins should include all of the data.
• Boundaries for bins should reflect the data values being represented.
• Determine the number of bins based upon the data.
• If possible, choose a number of bins that is a factor of the number of data values (e.g., a histogram representing 20 data values might have 4 or 5 bins). This will simplify the process.

There are several different ways that you may see class intervals (or bins) defined. Here are some examples of how you might represent two adjacent class intervals (or bins):

Class interval formats | Description
$45<x\le50$, $50<x\le55$ | Upper endpoint included, lower endpoint excluded.
$45\le x<50$, $50\le x<55$ | Lower endpoint included, upper endpoint excluded.
$45$ to $<50$, $50$ to $<55$ | Lower endpoint included, upper endpoint excluded.
$45$ - $49$, $50$ - $54$ | Only suitable when data is restricted to certain (usually whole number) values. In this case, both endpoints are included.
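The difference between the first two formats only matters for values that sit exactly on a boundary. The sketch below is one way to see this in Python, using the standard `bisect` module. The helper names `left_closed_bin` and `right_closed_bin` are hypothetical, chosen for this illustration.

```python
from bisect import bisect_left, bisect_right

edges = [45, 50, 55]  # boundaries of two adjacent class intervals

def left_closed_bin(x):
    """Bin index when intervals are 45 <= x < 50 and 50 <= x < 55 (None if outside)."""
    i = bisect_right(edges, x) - 1
    return i if 0 <= i < len(edges) - 1 else None

def right_closed_bin(x):
    """Bin index when intervals are 45 < x <= 50 and 50 < x <= 55 (None if outside)."""
    i = bisect_left(edges, x) - 1
    return i if 0 <= i < len(edges) - 1 else None

for x in (45, 50, 55):
    print(x, left_closed_bin(x), right_closed_bin(x))
# 45 0 None
# 50 1 0
# 55 None 1
```

The boundary value $50$ goes into the second interval under one convention and the first interval under the other, which is why a frequency table should state clearly which endpoint is included.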

Remember!

The key features of a histogram are:

• The horizontal axis is a numerical scale (like a number line)
• The data on the horizontal axis may be grouped into class intervals (which are also called bins)
• There are no gaps between the columns of a histogram
• The height of each column will be the frequency

Careful!

Histograms are not the same as bar graphs! The two major differences between them are:

1. In a bar graph, the bars do not touch.
2. Bar graphs are normally used to represent categorical data (e.g., eye color, hair color, gender) along the horizontal axis, rather than numerical data.

To better understand histograms, we will look at an example of how a histogram is created from a set of raw data.

Question 3

In 2016, the World Health Organization (WHO) collected data on the average life expectancy at birth for $183$ countries around the world. Draw a histogram to represent this data.

To appreciate what the raw data looks like, here is a reduced version of the data set.

$62.7$ $76.4$ $76.4$ $62.6$ $75.0$ $76.9$ $74.8$ $82.9$ $81.9$ $73.1$ $75.7$ $79.1$ $72.7$ $75.6$
$\vdots$
$62.5$ $72.5$ $77.2$ $81.4$ $63.9$ $78.5$ $77.1$ $72.3$ $72.0$ $74.1$ $76.3$ $65.3$ $62.3$ $61.4$

Each value represents the average life expectancy at birth (in years) for a single country.

Think: Before organizing the data into a frequency distribution table, we need to decide on the number of class intervals. Although there is no fixed rule, using between $5$ and $10$ class intervals usually produces a graph that gives a clear impression of the shape of the data for most data sets.

We know that the lowest life expectancy in the data is $52.9$ years (Lesotho in southern Africa), while the highest is $84.2$ years (Japan). These values indicate that the scale on the horizontal axis of our histogram should run from at least $50$ to $85$ years. It seems appropriate to have class intervals of width $5$ years, which means we will have $7$ class intervals in total. We'll use the variable $t$ to represent average life expectancy in the table below:

Do: Construct the frequency table.

Class interval (years) | Frequency
$50\le t<55$ | $5$
$55\le t<60$ | $10$
$60\le t<65$ | $25$
$65\le t<70$ | $26$
$70\le t<75$ | $40$
$75\le t<80$ | $49$
$80\le t<85$ | $28$

This compact table now represents all $183$ life expectancy values.
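The choice of class intervals and the total count can be checked with a short Python sketch. It assumes a whole-number bin width and boundaries placed at multiples of that width. The helper name `bin_edges` is hypothetical, and the text bars are only a rough stand-in for the histogram columns.

```python
import math

def bin_edges(minimum, maximum, width):
    """Return edges at multiples of `width` that cover minimum..maximum."""
    start = math.floor(minimum / width) * width
    stop = (math.floor(maximum / width) + 1) * width  # strictly above maximum
    return list(range(start, stop + 1, width))

edges = bin_edges(52.9, 84.2, 5)           # [50, 55, 60, 65, 70, 75, 80, 85]
frequencies = [5, 10, 25, 26, 40, 49, 28]  # copied from the frequency table

# Each pair of neighboring edges is one class interval: low <= t < high.
intervals = list(zip(edges[:-1], edges[1:]))
assert len(intervals) == len(frequencies)

print(sum(frequencies))  # 183
for (low, high), freq in zip(intervals, frequencies):
    print(f"{low} <= t < {high}: {'#' * freq} ({freq})")
```

The eight edges produce the $7$ class intervals chosen in the lesson, and the frequencies add up to the $183$ countries in the data set.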

With our frequency distribution table complete, we are ready to create a histogram:

Use the histogram to answer the questions below.

a) How many countries have an average life expectancy lower than $60$ years?

Think: Both the $50\le t<55$ and $55\le t<60$ class intervals need to be considered. There are $5$ countries in the $50\le t<55$ interval and $10$ countries in the $55\le t<60$ interval.

Do: So there are $5+10=15$ countries with an average life expectancy lower than $60$ years.

b) What proportion of countries have an average life expectancy equal to or greater than $70$ years but below $75$ years?

Think: How many countries are in the class interval $70\le t<75$? What is this as a percentage of the total $183$ countries?

Do: There are $40$ countries in the $70\le t<75$ class interval.

Percentage with life expectancy between $70$ and $75$ $=\frac{40}{183}\times100\%\approx21.9\%$
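The arithmetic in parts a) and b) can be reproduced from the frequency table. The following is a minimal sketch in Python; the dictionary layout (lower bound of each class interval mapped to its frequency) and the variable names are assumptions made for this illustration.

```python
# Lower bound of each class interval (width 5 years) mapped to its frequency.
frequencies = {50: 5, 55: 10, 60: 25, 65: 26, 70: 40, 75: 49, 80: 28}
width = 5
total = sum(frequencies.values())

# Part a): intervals that lie entirely below 60 years.
below_60 = sum(f for lower, f in frequencies.items() if lower + width <= 60)

# Part b): share of countries in the interval 70 <= t < 75, as a percentage.
percent_70_to_75 = frequencies[70] / total * 100

print(below_60)
print(round(percent_70_to_75, 1))
```

Only whole class intervals can be counted this way. A question about, say, countries below $62$ years could not be answered exactly from the grouped table.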

Life expectancy at birth

Life expectancy at birth is a measure of how long, on average, a newborn can expect to live. It is one of the most common statistics for measuring the health status of a country. The higher the life expectancy at birth, the more likely it is the country will have a high standard of living and access to quality health services and education.

As a comparison, the average life expectancy at birth for all countries in the world is $71.8$ years and for the United States, it is $78.69$ years ($38$th highest in the world).

Practice questions

Question 4

The amount of snowfall (in centimeters) is recorded at the base of the mountain each day.

1. To create a frequency histogram of the data, which values go on the horizontal axis?

Number of days it snowed each amount

A

Amount of snowfall

B

2. The snowfall recorded each day, to the nearest centimeter, is as follows:

$6,2,0,3,2,2,3,4,2,0,3,2,3,4,6,4,3,0,5,3$

Construct a frequency histogram of the data.

3. On how many days did $3$ centimeters of snow fall?

4. On how many days did at least $4$ centimeters of snow fall?
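After tallying the data by hand, one way to check the work is a short Python sketch like the one below. It uses `collections.Counter` from the standard library; the variable names are hypothetical. Because the data is recorded to the nearest centimeter, each whole-number amount gets its own column and no grouping into wider bins is needed.

```python
from collections import Counter

# Daily snowfall in centimeters, copied from part 2 of the question.
snowfall_cm = [6, 2, 0, 3, 2, 2, 3, 4, 2, 0, 3, 2, 3, 4, 6, 4, 3, 0, 5, 3]

tally = Counter(snowfall_cm)  # amounts that never occur count as 0

# One row per amount, including amounts with a frequency of zero.
for amount in range(min(snowfall_cm), max(snowfall_cm) + 1):
    print(f"{amount} cm: {'#' * tally[amount]}")

# Frequencies for "at least 4 cm" are found by adding several columns.
days_at_least_4 = sum(count for amount, count in tally.items() if amount >= 4)
print(days_at_least_4)
```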

Question 5

The following frequency table shows the data distribution for the length of leaves collected from a species of tree in the botanical gardens.

Leaf length, $x$ (mm) | Frequency
$0\le x<20$ | $5$
$20\le x<40$ | $11$
$40\le x<60$ | $19$
$60\le x<80$ | $49$
$80\le x<84$ | $43$

1. Which of the following statements are correct? Select all that apply.

$35$ leaves of less than $60$ mm were collected.

A

$11$ leaves of less than $40$ mm were collected.

B

The most common leaf length is between $60$ and $80$ mm.

C

Leaves are more likely to be at least $60$ mm in length.

D

There were no leaves collected with a length of $10$ mm.

E

Outcomes

MGSE6.SP.4

Display numerical data in plots on a number line, including dot plots (line plots), histograms, and box plots.

MGSE6.SP.5

Summarize numerical data sets in relation to their context, such as by:

MGSE6.SP.5a

Reporting the number of observations.

MGSE6.SP.5b

Describing the nature of the attribute under investigation, including how it was measured and its units of measurement.

MGSE6.SP.5c

Giving quantitative measures of center (median and/or mean) and variability (interquartile range).