Numerical data, such as times, heights, weights or temperatures, come from measurements, so a data value can fall anywhere within a large range of values. To display frequency information for this type of data, we use a special chart called a histogram.
The simplest histograms show data that does not need to be grouped into intervals (or bins) along the horizontal axis.
A government agency records how long people wait on hold to speak to their representatives. The results are displayed in the histogram below:
Complete the corresponding frequency table:
Length of hold (minutes) | Frequency |
---|---|
$1$ | $\editable{}$ |
$2$ | $\editable{}$ |
$3$ | $\editable{}$ |
$4$ | $\editable{}$ |
$5$ | $\editable{}$ |
How many phone calls were made?
How long in total did these people wait on hold?
What was the mean wait time? Give your answer as a decimal.
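The three questions above follow one pattern: add the frequencies, add each value multiplied by its frequency, then divide. A minimal sketch of that arithmetic is given here. The counts in `frequency_table` are hypothetical placeholders, because the real frequencies must be read from the histogram.

```python
# Hypothetical frequency table: the real counts must be read off the histogram.
# Keys are hold lengths in minutes, values are the number of calls.
frequency_table = {1: 4, 2: 7, 3: 5, 4: 3, 5: 1}

# Number of calls = sum of the frequencies.
total_calls = sum(frequency_table.values())

# Total wait = sum of (hold length x frequency).
total_wait = sum(minutes * count for minutes, count in frequency_table.items())

# Mean wait = total wait / number of calls.
mean_wait = total_wait / total_calls

print(total_calls, total_wait, mean_wait)
```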
In the next example, the data needs to be grouped into class intervals (or bins) in order to construct the frequency distribution table and the histogram to represent the times taken for $72$ runners to complete a ten-kilometer race.
The histogram represents the distribution of the data. It allows us to see clearly where all of the recorded times fall along a numerical scale.
What may surprise us at first is that the histogram above has only five columns, even though it represents $72$ different data values.
To produce the histogram, the data is first grouped into class intervals (which are also called bins), using the frequency distribution table.
In the table above,
Every data value must go into exactly one and only one class interval (or bin)
There are some general guidelines to use when attempting to create class intervals (or bins)
There are several different ways that you may see class intervals (or bins) defined. Here are some examples of how you might represent two adjacent class intervals (or bins):
First interval | Adjacent interval | Description |
---|---|---|
$45<x\le50$ | $50<x\le55$ | Upper endpoint included, lower endpoint excluded. |
$45\le x<50$ | $50\le x<55$ | Lower endpoint included, upper endpoint excluded. |
$45$ to $<50$ | $50$ to $<55$ | Lower endpoint included, upper endpoint excluded. |
$45$ - $49$ | $50$ - $54$ | Only suitable when data is restricted to certain (usually whole number) values. In this case, both endpoints are included. |
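To see how the "lower endpoint included, upper endpoint excluded" convention places each value in exactly one class interval, here is a small sketch. The function name `class_interval` and its default `start` and `width` values are assumptions chosen to match the $45$ to $50$ and $50$ to $55$ examples in the table.

```python
import math

def class_interval(value, start=45, width=5):
    """Return (lower, upper) such that lower <= value < upper."""
    # Count how many whole widths the value sits above the starting edge.
    index = math.floor((value - start) / width)
    lower = start + index * width
    return lower, lower + width

# A boundary value belongs to the interval it starts, not the one it ends.
print(class_interval(49.9))  # (45, 50)
print(class_interval(50))    # (50, 55)
```

Because the upper endpoint is excluded, a value of exactly $50$ cannot be counted in both intervals.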
The key features of a histogram are:
Histograms are not the same as bar graphs! The two major differences between them are:
To better understand histograms, we will look at an example of how a histogram is created from a set of raw data.
In 2016, the World Health Organization (WHO) collected data on the average life expectancy at birth for $183$ countries around the world. Draw a histogram to represent this data.
To appreciate what the raw data looks like, here is a reduced version of the data set.
$62.7$ | $76.4$ | $76.4$ | $62.6$ | $75.0$ | $76.9$ | $74.8$ |
$82.9$ | $81.9$ | $73.1$ | $75.7$ | $79.1$ | $72.7$ | $75.6$ |
$\vdots$ | $\vdots$ | $\vdots$ | | | | |
$62.5$ | $72.5$ | $77.2$ | $81.4$ | $63.9$ | $78.5$ | $77.1$ |
$72.3$ | $72.0$ | $74.1$ | $76.3$ | $65.3$ | $62.3$ | $61.4$ |
Each value represents the average life expectancy at birth (in years) for a single country.
Think: Before organizing the data into a frequency distribution table, we need to decide on the number of class intervals. Although there is no fixed rule, using between $5$ and $10$ class intervals usually produces a graph that gives a clear impression of the shape of the data for most data sets.
We know that the lowest life expectancy in the data is $52.9$ years (Lesotho in southern Africa), while the highest is $84.2$ years (Japan). These values indicate that the scale on the horizontal axis of our histogram should run from at least $50$ to $85$ years. It seems appropriate to use class intervals of width $5$ years, which gives $7$ class intervals in total. We'll use the variable $t$ to represent average life expectancy in the table below:
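The reasoning above (round the smallest value down and the largest value up to a multiple of the width, then count the intervals) can be sketched as follows. The variable names are illustrative. Only the minimum, maximum and width come from the text.

```python
import math

minimum, maximum = 52.9, 84.2   # extreme values quoted in the text
width = 5                       # chosen class interval width, in years

# Round the extremes outward to multiples of the width.
lower_edge = math.floor(minimum / width) * width
upper_edge = math.ceil(maximum / width) * width

# Each interval is (lower, upper) with lower <= t < upper.
intervals = [(lo, lo + width) for lo in range(lower_edge, upper_edge, width)]

print(lower_edge, upper_edge, len(intervals))  # 50 85 7
```

If the greatest value landed exactly on an upper edge, one further interval would be needed under the lower-endpoint-included convention.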
Do: Construct the frequency table
Class interval | Frequency |
---|---|
$50\le t<55$ | $5$ |
$55\le t<60$ | $10$ |
$60\le t<65$ | $25$ |
$65\le t<70$ | $26$ |
$70\le t<75$ | $40$ |
$75\le t<80$ | $49$ |
$80\le t<85$ | $28$ |
This compact table now represents all $183$ life expectancy values.
With our frequency distribution table complete, we are ready to create a histogram:
Use the histogram to answer the questions below.
a) How many countries have an average life expectancy lower than $60$ years?
Think: Both the $50\le t<55$ and $55\le t<60$ class intervals need to be considered.
There are $5$ countries in the $50\le t<55$ year interval and there are $10$ countries in the $55\le t<60$ year interval.
Do: So there are $5+10=15$ countries with an average life expectancy lower than $60$ years.
b) What proportion of countries have an average life expectancy equal to or greater than $70$ years but below $75$ years?
Think: How many countries are in the class interval $70\le t<75$? What is this as a percentage of the total $183$ countries?
Do: There are $40$ countries in the $70\le t<75$ class interval.
Percentage with life expectancy between $70$ and $75$ | $=$ | $\frac{40}{183}\times100\%$ |
 | $\approx$ | $21.9\%$ |
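As a quick check of the arithmetic in parts a) and b), the frequency table can be stored as a dictionary and queried directly. This is only a sketch. The dictionary layout, with each class interval written as a `(lower, upper)` pair, is an assumption made for illustration.

```python
# Frequency table from the worked example: (lower, upper) -> number of countries,
# where each interval means lower <= t < upper.
frequency_table = {
    (50, 55): 5, (55, 60): 10, (60, 65): 25, (65, 70): 26,
    (70, 75): 40, (75, 80): 49, (80, 85): 28,
}

total = sum(frequency_table.values())

# a) Countries below 60 years: every interval whose upper endpoint is at most 60.
below_60 = sum(count for (lo, hi), count in frequency_table.items() if hi <= 60)

# b) Share of countries in the 70 <= t < 75 interval, as a percentage.
percentage_70_to_75 = frequency_table[(70, 75)] / total * 100

print(total, below_60, round(percentage_70_to_75, 1))  # 183 15 21.9
```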
Life expectancy at birth is a measure of how long, on average, a newborn can expect to live. It is one of the most common statistics for measuring the health status of a country. The higher the life expectancy at birth, the more likely it is the country will have a high standard of living and access to quality health services and education.
As a comparison, the average life expectancy at birth for all countries in the world is $71.8$ years and for the United States, it is $78.69$ years ($38$th highest in the world).
The amount of snowfall (in centimeters) is recorded at the base of the mountain each day.
To create a frequency histogram of the data, which values go on the horizontal axis?
Number of days it snowed each amount
Amount of snowfall
The snowfall recorded each day, to the nearest centimeter, is as follows:
$6,2,0,3,2,2,3,4,2,0,3,2,3,4,6,4,3,0,5,3$
Construct a frequency histogram of the data.
On how many days did $3$ centimeters of snow fall?
On how many days did at least $4$ centimeters of snow fall?
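One way to check a hand-drawn histogram for this data is to tally the values in code. The sketch below uses Python's `collections.Counter` and prints a rough text histogram. The variable names are illustrative, and the row of `#` symbols stands in for the height of each column.

```python
from collections import Counter

# Daily snowfall in centimeters, as listed in the question.
snowfall = [6, 2, 0, 3, 2, 2, 3, 4, 2, 0, 3, 2, 3, 4, 6, 4, 3, 0, 5, 3]

# Count how many days each amount was recorded.
frequencies = Counter(snowfall)

# One row per amount on the horizontal axis, including amounts that never occurred.
for amount in range(min(snowfall), max(snowfall) + 1):
    print(f"{amount} cm | {'#' * frequencies[amount]}")

# Frequency of a single amount, and of every amount at or above a threshold.
days_with_3cm = frequencies[3]
days_at_least_4cm = sum(count for amount, count in frequencies.items() if amount >= 4)
```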
The following frequency table shows the data distribution for the length of leaves collected from a species of tree in the botanical gardens.
Leaf length, $x$ (mm) | Frequency |
---|---|
$0\le x<20$ | $5$ |
$20\le x<40$ | $11$ |
$40\le x<60$ | $19$ |
$60\le x<80$ | $49$ |
$80\le x<84$ | $43$ |
Which three of the following statements do we know to be true?
$35$ leaves less than $60$ mm were collected.
$11$ leaves of less than $40$ mm were collected.
The most common leaf length is between $60$ and $80$ mm.
Leaves are more likely to be at least $60$ mm in length.
There were no leaves collected with a length of $10$ mm.