
5.05 Histograms

Histograms

Numerical data, such as times, heights, weights or temperatures, are values that can be measured. Any data value within a range of values is possible. Instead of having a visual for every single data point, we can group the values into equal-sized intervals to more easily observe patterns and trends in the data.

A grouped frequency table is helpful for organizing numerical data when there is a lot of data or a large range of data values. It shows the number of values (the frequency) within each interval, called a class or bin.

Apps       Frequency
20–39      9
40–59      7
60–79      20
80–99      30
100–119    6

This frequency table shows the number of apps 72 people have installed on their phone.

Notice that each class interval covers the same number of possible values: 20.

The first class includes the number of apps for 9 different people. Each of them has 20 to 39 apps, but we don't know the actual number for each individual.

When displaying frequency information for this type of data, a histogram is used. The histogram is a visual representation of numerical data. It allows us to see clearly where all of the recorded values fall along a numerical scale. The x-axis represents the measurements in the data set, and the y-axis represents the frequency, or number of times that the measure occurs in the data set.

The key features of a histogram are:

  • The horizontal axis is a numerical scale

  • The data on the horizontal axis is grouped into intervals, called classes or bins

  • There are no gaps between the columns of a histogram

  • The height of each column will be the frequency

A histogram about the number of apps on people's phones. The horizontal label shows the number of apps in 5 bins. Each bin has a range of 19. The first bin starts at 20. The vertical label shows frequency from 0 to 30. Ask your teacher for more info.

The horizontal axis of a histogram can be labeled in two different ways. One method is to label each column with the interval of values it represents. The other method is to label the boundaries of each interval. In this method, the lower endpoint is always included, and the upper endpoint is excluded.

So the first bin includes everyone who has from 20 to 39 apps on their phone. But those with 40 apps are counted in the second bin.

A histogram about the number of apps on people's phones. The horizontal label shows the number of apps in intervals of 20, starting from 20 to 120. The vertical label shows frequency from 0 to 30. Ask your teacher for more info.
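The boundary rule (lower endpoint included, upper endpoint excluded) can be sketched in Python. The function name `bin_boundaries` is a hypothetical helper, and the sketch assumes bins of width 20 starting at 20, as in the apps histogram.

```python
def bin_boundaries(value, start=20, width=20):
    """Return the (lower, upper) boundaries of the bin that contains value.

    The lower boundary is included and the upper boundary is excluded.
    """
    index = (value - start) // width   # which bin, counting from 0
    lower = start + index * width
    return lower, lower + width


print(bin_boundaries(39))   # (20, 40): 39 apps belongs to the first bin
print(bin_boundaries(40))   # (40, 60): 40 apps belongs to the second bin
```

Floor division places a value that sits exactly on a boundary into the bin that starts at that boundary, which matches the labeling convention described above.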

Histograms are a special type of bar graph. For a bar graph to be a histogram:

  1. The bars must touch because they measure consecutive intervals

  2. It must measure quantitative, numerical data

Exploration

Move the slider to investigate how using different intervals could impact the representation of the data in a histogram.

  1. Describe what happens to the histogram as you increase the number of intervals.

  2. What number of intervals is the most appropriate? Would this answer change if the data set looked different?

The intervals in a histogram could result in misleading conclusions regarding the data set. For example, extremely large or small intervals (bins) can make it difficult to see the shape of the data.

Consider the data set: \begin{aligned}&11, 13, 26, 35, 33, 37, 41, 42, 45, 45, 50, 52,\\ &54, 55, 55, 58, 60, 60,62, 63, 65, 67, 68, 77, 78\end{aligned}

In this histogram, we might conclude that the least likely values are those below 21 and that the most likely values are from 41 to 60.

A histogram. Horizontal label shows class and vertical label shows frequency. Ask your teacher for more info.

The following histogram represents the same data with different sized bins. Here, we see that the least likely values are actually 20–29 and the most likely values are actually 60–69.

A histogram. Horizontal label shows class and vertical label shows frequency. Ask your teacher for more info.
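The effect of bin width on this data set can be checked with a short Python sketch. The helper name `grouped_counts` is hypothetical, and the bin boundaries (1–20, 21–40, ... for the wide bins and 10–19, 20–29, ... for the narrow bins) are assumptions inferred from the intervals mentioned in the text.

```python
from collections import Counter

data = [11, 13, 26, 35, 33, 37, 41, 42, 45, 45, 50, 52,
        54, 55, 55, 58, 60, 60, 62, 63, 65, 67, 68, 77, 78]


def grouped_counts(values, start, width):
    """Map each (lowest, highest) whole-number bin to its frequency."""
    counts = Counter((v - start) // width for v in values)
    return {
        (start + i * width, start + (i + 1) * width - 1): counts[i]
        for i in range(max(counts) + 1)
    }


wide = grouped_counts(data, start=1, width=20)      # assumed bins 1-20, 21-40, ...
narrow = grouped_counts(data, start=10, width=10)   # assumed bins 10-19, 20-29, ...

print(wide)    # {(1, 20): 2, (21, 40): 4, (41, 60): 12, (61, 80): 7}
print(narrow)  # {(10, 19): 2, (20, 29): 1, (30, 39): 3, (40, 49): 4,
               #  (50, 59): 6, (60, 69): 7, (70, 79): 2}
```

With the wide bins, the class 41–60 has the highest frequency. With the narrow bins, the highest frequency moves to 60–69 and the lowest to 20–29.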

There are some general guidelines to use when choosing bin intervals:

  • Intervals should all be equal in size.

  • Intervals should include all of the data.

  • Boundaries for intervals should reflect the data values being represented.

  • Determine the number of intervals based upon the data.

  • If possible, choose a number of intervals that is a factor of the number of data values (e.g., a histogram representing 20 data values might have 4 or 5 intervals), as this will simplify the process.

Examples

Example 1

A government agency records how long people wait on hold to speak to their representatives. The results are displayed in the histogram:

A histogram about time on hold. The horizontal axis shows length of hold in minutes and the vertical axis shows frequency from 0 to 12. Ask your teacher for more info.
a

Complete the corresponding frequency table:

Length of hold (minutes)    Frequency
0–9
10–19
20–29
30–39
40–49
Worked Solution
Create a strategy

List the corresponding frequency of each length of hold (minutes).

Apply the idea
Length of hold (minutes)    Frequency
0–9                         11
10–19                       12
20–29                       11
30–39                       2
40–49                       4
b

How many phone calls were made?

Worked Solution
Create a strategy

Add all the frequencies from part (a).

Apply the idea
\begin{aligned}\text{Number of calls}&=11+12+11+2+4&&\text{Find the sum of all frequencies}\\&=40&&\text{Evaluate}\end{aligned}
c

Find the number of people that waited less than 30 minutes.

Worked Solution
Create a strategy

The number of people that waited less than 30 minutes will be the sum of the heights (frequencies) of the first three columns of the histogram.

Apply the idea

The frequencies of the first three bins are 11, 12, and 11.

The number of people that waited less than 30 minutes is 11+12+11=34.

d

Find the proportion of people that waited 40–49 minutes.

Worked Solution
Create a strategy

We can find the proportion by finding the height (frequency) of the 40–49 column and dividing it by the total number of calls made.

Apply the idea

The number of people that waited 40–49 minutes is 4.

The total number of phone calls made is 40.

The proportion of people that waited 40–49 minutes is \dfrac{4}{40}=\dfrac{1}{10}.
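The calculations in parts (b) to (d) can be reproduced with a minimal Python sketch. The variable names are illustrative, and the frequencies are the ones read from the histogram in part (a).

```python
from fractions import Fraction

# Frequencies read from the histogram in part (a)
hold_times = {"0-9": 11, "10-19": 12, "20-29": 11, "30-39": 2, "40-49": 4}

# Part (b): total number of calls is the sum of all frequencies
total_calls = sum(hold_times.values())

# Part (c): waited less than 30 minutes = first three classes
under_30 = sum(hold_times[c] for c in ("0-9", "10-19", "20-29"))

# Part (d): proportion in the 40-49 class, kept as an exact fraction
proportion_40_49 = Fraction(hold_times["40-49"], total_calls)

print(total_calls)        # 40
print(under_30)           # 34
print(proportion_40_49)   # 1/10
```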

Example 2

In product testing, the number of faults found in a certain piece of machinery is recorded over time. The number of faults found each day is shown:\begin{aligned}&0,\, 0,\, 2,\, 1,\, 0,\, 1,\, 2,\, 3,\, 0,\, 1,\, 4,\, 5,\, 6,\, 7,\, 4,\, 5,\, 5,\, 7,\, 6,\, 5,\, 6,\, 4,\, 4,\, 5,\\ & 8,\, 9,\,8,\, 9,\, 10,\, 11,\, 8,\, 9,\,9,\, 8,\, 10,\, 8,\,11,\, 9,\, 10,\, 11,\, 10,\, 9,\,10,\, 10, \\ & 12,\, 13,\, 14,\, 15,\, 12,\, 12,\, 14,\, 13,\, 12,\, 13,\, 14,\, 15,\, 15,\, 13,\, 12,\, 14\end{aligned}

a

Use the data to construct a histogram.

Worked Solution
Create a strategy

We can use technology to construct the histogram by following these steps:

  1. In the GeoGebra Statistics calculator, enter the data in a single column

  2. Select all of the cells containing data and choose "One Variable Analysis."

  3. Use the settings to adjust the histogram as needed.

Apply the idea
  1. Enter the data in a single column.

    A screenshot of the GeoGebra Statistics tool showing how to enter a given set of data. Speak to your teacher for more details.
  2. Select all of the cells containing data and choose "One Variable Analysis."

    A screenshot of the GeoGebra Statistics tool showing the menu that contains the One Variable Analysis option. Speak to your teacher for more details.
  3. After the histogram is generated, we can adjust the width of the columns by checking the box that says "Set Classes Manually". At the top, we can set the width to 4, since 4 is a factor of 60. We can also check the box to show the frequency table below the histogram.

    A screenshot of the GeoGebra Statistics tool showing how to generate the histogram of a given data set. Speak to your teacher for more details.
Reflect and check

We could have chosen different widths for the classes, but the other options may show different trends. For example, a width of 3 shows that the frequency increases with the number of faults. It also shows a different range for the most common number of faults.

A screenshot of the GeoGebra Statistics tool showing how to generate the histogram of a given data set. Speak to your teacher for more details.

We should consider other possible bin sizes and explore the trends that arise. However, we should not use bin intervals to manipulate how others might interpret our results.

b

How many days did the company record the number of faults?

Worked Solution
Create a strategy

Add the values in the frequency column to find the total number of days on which the company recorded faults.

Apply the idea

Each value in the frequency column represents the number of days on which a number of faults in that class was recorded.

\begin{aligned}\text{Number of days}&=10+14+20+16&&\text{Add the frequencies}\\&=60&&\text{Evaluate}\end{aligned}

The company recorded the number of faults on 60 days.

c

On how many days were fewer than 8 faults recorded?

Worked Solution
Create a strategy

Check the frequencies of the classes that contain fewer than 8 faults.

Apply the idea

The first and second rows of the table cover 0–3 and 4–7 faults, with frequencies of 10 and 14. Since 10+14=24, fewer than 8 faults were recorded on 24 days.

d

On what percentage of the days were 12–15 faults recorded?

Worked Solution
Create a strategy

Look at the frequency for the last interval, which corresponds to 12–15 faults.

Apply the idea

The last row of the table shows that 12–15 faults were recorded on 16 of the 60 days, which is about 26.7\% of the days.
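As a check without GeoGebra, the frequency table for a class width of 4 can be rebuilt in Python. This is a sketch only: the variable names are illustrative, and the classes are assumed to start at 0 (0–3, 4–7, 8–11, 12–15).

```python
from collections import Counter

faults = [0, 0, 2, 1, 0, 1, 2, 3, 0, 1, 4, 5, 6, 7, 4, 5, 5, 7, 6, 5, 6, 4, 4, 5,
          8, 9, 8, 9, 10, 11, 8, 9, 9, 8, 10, 8, 11, 9, 10, 11, 10, 9, 10, 10,
          12, 13, 14, 15, 12, 12, 14, 13, 12, 13, 14, 15, 15, 13, 12, 14]

width = 4  # class width chosen in part (a)

# Floor division by the width gives the class index of each value
counts = Counter(f // width for f in faults)
table = {f"{i * width}-{i * width + width - 1}": counts[i] for i in sorted(counts)}
print(table)  # {'0-3': 10, '4-7': 14, '8-11': 20, '12-15': 16}

total_days = sum(table.values())                  # part (b)
first_two_classes = table["0-3"] + table["4-7"]   # part (c): classes 0-3 and 4-7
percent_12_15 = 100 * table["12-15"] / total_days # part (d)

print(total_days)                # 60
print(first_two_classes)         # 24
print(round(percent_12_15, 1))   # 26.7
```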

Example 3

A city's botanical garden recently planted a new species of tree. They want to learn more about the tree's characteristics so they can share their findings with the public. One of the investigative questions they asked is, "What are the possible lengths of the leaves of this tree when it is mature?"

a

What type of data needs to be collected to answer their question?

Worked Solution
Apply the idea

To answer the question, the staff at the botanical gardens must collect data on the lengths of the leaves of mature trees of this species. Since there is only one variable of interest, the data is univariate.

Reflect and check

This data will be collected by measuring the lengths of the leaves. We cannot survey the leaves or observe them to identify their lengths. We also do not want to try to control the lengths of the leaves, so an experiment is not a good method for data collection.

b

The frequency table shows the data distribution for the length of leaves collected from the new species of tree in the botanical gardens. Use the data to construct a histogram.

Leaf length, x (mm)     Frequency
0 \leq x \lt 20         5
20 \leq x \lt 40        11
40 \leq x \lt 60        19
60 \leq x \lt 80        49
80 \leq x \lt 100       43
Worked Solution
Create a strategy

Use the class intervals to draw the horizontal axis and the maximum frequency to scale the vertical axis, then draw the bars on the histogram.

Apply the idea

We can draw a histogram using the class interval boundaries as the labels on the horizontal axis and the frequencies on the vertical axis. Since the frequencies go up to 49, we can choose a scale of 10 that ranges from 0 to 50.

A histogram about length of leaves. The horizontal axis shows leaf length in mm, and the vertical label shows frequency. Ask your teacher for more information.
c

Which three of the following statements are correct?

A
35 leaves less than 60\text{ mm} were collected.
B
11 leaves less than 40\text{ mm} were collected.
C
The most common leaf length is between 60 and 80\text{ mm}.
D
Leaves are more likely to be at least 60\text{ mm} in length.
E
There were no leaves collected with a length of 10\text{ mm}.
Worked Solution
Create a strategy

Use the histogram to determine which options are accurate. Recall that histograms only give information about how frequent certain ranges of lengths are. They do not tell us anything about specific lengths.

Apply the idea

The first three columns of the histogram show the frequencies of leaves less than 60\text{ mm} long. Since 5+11+19=35, option A is correct.

The first two columns of the histogram show the frequencies of leaves less than 40\text{ mm} long. Since 5+11=16, option B is incorrect.

The class with the highest frequency is 60–80, so option C is correct.

The two rightmost columns have the highest frequencies. Each of their frequencies is higher than the left three columns combined. So, option D is correct.

Histograms and frequency tables do not provide information on specific data values, so we do not know whether any leaves had a length of 10\text{ mm}. Option E cannot be confirmed.

The correct options are: A, C, and D.
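The checks for options A to D can be sketched in Python using the frequency table from part (b). The helper name `count_below` is hypothetical. Each class is stored as (lower boundary, upper boundary) with the upper boundary excluded.

```python
# ((lower, upper), frequency) for each class, upper boundary excluded
leaf_table = [((0, 20), 5), ((20, 40), 11), ((40, 60), 19),
              ((60, 80), 49), ((80, 100), 43)]


def count_below(table, limit):
    """Total frequency of the classes that lie entirely below limit."""
    return sum(freq for (lower, upper), freq in table if upper <= limit)


below_60 = count_below(leaf_table, 60)                         # option A
below_40 = count_below(leaf_table, 40)                         # option B
modal_class = max(leaf_table, key=lambda row: row[1])[0]       # option C
at_least_60 = sum(f for _, f in leaf_table) - below_60         # option D

print(below_60)      # 35
print(below_40)      # 16
print(modal_class)   # (60, 80)
print(at_least_60)   # 92
```

Option E cannot be tested this way, because the table stores only class frequencies and not the individual leaf lengths.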

Example 4

The following set of values represent the distances (in inches) reached by 8\text{th} grade students in a standing long jump exercise.\begin{aligned}&44,\,62,\,56,\,53,\,31,\,78,\,59,\,46,\,32,\,41,\,65,\,45,\\&48,\,57,\,61,\,98,\,35,\,42,\,88,\,49,\,33,\,75,\,95,\,55,\,97\end{aligned}

a

Formulate a question that could be answered by constructing a histogram.

Worked Solution
Create a strategy

Histograms help us see the modal class (which interval has the most data values) and the spread of the data. They can help us answer statistical questions about how data varies or how common a range of values is.

Apply the idea

One question that a histogram can help us answer is, "What proportion of students jumped further than 90\text{ in}?"

Reflect and check

There are many possible questions we could ask that a histogram could help us answer. Another question might be, "What is the most common range of distances jumped by 8th grade students?"

b

Which histogram should we use to analyze this data? Explain your answer.

A
A histogram about 8th grade students' long jump. The horizontal axis shows distance in inches, and the vertical axis shows frequency. There are 7 bins. Ask your teacher for more info.
B
A histogram about 8th grade students' long jump. The horizontal axis shows distance in inches, and the vertical axis shows frequency. There are 4 bins. Ask your teacher for more info.
C
A histogram about 8th grade students' long jump. The horizontal axis shows distance in inches, and the vertical axis shows frequency. There are 14 bins. Ask your teacher for more info.
D
A histogram about 8th grade students' long jump. The horizontal axis shows distance in inches, and the vertical axis shows frequency. There are 3 bins. Ask your teacher for more info.
Worked Solution
Create a strategy

We can begin by checking for the basic characteristics of histograms:

  • The intervals should cover all the values in the data set. That is, there should be no value that does not belong to an interval.

  • The upper boundary of any class should be adjacent to the lower boundary of the next interval. That is, there should be no gaps between the interval boundaries.

  • The size of each interval must be the same.

All of these are satisfied, so we now need to consider the width of the intervals. We need to choose a width that helps us identify trends in the data. If the intervals are too narrow or too wide, it can be difficult to see any trends.

Apply the idea

In histogram A, the intervals have a reasonable width, which leads to a manageable number of columns. This width also helps us see trends in the data, such as the most common or least common ranges of long jumps.

The intervals in histogram B are relatively wide. Although it shows one column taller than the others, it is still too general to make any specific conclusions about the data.

The intervals in histogram C are very narrow, so there are many peaks, gaps, and valleys in the columns. With so many differences between the columns, the conclusions drawn from this histogram would be too specific.

The intervals in histogram D are very wide. This makes it difficult to identify specific trends in the data, and the conclusions would be too general.

The correct answer is histogram A.

Reflect and check

Now that the data is represented in a histogram, we can answer our question from part (a), "What proportion of students jumped further than 90\text{ in}?"

There are 25 data values in total, and 3 students jumped in the largest class interval of 90–99\text{ in}. So, \dfrac{3}{25} of the students jumped further than 90\text{ in}. Although we could have answered this question using the raw data, the histogram makes it easier to identify exactly how many students are in the largest class interval.

c

Approximately half of the data falls within which two bins of the histogram?

Worked Solution
Create a strategy

Counting the values in the data set, we see there are 25 data values in total. That means half of the data values is 12.5. Using the histogram from part (b), we need to look for two bins whose frequencies sum to about 12.5.

Apply the idea
A histogram about 8th grade students' long jump. The horizontal axis shows distance in inches, and the vertical axis shows frequency. There are 7 bins. Ask your teacher for more info.

The two tallest columns in the histogram have frequencies of 7 and 5 which sum to 12. This shows approximately half of the data falls within the bins 40–49 and 50–59.

Reflect and check

Because approximately half of the data are within the two bins we identified, the other half of the data must be in the remaining bins. When we add the frequencies of all the other bins, we get 4+3+2+1+3=13. This confirms our answer because \dfrac{12}{25}=0.48 and \dfrac{13}{25}=0.52, and 0.48+0.52=1, or 100\% of the data.
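The bin frequencies used in this part can be recomputed from the raw data with a short Python sketch. The variable names are illustrative, and the sketch assumes bins of width 10 with lower boundaries 30, 40, ..., 90, which is consistent with the 7 bins and the 40–49, 50–59, and 90–99 classes mentioned above.

```python
from collections import Counter

jumps = [44, 62, 56, 53, 31, 78, 59, 46, 32, 41, 65, 45,
         48, 57, 61, 98, 35, 42, 88, 49, 33, 75, 95, 55, 97]

# Round each distance down to the lower boundary of its width-10 bin
counts = Counter(j // 10 * 10 for j in jumps)
table = dict(sorted(counts.items()))
print(table)  # {30: 4, 40: 7, 50: 5, 60: 3, 70: 2, 80: 1, 90: 3}

# The two bins with the highest frequencies, and their share of the data
top_two = sorted(table, key=table.get, reverse=True)[:2]
share = sum(table[b] for b in top_two) / len(jumps)

print(top_two)  # [40, 50] -> the bins 40-49 and 50-59
print(share)    # 0.48
```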

Idea summary

Every data value must go into exactly one interval (class or bin).

The key features of a histogram are:

  • The horizontal axis is a numerical scale (like a number line)

  • The data on the horizontal axis may be grouped into intervals

  • There are no gaps between the columns of a histogram

  • The height of each column will be the frequency

Outcomes

7.PS.2

The student will apply the data cycle (formulate questions; collect or acquire data; organize and represent data; and analyze data and communicate results) with a focus on histograms.

7.PS.2d

Organize and represent numerical data using histograms with and without the use of technology.

7.PS.2e

Investigate and explain how using different intervals could impact the representation of the data in a histogram.

7.PS.2g

Analyze data represented in histograms by making observations and drawing conclusions. Determine how histograms reveal patterns in data that cannot be easily seen by looking at the corresponding given data set.
