Numerical data, such as times, heights, weights or temperatures, are values that can be measured. Any data value within a range of values is possible. Instead of having a visual for every single data point, we can group the values into equal-sized intervals to more easily observe patterns and trends in the data.

A **grouped frequency table** is helpful for organizing numerical data when there is a lot of data or a large range of data values. It shows the number of values (the frequency) within each interval, called a **class** or **bin**.

Apps | Frequency |
---|---|
20–39 | 9 |
40–59 | 7 |
60–79 | 20 |
80–99 | 30 |
100–119 | 6 |

When displaying frequency information for this type of data, a **histogram** is used. The histogram is a visual representation of numerical data. It allows us to see clearly where all of the recorded values fall along a numerical scale. The x-axis represents the measurements in the data set, and the y-axis represents the frequency, or number of times that the measure occurs in the data set.

The key features of a histogram are:

The horizontal axis is a numerical scale

The data on the horizontal axis is grouped into intervals, called classes or bins

There are no gaps between the columns of a histogram

The height of each column will be the frequency

The horizontal axis of a histogram can be labeled in two different ways. One method is to label each column with the interval of values it represents. The other method is to label the boundaries of each interval. In this method, the lower endpoint is always included, and the upper endpoint is excluded.

So the first bin includes everyone who has from 20 to 39 apps on their phone. But those with 40 apps are counted in the second bin.
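The endpoint rule can be checked with a short Python sketch. The boundaries are read from the apps table above; the helper name `bin_index` is hypothetical, and using `bisect_right` is only one possible way to apply the "lower endpoint included, upper endpoint excluded" convention.

```python
from bisect import bisect_right

# Bin boundaries read from the apps table: [20, 40), [40, 60), ..., [100, 120)
edges = [20, 40, 60, 80, 100, 120]

def bin_index(value, edges):
    """Return the 0-based bin for value, including each bin's lower
    endpoint and excluding its upper endpoint (hypothetical helper)."""
    if not edges[0] <= value < edges[-1]:
        raise ValueError(f"{value} is outside the histogram's range")
    return bisect_right(edges, value) - 1

print(bin_index(39, edges))  # 0: 39 apps falls in the first bin
print(bin_index(40, edges))  # 1: 40 apps falls in the second bin
```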

Histograms are a special type of bar graph. For a bar graph to be a histogram:

The bars must touch because they measure consecutive intervals

It must measure quantitative, numerical data

Move the slider to investigate how using different intervals could impact the representation of the data in a histogram.

Describe what happens to the histogram as you increase the number of intervals.

What number of intervals is the most appropriate? Would this answer change if the data set looked different?

The intervals in a histogram could result in misleading conclusions regarding the data set. For example, extremely large or small intervals (bins) can make it difficult to see the shape of the data.

Consider the data set: \begin{aligned}&11, 13, 26, 35, 33, 37, 41, 42, 45, 45, 50, 52,\\ &54, 55, 55, 58, 60, 60,62, 63, 65, 67, 68, 77, 78\end{aligned}

In this histogram, we might conclude that the least likely values are those below 21, or the most likely values are from 41 to 60.

The following histogram represents the same data with different sized bins. Here, we see that the least likely values are actually 20–29 and the most likely values are actually 60–69.
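The effect of bin size on this data set can be reproduced with a minimal Python sketch. The class boundaries (1–20, 21–40, ... and 10–19, 20–29, ...) are assumptions inferred from the descriptions of the two histograms, and `grouped_frequencies` is a hypothetical helper, not part of any library.

```python
data = [11, 13, 26, 35, 33, 37, 41, 42, 45, 45, 50, 52,
        54, 55, 55, 58, 60, 60, 62, 63, 65, 67, 68, 77, 78]

def grouped_frequencies(values, start, width, n_bins):
    """Count whole-number values in n_bins equal-width classes.
    Keys are the (lowest, highest) value belonging to each class."""
    counts = [0] * n_bins
    for v in values:
        i = (v - start) // width
        if not 0 <= i < n_bins:
            raise ValueError(f"{v} does not fit in any class")
        counts[i] += 1
    return {(start + i * width, start + (i + 1) * width - 1): c
            for i, c in enumerate(counts)}

# Assumed classes for the first histogram: 1-20, 21-40, 41-60, 61-80
wide = grouped_frequencies(data, start=1, width=20, n_bins=4)
# Assumed classes for the second histogram: 10-19, 20-29, ..., 70-79
narrow = grouped_frequencies(data, start=10, width=10, n_bins=7)

for label, table in (("width 20", wide), ("width 10", narrow)):
    print(label)
    for (low, high), freq in table.items():
        print(f"  {low}-{high}: {freq}")
```

Under these assumptions, the wider classes give frequencies of 2, 4, 12 and 7, so 41–60 looks most common. The narrower classes give 2, 1, 3, 4, 6, 7 and 2, which shows that 20–29 holds the fewest values and 60–69 the most.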

There are some general guidelines to use when choosing bin intervals:

Intervals should all be equal in size.

Intervals should include all of the data.

Boundaries for intervals should reflect the data values being represented.

Determine the number of intervals based upon the data.

If possible, choose a number of intervals that is a factor of the number of data values (e.g., a histogram representing 20 data values might have 4 or 5 intervals). This simplifies the process.

A government agency records how long people wait on hold to speak to their representatives. The results are displayed in the histogram:

a

Complete the corresponding frequency table:

Length of hold (minutes) | Frequency |
---|---|
0–9 | |
10–19 | |
20–29 | |
30–39 | |
40–49 | |

b

How many phone calls were made?

c

Find the number of people that waited less than 30 minutes.

d

Find the proportion of people that waited 40–49 minutes.

In product testing, the number of faults found in a certain piece of machinery is recorded over time. The number of faults found each day is shown:\begin{aligned}&0,\, 0,\, 2,\, 1,\, 0,\, 1,\, 2,\, 3,\, 0,\, 1,\, 4,\, 5,\, 6,\, 7,\, 4,\, 5,\, 5,\, 7,\, 6,\, 5,\, 6,\, 4,\, 4,\, 5,\\ & 8,\, 9,\,8,\, 9,\, 10,\, 11,\, 8,\, 9,\,9,\, 8,\, 10,\, 8,\,11,\, 9,\, 10,\, 11,\, 10,\, 9,\,10,\, 10, \\ & 12,\, 13,\, 14,\, 15,\, 12,\, 12,\, 14,\, 13,\, 12,\, 13,\, 14,\, 15,\, 15,\, 13,\, 12,\, 14\end{aligned}

a

Use the data to construct a histogram.

Worked Solution
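One way to prepare the histogram is to tally the data into equal-width classes first. The sketch below assumes classes of width 4 (0–3, 4–7, 8–11, 12–15), a choice suggested by the 12–15 interval mentioned in part (d) but not stated in the problem, and prints a simple text histogram rather than a plotted one.

```python
faults = [0, 0, 2, 1, 0, 1, 2, 3, 0, 1, 4, 5, 6, 7, 4, 5, 5, 7, 6, 5, 6, 4, 4, 5,
          8, 9, 8, 9, 10, 11, 8, 9, 9, 8, 10, 8, 11, 9, 10, 11, 10, 9, 10, 10,
          12, 13, 14, 15, 12, 12, 14, 13, 12, 13, 14, 15, 15, 13, 12, 14]

WIDTH = 4  # assumed class width, giving 0-3, 4-7, 8-11, 12-15

# Tally how many days fall into each class
counts = {}
for low in range(0, 16, WIDTH):
    high = low + WIDTH - 1
    counts[(low, high)] = sum(low <= f <= high for f in faults)

# Text histogram: one '#' per day, with the frequency at the end of each row
for (low, high), freq in counts.items():
    print(f"{low:>2}-{high:<2} | {'#' * freq} {freq}")
```

With these classes the frequencies are 10, 14, 20 and 16, which become the column heights of the histogram.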

b

How many days did the company record the number of faults?

c

On how many days were no more than 8 faults recorded?

d

On what percentage of the days were 12–15 faults recorded?

Worked Solution
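Parts (b) to (d) can be checked directly from the raw data. This is a minimal sketch that counts values in the list given in the problem; the variable names are illustrative only.

```python
faults = [0, 0, 2, 1, 0, 1, 2, 3, 0, 1, 4, 5, 6, 7, 4, 5, 5, 7, 6, 5, 6, 4, 4, 5,
          8, 9, 8, 9, 10, 11, 8, 9, 9, 8, 10, 8, 11, 9, 10, 11, 10, 9, 10, 10,
          12, 13, 14, 15, 12, 12, 14, 13, 12, 13, 14, 15, 15, 13, 12, 14]

# Part (b): each value is one day's record, so count the values
total_days = len(faults)

# Part (c): "no more than 8" means 8 or fewer
at_most_8 = sum(f <= 8 for f in faults)

# Part (d): share of days with 12 to 15 faults, as a percentage
in_12_to_15 = sum(12 <= f <= 15 for f in faults)
percentage = 100 * in_12_to_15 / total_days

print(total_days)             # 60
print(at_most_8)              # 29
print(round(percentage, 1))   # 26.7
```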

A city's botanical garden recently planted a new species of tree. They want to learn more about the tree's characteristics so they can share their findings with the public. One of the investigative questions they asked is, "What are the possible lengths of the leaves of this tree when it is mature?"

a

What type of data needs to be collected to answer their question?

b

The frequency table shows the data distribution for the length of leaves collected from the new species of tree in the botanical gardens. Use the data to construct a histogram.

\text{Leaf length}, x\text{ (mm)} | \text{Frequency} |
---|---|
0 \leq x \lt 20 | 5 |
20 \leq x \lt 40 | 11 |
40 \leq x \lt 60 | 19 |
60 \leq x \lt 80 | 49 |
80 \leq x \lt 100 | 43 |

c

Which three of the following statements are correct?

A

35 leaves less than 60\text{ mm} were collected.

B

11 leaves less than 40\text{ mm} were collected.

C

The most common leaf length is between 60 and 80\text{ mm}.

D

Leaves are more likely to be at least 60\text{ mm} in length.

E

There were no leaves collected with a length of 10\text{ mm}.

Worked Solution
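The numerical statements can be tested against the frequency table from part (b). The sketch below is one possible check; `count_below` is a hypothetical helper that only works when the limit is a class boundary, because a grouped table does not record individual leaf lengths.

```python
# Frequency table from part (b): (lower bound, upper bound) in mm -> frequency
leaf_table = {(0, 20): 5, (20, 40): 11, (40, 60): 19, (60, 80): 49, (80, 100): 43}

def count_below(table, limit):
    """Total frequency of the classes that lie entirely below limit.
    Assumes limit is one of the class boundaries (hypothetical helper)."""
    return sum(freq for (low, high), freq in table.items() if high <= limit)

below_60 = count_below(leaf_table, 60)               # statement A
below_40 = count_below(leaf_table, 40)               # statement B
modal_class = max(leaf_table, key=leaf_table.get)    # statement C
at_least_60 = sum(leaf_table.values()) - below_60    # statement D

print(below_60)      # 35
print(below_40)      # 16
print(modal_class)   # (60, 80)
print(at_least_60)   # 92
```

These totals support statements A, C and D: 35 leaves are shorter than 60\text{ mm}, the class from 60 to 80\text{ mm} has the highest frequency, and 92 leaves are at least 60\text{ mm} long compared with 35 that are shorter. Statement B does not match, since 16 leaves are shorter than 40\text{ mm}. Statement E cannot be confirmed from a grouped table, because individual lengths within the first class are not recorded.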

The following set of values represents the distances (in inches) reached by 8\text{th} grade students in a standing long jump exercise.\begin{aligned}&44,\,62,\,56,\,53,\,31,\,78,\,59,\,46,\,32,\,41,\,65,\,45,\\&48,\,57,\,61,\,98,\,35,\,42,\,88,\,49,\,33,\,75,\,95,\,55,\,97\end{aligned}

a

Formulate a question that could be answered by constructing a histogram.

b

Which histogram should we use to analyze this data? Explain your answer.

A

B

C

D

c

Approximately half of the data falls within which two bins of the histogram?

Worked Solution
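The share of data in each bin can be tallied from the raw distances. The sketch below assumes bins of width 10 starting at 30 (30–39, 40–49, and so on); the histogram options are not reproduced here, so this binning is an assumption and the result applies only to it.

```python
jumps = [44, 62, 56, 53, 31, 78, 59, 46, 32, 41, 65, 45,
         48, 57, 61, 98, 35, 42, 88, 49, 33, 75, 95, 55, 97]

WIDTH = 10  # assumed bin width; bins are [30, 40), [40, 50), ..., [90, 100)

# Tally each bin, including the lower endpoint and excluding the upper one
counts = {}
for low in range(30, 100, WIDTH):
    counts[(low, low + WIDTH)] = sum(low <= j < low + WIDTH for j in jumps)

# The two bins holding the most values, and their combined share of the data
top_two = sorted(counts, key=counts.get, reverse=True)[:2]
share = sum(counts[b] for b in top_two) / len(jumps)

print(top_two)  # [(40, 50), (50, 60)]
print(share)    # 0.48
```

Under this assumed binning, the bins from 40 up to 50 and from 50 up to 60 together hold 12 of the 25 values, which is approximately half of the data.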

Idea summary

Every data value must go into exactly one interval (class or bin).

The key features of a histogram are:

The horizontal axis is a numerical scale (like a number line)

The data on the horizontal axis may be grouped into intervals

There are no gaps between the columns of a histogram

The height of each column will be the frequency