Numerical data, such as times, heights, weights or temperatures, are values that can be measured. Any data value within a range of values is possible. Instead of having a visual for every single data point, we can group the values into equal-sized intervals to more easily observe patterns and trends in the data.
A grouped frequency table can be helpful when there is a lot of numerical data or a large range of data values. It shows the number of values (the frequency) within each interval, called a class or bin.
Number of apps | Frequency |
---|---|
20–39 | 9 |
40–59 | 7 |
60–79 | 20 |
80–99 | 30 |
100–119 | 6 |
When displaying frequency information for this type of data, a histogram is used. The histogram is a visual representation of numerical data. It allows us to see clearly where all of the recorded values fall along a numerical scale. The x-axis represents the measurements in the data set, and the y-axis represents the frequency, or number of times that the measure occurs in the data set.
The key features of a histogram are:
The horizontal axis is a numerical scale
The data on the horizontal axis is grouped into intervals, called classes or bins
There are no gaps between the columns of a histogram
The height of each column will be the frequency
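As a rough illustration of these features, the sketch below draws the app frequency table from earlier as a text histogram. The bars run sideways, one row per class, so the numerical scale reads from top to bottom rather than left to right. The function name `render_text_histogram` is made up for this example.

```python
# Frequencies from the app table above: (lower, upper) class -> frequency.
app_table = {
    (20, 39): 9,
    (40, 59): 7,
    (60, 79): 20,
    (80, 99): 30,
    (100, 119): 6,
}


def render_text_histogram(table):
    """Return one line per class; bar length equals the class frequency."""
    lines = []
    for (lower, upper), frequency in table.items():
        label = f"{lower}-{upper}".rjust(7)
        lines.append(f"{label} | {'#' * frequency} {frequency}")
    return lines


for line in render_text_histogram(app_table):
    print(line)
```

Because the classes are consecutive and every row is printed, there are no gaps between the bars, and the length of each bar matches its frequency.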
The horizontal axis of a histogram can be labeled in two different ways. One method is to label each column with the interval of values it represents. The other method is to label the boundaries of each interval. In this method, the lower endpoint is always included, and the upper endpoint is excluded.
So the first bin includes everyone who has from 20 to 39 apps on their phone, while those with exactly 40 apps are counted in the second bin.
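The boundary convention can be sketched in code. This is a minimal example, assuming bins of width 20 whose boundaries start at 20, as in the app table. The function name `bin_bounds` is hypothetical.

```python
def bin_bounds(value, start=20, width=20):
    """Return the (lower, upper) boundaries of the bin containing value.

    The lower boundary is included and the upper boundary is excluded,
    so a value equal to a boundary falls in the bin that starts there.
    Assumes value >= start.
    """
    index = (value - start) // width  # floor division picks the bin number
    lower = start + index * width
    return lower, lower + width


print(bin_bounds(39))  # the bin with boundaries 20 and 40
print(bin_bounds(40))  # the bin with boundaries 40 and 60
```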
Histograms are a special type of bar graph. For a bar graph to be a histogram:
The bars must touch because they measure consecutive intervals
It must measure quantitative, numerical data
Move the slider to investigate how using different intervals could impact the representation of the data in a histogram.
Describe what happens to the histogram as you increase the number of intervals.
What number of intervals is the most appropriate? Would this answer change if the data set looked different?
The intervals in a histogram could result in misleading conclusions regarding the data set. For example, extremely large or small intervals (bins) can make it difficult to see the shape of the data.
Consider the data set: \begin{aligned}&11, 13, 26, 35, 33, 37, 41, 42, 45, 45, 50, 52,\\ &54, 55, 55, 58, 60, 60,62, 63, 65, 67, 68, 77, 78\end{aligned}
In this histogram, we might conclude that the least likely values are those below 21, or the most likely values are from 41 to 60.
The following histogram represents the same data with different sized bins. Here, we see that the least likely values are actually 20–29 and the most likely values are actually 60–69.
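To see how the choice of bins changes the picture, the sketch below tallies the same data set twice. The exact bins are an assumption inferred from the descriptions of the two histograms (1–20, 21–40, 41–60, 61–80 for the first, and 10–19, 20–29, and so on for the second), and `count_in_bins` is a hypothetical helper.

```python
data = [11, 13, 26, 35, 33, 37, 41, 42, 45, 45, 50, 52,
        54, 55, 55, 58, 60, 60, 62, 63, 65, 67, 68, 77, 78]


def count_in_bins(values, bins):
    """Count how many values fall in each inclusive (low, high) bin."""
    return {(low, high): sum(low <= v <= high for v in values)
            for low, high in bins}


# Assumed bins for the first histogram: 1-20, 21-40, 41-60, 61-80.
wide_bins = [(low, low + 19) for low in range(1, 81, 20)]
# Assumed bins for the second histogram: 10-19, 20-29, ..., 70-79.
narrow_bins = [(low, low + 9) for low in range(10, 80, 10)]

wide_counts = count_in_bins(data, wide_bins)
narrow_counts = count_in_bins(data, narrow_bins)

for label, counts in (("width 20", wide_counts), ("width 10", narrow_counts)):
    print(label)
    for (low, high), frequency in counts.items():
        print(f"  {low}-{high}: {frequency}")
```

With the wider bins the tallest column is 41–60, while the narrower bins show 60–69 as the most frequent class and 20–29 as the least frequent.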
There are some general guidelines to use when choosing bin intervals:
Intervals should all be equal in size.
Intervals should include all of the data.
Boundaries for intervals should reflect the data values being represented.
Determine the number of intervals based upon the data.
If possible, choose a number of intervals that is a factor of the number of data values (e.g. a histogram representing 20 data values might have 4 or 5 intervals), as this will simplify the process.
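The first two guidelines lend themselves to a quick check. The sketch below assumes whole-number data and inclusive intervals written as (low, high) pairs. Both the helper `check_intervals` and the sample data are hypothetical and only for illustration.

```python
def check_intervals(values, intervals):
    """Check two guidelines for inclusive (low, high) intervals.

    Returns (equal_size, covers_all):
      equal_size - every interval spans the same number of whole values
      covers_all - every data value lies in exactly one interval
    """
    sizes = {high - low + 1 for low, high in intervals}
    equal_size = len(sizes) == 1
    covers_all = all(
        sum(low <= v <= high for low, high in intervals) == 1
        for v in values
    )
    return equal_size, covers_all


# Hypothetical data set of 20 whole-number values, for illustration only.
sample = [3, 7, 8, 12, 15, 16, 18, 21, 22, 24,
          25, 27, 29, 31, 33, 34, 36, 38, 39, 40]

good = [(1, 10), (11, 20), (21, 30), (31, 40)]  # 4 equal intervals
bad = [(1, 10), (11, 30), (31, 39)]             # unequal sizes, and misses 40

print(check_intervals(sample, good))
print(check_intervals(sample, bad))
```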
A government agency records how long people wait on hold to speak to their representatives. The results are displayed in the histogram:
Complete the corresponding frequency table:
Length of hold (minutes) | Frequency |
---|---|
0–9 | |
10–19 | |
20–29 | |
30–39 | |
40–49 | |
How many phone calls were made?
Find the number of people that waited less than 30 minutes.
Find the proportion of people that waited 40–49 minutes.
In product testing, the number of faults found in a certain piece of machinery is recorded over time. The number of faults found each day is shown:\begin{aligned}&0,\, 0,\, 2,\, 1,\, 0,\, 1,\, 2,\, 3,\, 0,\, 1,\, 4,\, 5,\, 6,\, 7,\, 4,\, 5,\, 5,\, 7,\, 6,\, 5,\, 6,\, 4,\, 4,\, 5,\\ & 8,\, 9,\,8,\, 9,\, 10,\, 11,\, 8,\, 9,\,9,\, 8,\, 10,\, 8,\,11,\, 9,\, 10,\, 11,\, 10,\, 9,\,10,\, 10, \\ & 12,\, 13,\, 14,\, 15,\, 12,\, 12,\, 14,\, 13,\, 12,\, 13,\, 14,\, 15,\, 15,\, 13,\, 12,\, 14\end{aligned}
Use the data to construct a histogram.
How many days did the company record the number of faults?
On how many days were no more than 8 faults recorded?
On what percentage of the days were 12–15 faults recorded?
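One way to check work on these questions is to tally the data in code. The sketch below assumes classes of width 4 (0–3, 4–7, 8–11, 12–15). Other class widths are possible, and the variable names are illustrative.

```python
faults = [
    0, 0, 2, 1, 0, 1, 2, 3, 0, 1, 4, 5, 6, 7, 4, 5, 5, 7, 6, 5, 6, 4, 4, 5,
    8, 9, 8, 9, 10, 11, 8, 9, 9, 8, 10, 8, 11, 9, 10, 11, 10, 9, 10, 10,
    12, 13, 14, 15, 12, 12, 14, 13, 12, 13, 14, 15, 15, 13, 12, 14,
]

# Assumed classes of width 4: 0-3, 4-7, 8-11, 12-15.
bins = [(low, low + 3) for low in range(0, 16, 4)]
frequency = {(low, high): sum(low <= f <= high for f in faults)
             for low, high in bins}

total_days = len(faults)                      # one value per day
days_at_most_8 = sum(f <= 8 for f in faults)  # "no more than 8" means <= 8
percent_12_to_15 = 100 * frequency[(12, 15)] / total_days

for (low, high), count in frequency.items():
    print(f"{low}-{high}: {count}")
print(total_days, days_at_most_8, round(percent_12_to_15, 1))
```

The printed class counts give the column heights for the histogram, and the last line gives values to compare against answers to the three questions.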
A city's botanical garden recently planted a new species of tree. They want to learn more about the tree's characteristics so they can share their findings with the public. One of the investigative questions they asked is, "What are the possible lengths of the leaves of this tree when it is mature?"
What type of data needs to be collected to answer their question?
The frequency table shows the data distribution for the length of leaves collected from the new species of tree in the botanical gardens. Use the data to construct a histogram.
Leaf length, x (mm) | Frequency |
---|---|
0 \leq x \lt 20 | 5 |
20 \leq x \lt 40 | 11 |
40 \leq x \lt 60 | 19 |
60 \leq x \lt 80 | 49 |
80 \leq x \lt 100 | 43 |
Which three of the following statements are correct?
The following set of values represents the distances (in inches) reached by 8th grade students in a standing long jump exercise. \begin{aligned}&44,\,62,\,56,\,53,\,31,\,78,\,59,\,46,\,32,\,41,\,65,\,45,\\&48,\,57,\,61,\,98,\,35,\,42,\,88,\,49,\,33,\,75,\,95,\,55,\,97\end{aligned}
Formulate a question that could be answered by constructing a histogram.
Which histogram should we use to analyze this data? Explain your answer.
Approximately half of the data falls within which two bins of the histogram?
Every data value must go into exactly one interval (class or bin).
The key features of a histogram are:
The horizontal axis is a numerical scale (like a number line)
The data on the horizontal axis may be grouped into intervals
There are no gaps between the columns of a histogram
The height of each column will be the frequency