5.06 Compare data representations with histograms

Data displays are useful in the "organize and represent data" and "analyze data and communicate results" stages of the data cycle. The best data displays make useful information easy to read for the intended audience.

Numerical data can be displayed using histograms, stem-and-leaf plots, and line plots (dot plots).

Categorical data can be displayed using circle graphs, bar graphs, and dot plots.

Exploration

The histogram, stem-and-leaf plot, and line plot summarize the heights of 30 students in a class, in inches:

Histogram titled 'Classmate height'. The horizontal axis shows height (in) and the vertical axis shows the number of students. There are six tick marks, at 50, 55, 60, 65, 70, and 75, creating five bins. The frequencies of the bins, from left to right, are 4, 1, 18, 6, and 1.
Classmate height
Stem | Leaf
5 | 1 4 4 4 7
6 | 0 0 0 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 5 5 5 6 7 7
7 | 1
Key: 5 | 1 = 51 inches
A line (dot) plot titled 'Classmate height'. The horizontal axis shows height (in) from 50 to 73. The values that have dots, with the number of dots at each, are: 51, 1; 54, 3; 57, 1; 60, 3; 61, 2; 62, 5; 63, 5; 64, 3; 65, 3; 66, 1; 67, 2; 71, 1. All other values have no dots.
  1. What do you notice about the different displays for the same data?

  2. What does the line plot tell you that the histogram does not?

  3. What does the stem-and-leaf plot tell you that the histogram does not?

  4. What are the benefits of each display? What information is easier to read from one display than from another?

  5. If the data was recorded to the nearest eighth of an inch, would these displays still be helpful? Explain.

Histograms display the frequency of data as either a count or relative proportion along the y-axis and divide the numerical data into bins of equal width along the x-axis.

Generally, each bin includes its lower bound and excludes its upper bound. For example, the first bin in the annual rainfall histogram, 1 to 5, can be labeled 1\leq \text{Rainfall}\lt 5.

A histogram titled 'Annual Rainfall in inches'. The y-axis is titled 'Number of Cities' and the x-axis is titled 'Rainfall in inches'. All bins have the same width. The bins and the heights of their bars are: 1 to 5, 10; 5 to 9, 22; 9 to 13, 46; 13 to 17, 48; 17 to 21, 18; 21 to 25, 9.
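If you have Python available, the convention above (include the lower bound, exclude the upper bound) can be checked with a short sketch. This is only an illustration, not the lesson's method, and the helper name `bin_counts` is made up. It bins the 30 classmate heights from the exploration, read from the stem-and-leaf plot, into five bins of width 5 starting at 50, and should reproduce the histogram's frequencies.

```python
# Heights (in inches) of the 30 classmates, read from the stem-and-leaf plot
heights = [51, 54, 54, 54, 57,
           60, 60, 60, 61, 61, 62, 62, 62, 62, 62,
           63, 63, 63, 63, 63, 64, 64, 64, 65, 65, 65, 66, 67, 67,
           71]

def bin_counts(data, start, width, num_bins):
    """Count values in the bins [start, start + width), [start + width, start + 2*width), ...

    Each bin includes its lower bound and excludes its upper bound.
    Values outside all bins are ignored.
    """
    counts = [0] * num_bins
    for value in data:
        index = (value - start) // width  # which bin the value falls in
        if 0 <= index < num_bins:
            counts[index] += 1
    return counts

print(bin_counts(heights, start=50, width=5, num_bins=5))  # [4, 1, 18, 6, 1]
```

Notice that the three heights of 60 inches are counted in the 60 to 65 bin, not the 55 to 60 bin, because each bin keeps its lower bound.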
A histogram titled 'Ages on the spinning strawberry ride'. The horizontal axis shows the ages and the vertical axis shows the number of people.

Histograms can also have the intervals labeled on the bars for data that is rounded.

Histograms

Advantages:

  • Good for large quantities of data.

  • Easy to read off the spread, clusters, and trends.

  • We do not need to round when collecting data.

Disadvantages:

  • Individual data values are lost.

  • Bin size can affect the conclusion.

Line plots or dot plots display the frequency of data by the number of dots at each value. This display is best used for countable values with a small range.

A dot plot with no title. Each X represents one data point, and the X's are stacked with equal size. The number line runs from 1 to 3 in intervals of one fourth. The number of X's at each value is: at 1, 2; at 1 and 1 fourth, 3; at 1 and 1 half, 2; at 1 and 3 fourths, 1; at 2, 3; at 2 and 1 fourth, 3; at 2 and 1 half, 2; at 2 and 3 fourths, 1; and at 3, 1.
Dot plot (line plot)

Advantages:

  • Simple to create.

  • Can read off the most and least common responses.

  • Can see some clusters in the data.

Disadvantages:

  • Difficult to show large quantities of data.

  • Difficult to show a large number of categories or a large spread.

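A dot plot is built by counting how many times each value occurs. As a minimal Python sketch, assuming whole-number data and using a made-up helper called `dot_plot`, the classmate heights can be turned into a sideways text dot plot, with one row per inch and one x per student. Its counts should match the line plot in the exploration.

```python
from collections import Counter

# Heights (in inches) of the 30 classmates from the exploration
heights = [51, 54, 54, 54, 57,
           60, 60, 60, 61, 61, 62, 62, 62, 62, 62,
           63, 63, 63, 63, 63, 64, 64, 64, 65, 65, 65, 66, 67, 67,
           71]

def dot_plot(data, low, high):
    """Return one text row per whole number from low to high, with one 'x' per data value."""
    counts = Counter(data)  # a Counter gives 0 for values that never occur
    return [f"{value:>3} | " + "x" * counts[value] for value in range(low, high + 1)]

for row in dot_plot(heights, 50, 73):
    print(row)
```

Rows with no x's are gaps. With a large range, most rows would be empty, which is why dot plots suit data with a small range.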
Tortoise age
Stem | Leaf
10 | 6 7 9
11 | 0 3 3 4 5 5 6 6
12 | 2 3 5 7
13 | 1 5
Key: 12 | 3 = 123 years

Stem-and-leaf plots show the actual data values in an organized and sorted display.

Stem-and-leaf plot

Advantages:

  • Data is sorted.

  • Keeps the original data.

  • Can see some clusters in the data.

Disadvantages:

  • Difficult to show large quantities of data.

  • Not helpful if the data only varies in the last digit.
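The idea behind a stem-and-leaf plot is to sort the data, then split each value into a stem (every digit except the last) and a leaf (the last digit). A minimal Python sketch, using a hypothetical helper `stem_and_leaf` and the tortoise ages read from the plot above, rebuilds the same rows. Stems with no leaves are kept so that gaps stay visible.

```python
from collections import defaultdict

# Tortoise ages (years), read from the stem-and-leaf plot
ages = [106, 107, 109, 110, 113, 113, 114, 115, 115, 116, 116,
        122, 123, 125, 127, 131, 135]

def stem_and_leaf(data):
    """Return text rows 'stem | leaves', using the last digit of each value as its leaf."""
    plot = defaultdict(list)
    for value in sorted(data):           # sorting puts the leaves in order
        stem, leaf = divmod(value, 10)   # e.g. 123 -> stem 12, leaf 3
        plot[stem].append(leaf)
    rows = []
    for stem in range(min(plot), max(plot) + 1):  # include empty stems to show gaps
        leaves = " ".join(str(leaf) for leaf in plot[stem])
        rows.append(f"{stem} | {leaves}")
    return rows

for row in stem_and_leaf(ages):
    print(row)
```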

A circle graph is useful for showing proportions and parts of a whole for different categories.

A circle graph showing the percentages of age groups on a spinning strawberry ride.

Advantages:

  • We can easily compare the proportions of different categories visually.

  • They are commonly used.

Disadvantages:

  • It can be difficult to compare similar groups if they are not labeled with the values or percentages.

  • We may lose the original totals if we just show the percentages, but it becomes harder to compare portions of a whole if we show the original totals.

  • Data must fit into fewer than about 10 categories for a circle graph to be readable.

Histograms, line plots, stem-and-leaf plots, and sometimes circle graphs may display the same data, but each display has its own strengths and weaknesses.

  • Histograms do not show every individual data value, but show intervals where there may be gaps or a lower frequency of data and can be used for very large data sets.

  • Line plots and stem-and-leaf plots show the original data values, but cannot easily represent very large sets of numerical data.

  • Circle graphs may only be used for grouped numerical data or when there are only a few possible numerical responses.
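To draw a circle graph from grouped numerical data, each group's share is its fraction of the total: multiply by 100 for the percentage, or by 360° for the angle of its sector. This Python sketch assumes we group the classmate heights into the five histogram bins (4, 1, 18, 6, and 1 students); the helper name `circle_graph_sectors` is hypothetical.

```python
# Classmate heights grouped into the histogram's bins (each bin includes its lower bound)
groups = {"50 to 55": 4, "55 to 60": 1, "60 to 65": 18, "65 to 70": 6, "70 to 75": 1}

def circle_graph_sectors(counts):
    """Return {category: (percent, angle in degrees)} for a circle graph."""
    total = sum(counts.values())
    return {label: (100 * n / total, 360 * n / total) for label, n in counts.items()}

for label, (percent, angle) in circle_graph_sectors(groups).items():
    print(f"{label} in: {percent:.1f}% of students, {angle:.0f} degrees")
```

The 55 to 60 and 70 to 75 groups each get only a 12° slice, which shows how small groups can be hard to compare on a circle graph.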

Examples

Example 1

Classroom attendance over a month is shown in a histogram and a stem-and-leaf plot. Each data value tells us how many students were present each day.

Daily class attendance
Stem | Leaf
1 | 9
2 | 1 1 1 2 2 3 3 3 4 4 4 5 5 5 5 6 6 6 6 7 7 8 8 8 8 9
3 | 0
A histogram about class attendance. The horizontal axis shows attendance in bins of width 2, with tick marks from 17 to 33. The vertical axis shows the number of days, from 0 to 8. There are 6 bins, starting at 19 and ending at 31. The frequencies of the bins, from left to right, are 1, 5, 6, 8, 6, and 2.
a

Which display shows how many days had 25\leq \text{ attendance } \lt 27 more clearly?

Worked Solution
Create a strategy

The histogram shows the attendance number on its horizontal axis and the number of days on its vertical axis. The stem-and-leaf plot also shows the attendance number, but it doesn't clearly tell us the number of days.

Apply the idea

The histogram clearly displays how many days had 25\leq \text{ attendance } \lt 27.

Reflect and check

We can check our answer using the stem-and-leaf plot, but we would need to count the values in this interval.

A stem-and-leaf plot showing class attendance, with the values 25 and 26 highlighted.
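The counting step in this check can also be done in Python. In this short sketch, the list `attendance` is simply the 28 daily values read from the stem-and-leaf plot, and we count the days with attendance of at least 25 but less than 27.

```python
# Daily attendance for the month, read from the stem-and-leaf plot
attendance = [19,
              21, 21, 21, 22, 22, 23, 23, 23, 24, 24, 24,
              25, 25, 25, 25, 26, 26, 26, 26, 27, 27, 28, 28, 28, 28, 29,
              30]

# Count the days with 25 <= attendance < 27 (include 25, exclude 27)
days = sum(1 for a in attendance if 25 <= a < 27)
print(days)  # 8
```

The result matches the height of the 25 to 27 bar in the histogram.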
b

Which display shows the day with the highest attendance more clearly?

Worked Solution
Create a strategy

The histogram groups the attendance numbers as intervals while the stem-and-leaf plot clearly shows the individual attendance numbers.

Apply the idea

The stem-and-leaf plot shows the day with the highest attendance more clearly.

Reflect and check

We usually need to estimate when we find the range from a histogram as we don't know what the actual values are. However, we can read the lower and upper extremes from the stem-and-leaf plot and don't need to estimate.
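The difference between reading exact extremes from the stem-and-leaf plot and only bounding them from the histogram can be sketched in Python. The sketch assumes attendance is always a whole number, so the histogram's 19 to 21 bin can only contain 19 or 20, and its 29 to 31 bin can only contain 29 or 30.

```python
# Daily attendance for the month, read from the stem-and-leaf plot
attendance = [19,
              21, 21, 21, 22, 22, 23, 23, 23, 24, 24, 24,
              25, 25, 25, 25, 26, 26, 26, 26, 27, 27, 28, 28, 28, 28, 29,
              30]

# From the stem-and-leaf plot: exact lowest and highest values, and the range
lowest, highest = min(attendance), max(attendance)
print(lowest, highest, highest - lowest)  # 19 30 11

# From the histogram alone (assuming whole-number attendance): the lowest value
# is 19 or 20 and the highest is 29 or 30, so we only know the range is
# somewhere between 29 - 20 and 30 - 19.
smallest_possible_range = 29 - 20
largest_possible_range = 30 - 19
print(smallest_possible_range, largest_possible_range)  # 9 11
```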

c

Which display shows the shape more clearly?

Worked Solution
Create a strategy

Histograms provide a visual representation of the data by displaying the frequency of values within different intervals.

Stem-and-leaf plots show individual data points, but may not show the overall shape of the distribution as effectively as histograms unless there are between 5 and 10 stems.

Apply the idea

The histogram shows the shape more clearly.

Reflect and check

If the data had a larger range spread over more stems, we could also see the shape from the stem-and-leaf plot. For example:

Ages at the doctor's office
Stem | Leaf
0 | 0 0 0 1 2 4 7
1 | 0 1 4 4 5 8
2 | 2 5 6 7 8 9
3 | 4 5 6 6
4 | 2
5 | 1 3 3 7
6 | 0 2 2 3 3 4

Example 2

Determine the best type of data display(s) for each formulated question:

a

What is a typical number of goals for my high school's soccer team?

Worked Solution
Create a strategy

Look at the size and possible range of values of the data set.

Apply the idea

A dot plot would be a good choice for a data display.

The number of games in a season would be fairly small, and the number of goals per game would not vary greatly.

Reflect and check

It would likely be difficult to find appropriate bin widths for a histogram without hiding the shape of the data.

b

How do the heights of the 196 Olympic gymnasts in the most recent summer games compare?

Worked Solution
Create a strategy

Look at the size and possible values that could be recorded for the data set.

Apply the idea

Since there is a large number of data points, 196, and it is likely that there would be a very large range of different heights, a histogram would be the best graph to summarize the data.

There might be significant height differences between male and female gymnasts, so a histogram may help show any gaps in the data. We could also create separate histograms for different categories, such as main event or gender.

Reflect and check

Technology can make it easier to graph data. With technology, we can create and compare a dot plot or histogram to determine which visual best represents the data. We can also play with the bin or category width for histograms to see the best display.

Example 3

Shown below are the quiz score percentages from Mr. Sanchez's first period math class: \{20, 25, 26, 30, 30, 40, 43, 63, 65, 67, 70, 70, 75, 90, 93 \}

a

Construct a histogram of the quiz scores with intervals of 15.

Worked Solution
Create a strategy

We can use technology or create the histogram by hand. Let's look at how to do it using technology.

  1. Enter the data in a single column.

    A screenshot of the GeoGebra statistics tool showing the data 20, 25, 26, 30, 30, 40, 43, 63, 65, 67, 70, 70, 75, 90, and 93 entered in column A, rows 1 to 15.
  2. Select all of the cells containing data and choose "One Variable Analysis".

    A screenshot of the GeoGebra statistics tool showing the cells containing 20, 25, 26, 30, 30, 40, 43, 63, 65, 67, 70, 70, 75, 90, and 93 selected. The menu from the second leftmost icon is shown.
  3. By default, a histogram is created, but we can then adjust the bin or category widths.

    A screenshot of the GeoGebra statistics tool. From left to right, the following are shown: the cells containing 20, 25, 26, 30, 30, 40, 43, 63, 65, 67, 70, 70, 75, 90, and 93 selected and a histogram.
Apply the idea

We now need to adjust the settings: click on the gear, tick the box for Set Classes Manually, and then set the boxes above the histogram to Start: 20 and Width: 15.

A screenshot of the GeoGebra statistics tool. From left to right, the following are shown: the cells containing 20, 25, 26, 30, 30, 40, 43, 63, 65, 67, 70, 70, 75, 90, and 93 selected, a histogram, and the settings.

Finally, we can label our histogram.

A histogram titled 'Quiz results for Mr. Sanchez's first period math class'. The horizontal axis shows the quiz score (%) and the vertical axis shows the number of students (Frequency). There are 5 bins, labeled with their score ranges, with these frequencies: 20-34, 5; 35-49, 2; 50-64, 1; 65-79, 5; and 80-94, 2.
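GeoGebra is the tool used in this example. As an alternative sketch, not a replacement for the steps above, the same bins can be computed in Python. The helper `histogram_table` is hypothetical. It uses the convention that each bin includes its lower bound and excludes its upper bound, so for whole-number scores the bin from 20 up to (but not including) 35 is labeled 20-34.

```python
scores = [20, 25, 26, 30, 30, 40, 43, 63, 65, 67, 70, 70, 75, 90, 93]

def histogram_table(data, start, width):
    """Return (label, frequency) pairs for equal-width bins starting at `start`.

    Each bin includes its lower bound and excludes its upper bound; labels
    assume whole-number data, so [20, 35) is labeled "20-34".
    """
    num_bins = (max(data) - start) // width + 1  # enough bins to reach the largest value
    table = []
    for i in range(num_bins):
        low = start + i * width
        high = low + width  # excluded upper bound
        count = sum(1 for x in data if low <= x < high)
        table.append((f"{low}-{high - 1}", count))
    return table

for label, count in histogram_table(scores, start=20, width=15):
    print(label, count)
```

The frequencies 5, 2, 1, 5, and 2 match the histogram made in GeoGebra.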
b

Explain whether a dot plot, stem-and-leaf plot, circle graph, or a histogram with a different bin width could be a better display for the data.

Worked Solution
Apply the idea

A dot plot would not be a good choice. Although the data set contains only 15 students' scores, the scores are spread out from 20 \% to 93 \%, so the dot plot would be very long and would mostly show gaps. Also, 30 and 70 are the only repeated scores, so the dot plot would not tell us much new information.

Quiz results for Mr. Sanchez's first period math class
Stem | Leaf
2 | 0 5 6
3 | 0 0
4 | 0 3
5 |
6 | 3 5 7
7 | 0 0 5
8 |
9 | 0 3
Key: 2 | 0 = 20%

A stem-and-leaf plot would be quick to make since the data is already sorted. It clearly shows the gaps in the 50s and 80s and that the data is quite spread out.

A circle graph would not be an ideal choice, as it would not give any more insight than the histograms, which are generally easier to read.

A histogram graphed by organizing the data into 10\% intervals would be similar to the stem-and-leaf plot, but rotated and using bars instead of numbers. One advantage of this histogram is that we would be able to see the gaps in the 50s and 80s, which are lost with the 15\% intervals, and it can be more visually appealing than a stem-and-leaf plot.

Reflect and check

The other histogram would look like this:

A histogram titled 'Quiz results for Mr. Sanchez's first period math class'. The vertical axis shows the quiz score (%) and the horizontal axis shows the number of students (Frequency). There are 8 bins of width 10, from 20 up to 99. The bins and their frequencies are: 20-29, 3; 30-39, 2; 40-49, 2; 50-59, 0; 60-69, 3; 70-79, 3; 80-89, 0; and 90-99, 2.
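To compare the two bin widths, this sketch redefines the same hypothetical `histogram_table` helper (so the block runs on its own) and lists the empty bins for each width. It assumes whole-number scores and bins that include their lower bound and exclude their upper bound.

```python
scores = [20, 25, 26, 30, 30, 40, 43, 63, 65, 67, 70, 70, 75, 90, 93]

def histogram_table(data, start, width):
    """Return (label, frequency) pairs; each bin includes its lower bound and excludes its upper bound."""
    num_bins = (max(data) - start) // width + 1
    table = []
    for i in range(num_bins):
        low = start + i * width
        count = sum(1 for x in data if low <= x < low + width)
        table.append((f"{low}-{low + width - 1}", count))
    return table

table = histogram_table(scores, start=20, width=10)
gaps = [label for label, count in table if count == 0]
print(table)
print("Empty bins:", gaps)  # Empty bins: ['50-59', '80-89']

wide_gaps = [label for label, count in histogram_table(scores, start=20, width=15) if count == 0]
print("Empty bins with width 15:", wide_gaps)  # Empty bins with width 15: []
```

With width 10 the gaps in the 50s and 80s show up as empty bins, while with width 15 every bin has at least one score, so the gaps are hidden.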
Idea summary

The best display for a data set is one that reveals the information we want to share. Some displays hide key information like the individual data points, the total number of data points, or features like the shape, clusters, gaps, and spread.

As a starting point, consider the size and range of the data set.

  • If there is a small quantity and range of data, try a dot plot.

  • If the data has a large range or quantity of data, try a histogram.

  • Choose a histogram if, in addition to center, spread, and shape, you want to know the size of the data set and view any gaps or clusters among various intervals.

Display | Strengths | Drawbacks
Histogram | Easily displays large or spread-out data sets. The shape, mode, and spread are visible. | Individual data values cannot be seen. Depending on the number or size of the bins or intervals, the shape can look different.
Circle graph | We can compare the proportions of different categories visually. They are commonly used. | It can be difficult to compare similar groups if they are not labeled. We may lose the original totals if we just show the percentages.
Dot plot (line plot) | Useful for individual data values. The highest and lowest categories are easy to see. | Does not work well for large data sets. Can be slower to make by hand.
Stem-and-leaf plot | Useful for individual data values. The highest and lowest values are easy to see. | Does not work well for large data sets.

Outcomes

7.PS.2

The student will apply the data cycle (formulate questions; collect or acquire data; organize and represent data; and analyze data and communicate results) with a focus on histograms.

7.PS.2f

Compare data represented in histograms with the same data represented in other graphs, including but not limited to line plots (dot plots), circle graphs, and stem-and-leaf plots, and justify which graphical representation best represents the data.
