
4.06 Misleading data displays

Different types of graphs can be used to display categorical and numerical data. The way data is displayed is often dependent on what someone is trying to communicate. It is important to analyze graphs carefully as they can be skewed intentionally or unintentionally.

Exploration

Graph 1 and Graph 2 show which stakeholders are in favor of a particular school policy.

A bar graph showing the percentage of stakeholders in favor of a school policy. The horizontal axis shows stakeholders and the vertical axis shows percent in favor from 68-78. The following are the stakeholders and the height of each corresponding bar: school staff at 76, guardians at 72 and students at 70.
Graph 1
A bar graph showing the percentage of stakeholders in favor of a school policy. The horizontal axis shows stakeholders and the vertical axis shows percent in favor from 0-80. The following are the stakeholders and the height of each corresponding bar: school staff a little more than halfway between 70 and 80, guardians a little above 70, and students at 70.
Graph 2
  1. Are they showing the same data or different data? Are they conveying the same message? Explain.

  2. What do you notice that is different about the graphs? Explain.

  3. Which graph might the school use to represent the data in the city's local newspaper? Which graph might the students use to represent the data in the school newspaper? Explain.

Sometimes, graphs are also used to misrepresent information and promote a biased view. Let's look at some of the ways in which graphs are used to mislead people.

Manipulating the scale, such as not starting the scale at zero or using a scale that is not uniform

For example, the range of values shown on the vertical axis of this graph is too large when compared to the data. This makes it appear as though the internet speed is constant.

A line graph showing internet speeds in an area. The speed looks constant because the vertical axis scale is too large compared to the data.

If we adjust the range of values shown on the vertical axis to be in proportion to the data, we can now see the data has high variability.

A line graph showing internet speeds in an area. A high variability on the data can be observed.
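The effect of the axis range can also be checked with a little arithmetic. The sketch below is only an illustration: it assumes both graphs in the Exploration plot the values read from Graph 1 (76, 72, and 70), and the function names are made up for this example. It compares how tall the bars are drawn when the vertical axis starts at 68 versus at 0.

```python
# Percent in favor, as read from Graph 1 in the Exploration (assumed to be the
# same data in both graphs).
percent_in_favor = {"school staff": 76, "guardians": 72, "students": 70}


def drawn_heights(values, axis_start):
    """Height of each bar as drawn when the vertical axis begins at axis_start."""
    return {name: value - axis_start for name, value in values.items()}


def apparent_ratio(values, axis_start, first, second):
    """How many times taller the first bar is drawn compared to the second."""
    heights = drawn_heights(values, axis_start)
    return heights[first] / heights[second]


truncated = apparent_ratio(percent_in_favor, 68, "school staff", "students")
from_zero = apparent_ratio(percent_in_favor, 0, "school staff", "students")

print(f"Axis starting at 68: staff bar is drawn {truncated:.2f} times as tall")
print(f"Axis starting at 0: staff bar is drawn {from_zero:.2f} times as tall")
```

With the axis starting at 68, the school staff bar is drawn 4 times as tall as the students bar, even though the actual percentages differ by only 6 percentage points.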

Manipulating intervals that could exaggerate the distance between data points

A histogram about hotel occupancy rate. Ask your teacher for more information.

For example, this histogram appears to show that the hotel is usually close to full capacity. However, notice that the intervals are not consistent.

The last column's interval is twice the size of the other intervals, making the other columns appear shorter than reality.

Omitting important information in titles and labels

For example, this graph shows the speeds of the 6 fastest roller coasters in the U.S. The graph makes it appear that Kingda Ka is much faster than the other roller coasters. However, there is no information on the exact speeds of the roller coasters, so we do not know how much faster it is compared to Superman: Escape from Krypton.

A horizontal bar graph showing the speeds of the 6 fastest roller coasters in the U.S.

Omitting certain data points, such as outliers or values that do not align with the desired conclusion

For example, visitors were asked to rate the cleanliness of the bathrooms at a public airport. The airport claimed to survey over 100 people and shared the results using the pictograph shown.

The graph appears to show that the majority of people think the bathrooms are very clean. However, only 80 of the responses are shown. If the data points that were omitted fall under "average" or "poor", then it is possible that most people do not find the bathrooms to be very clean.

A pictograph showing responses where 1 icon=10 people. Poor shows 1 sad faced icon, average shows 2 neutral faced icons, and excellent shows 5 happy faced icons.
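The count of 80 responses can be verified from the pictograph key. The sketch below is only an illustration with made-up variable names. Because the airport said "over 100" without giving an exact number, it treats 100 as a lower bound on the number of people surveyed.

```python
PEOPLE_PER_ICON = 10  # from the pictograph key: 1 icon = 10 people
icons = {"poor": 1, "average": 2, "excellent": 5}
claimed_surveyed = 100  # the airport says "over 100", so this is a lower bound

people_shown = {rating: count * PEOPLE_PER_ICON for rating, count in icons.items()}
total_shown = sum(people_shown.values())

# More than this many responses are missing from the pictograph.
missing_more_than = claimed_surveyed - total_shown

# Share rated "excellent" among the responses that are shown.
share_excellent_shown = people_shown["excellent"] / total_shown

# Assumed worst case: every omitted response was "poor" or "average".
share_excellent_worst_case = people_shown["excellent"] / claimed_surveyed

print(f"Responses shown: {total_shown}")
print(f"Responses missing: more than {missing_more_than}")
print(f"Excellent among shown responses: {share_excellent_shown:.1%}")
print(f"Excellent if all missing responses were lower: at most {share_excellent_worst_case:.1%}")
```

Among the 80 responses shown, 62.5% are "excellent". If every omitted response were "average" or "poor", that share would be at most 50%, which is no longer a majority.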

Choosing a graphical display that does not best represent the data

Number of likes
Stem | Leaf
4 | 4 6 6 8 9 9
5 | 0 2 3 5 5 8 9
6 | 0 1 1 1 3 5 6 8

For example, this stem-and-leaf plot shows the number of likes someone received on their posts. The person claims that they typically receive 61 likes on a post.

A boxplot of the number of likes. The axis runs from 40 to 70 in steps of 5.

While the mode of the data is 61, this boxplot of the same data shows that 61 likes is the upper quartile, so it is not a good representation of the center of the data.

A more accurate claim would be that they typically receive around 55 likes on a post.
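The mode, median, and quartiles mentioned here can be checked directly from the stem-and-leaf plot. The sketch below is one way to do that with Python's standard `statistics` module. It assumes quartiles are found as the medians of the lower and upper halves of the data, leaving out the middle value, and the variable names are chosen for this example.

```python
import statistics

# Stem-and-leaf plot of the number of likes: stem = tens digit, leaf = ones digit.
stem_and_leaf = {
    4: [4, 6, 6, 8, 9, 9],
    5: [0, 2, 3, 5, 5, 8, 9],
    6: [0, 1, 1, 1, 3, 5, 6, 8],
}

likes = sorted(
    10 * stem + leaf
    for stem, leaves in stem_and_leaf.items()
    for leaf in leaves
)

mode = statistics.mode(likes)
median = statistics.median(likes)

# Quartiles as medians of the lower and upper halves. There are 21 values,
# so the middle value (the median) is left out of both halves.
half = len(likes) // 2
lower_quartile = statistics.median(likes[:half])
upper_quartile = statistics.median(likes[-half:])

print(f"Number of posts: {len(likes)}")
print(f"Mode: {mode}")
print(f"Median: {median}")
print(f"Lower quartile: {lower_quartile}")
print(f"Upper quartile: {upper_quartile}")
```

The mode and the upper quartile are both 61, while the median is 55. About three quarters of the posts received 61 likes or fewer, which is why 55 is the better description of a typical post.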

With any set of data, it is important to analyze the data within the correct context.

Examples

Example 1

Shawnte plays for the school soccer team, and she was one of the top scorers of the season. The team played a total of 17 games, and Shawnte organized the number of goals she scored in the dot plot shown.

A line (dot) plot about goals scored from 1 to 5. Each has the following number of dots starting from 1: 5,4,1,2,2.

Select the option that describes why this graph is misleading.

A
This graph is misleading because there is no information on what the dots represent.
B
This graph is misleading because it does not show the data from all 17 games.
C
This graph is misleading because there is a very small interval for the number of goals scored.
D
The graph is misleading because a dot plot skews the results to make them appear better than reality.
Worked Solution
Create a strategy

Check if each statement matches what the graph shows.

Apply the idea

The data represents the number of goals Shawnte scored throughout the season, which means each dot represents a game in the season. Option A is incorrect.

There are only 14 dots shown, but there were 17 games in the season. This means there are missing data values, so option B is correct.

It is not common in soccer to score a high number of goals, so a small range of goals is reasonable. Option C is incorrect.

Dot plots provide an ordered display of all values in a data set and show the frequency of each value on a number line. A dot plot is a reasonable choice for this data set, so option D is incorrect.
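The count used for option B can be confirmed by adding up the dots. This is a minimal sketch using the dot counts from the plot description, with variable names chosen for this example.

```python
# Number of dots above each goal count in Shawnte's dot plot.
dots_per_goal_count = {1: 5, 2: 4, 3: 1, 4: 2, 5: 2}
games_played = 17

games_shown = sum(dots_per_goal_count.values())
games_missing = games_played - games_shown

print(f"Games shown in the dot plot: {games_shown}")
print(f"Games missing from the dot plot: {games_missing}")
```

The plot accounts for 14 of the 17 games, so 3 games are not represented.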

Example 2

A school newspaper article reads, "Students prefer pepperoni pizza over any other type of pizza topping."

A pizza composed of 6 slices of different toppings. The following are the different toppings and the percentages of students who preferred each topping: chicken 42%, pepperoni 65%, olives 33%, onions 62%, peppers 60%, sausage 56%, ham 47%, pineapple 42%, mushrooms 59%, spinach 26%, and bacon 49%.

Explain why this graph does not best represent the data.

Worked Solution
Create a strategy

Circle graphs are used to show a relationship of the parts to a whole. Consider what the article was trying to communicate and whether this graph clearly communicated those results.

Apply the idea

The article was comparing students' preferred types of pizza toppings. However, the percentages in this circle graph sum to a number larger than 100%, so it does not represent the proportion of students that prefer each type of topping.

Because the percentages do not represent parts of the whole population, it appears that students were able to choose more than one type of pizza topping, though not all types of toppings are represented.
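The claim that the percentages sum to more than 100% can be checked by adding the values listed in the graph. This is a minimal sketch with variable names chosen for this example.

```python
# Percent of students who preferred each topping, as listed in the graph.
topping_percent = {
    "chicken": 42, "pepperoni": 65, "olives": 33, "onions": 62,
    "peppers": 60, "sausage": 56, "ham": 47, "pineapple": 42,
    "mushrooms": 59, "spinach": 26, "bacon": 49,
}

total_percent = sum(topping_percent.values())

# In a circle graph, the parts must add up to exactly 100% of the whole.
is_valid_circle_graph = total_percent == 100

print(f"Sum of percentages: {total_percent}%")
print(f"Parts add up to one whole: {is_valid_circle_graph}")
```

The percentages add up to 541%, far more than the 100% a circle graph requires.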

Reflect and check

To compare the percentages of pizza topping preference, a bar graph would have been a better choice of display.

A bar graph showing the students' preferred pizza toppings. The following are the toppings and percentages shown: pepperoni at 65%, olives around halfway between 30% and 35%, onions a little above 60%, peppers at 60%, sausage a little above 55%, ham above 45%, pineapple halfway between 40% and 45%, mushrooms at almost 60%, spinach a little above 25%, bacon at almost 50%, and chicken around halfway between 40% and 45%.

Example 3

Which feature of this graph is misleading?

A graph of sales in dollars by year. The horizontal axis shows the years 2012 to 2020, and the vertical axis shows sales from $100 to $900.
Worked Solution
Create a strategy

Common ways for a graph to be misleading are:

  • Manipulating the scale, such as not starting the scale at zero or using a scale that is not uniform

  • Manipulating intervals that could exaggerate the distance between data points

  • Omitting important information in titles and labels

  • Omitting certain data points, such as outliers or values that do not align with the desired conclusion

  • Choosing a graphical display that does not best represent the data

Apply the idea

Looking at the grid lines of the graph, notice that the vertical scale is not uniform with the horizontal scale. The distance between consecutive years appears much larger than the distance between consecutive sales amounts, which causes the decrease in sales to appear small.

Thus, the intervals on the horizontal axis exaggerate the distance between the data points, and the vertical scale is not uniform with the horizontal scale.

Reflect and check

Here is how the graph would look with a uniform scale:

A graph of sales in dollars by year, drawn with a uniform scale. The horizontal axis shows the years 2012 to 2020, and the vertical axis shows sales from $100 to $900.

The decrease in sales appears to be faster in this graph.

Example 4

In 2017, a state passed a "Seatbelt safety first" law in hopes that the law would reduce the number of deaths from vehicle accidents. The statistical question they want to answer is, "How does the number of deaths from vehicle accidents before 2017 compare to the number of deaths after the law was passed?"

a

What type of data needs to be collected to answer their formulated question?

Worked Solution
Create a strategy

Consider whether the data will be univariate or bivariate, and determine what the variable(s) will be.

Apply the idea

The state department needs to collect data on two different variables to answer their question, so they need to collect bivariate data. The first variable is the year, and the second variable is the number of deaths from vehicle accidents in each year.

Reflect and check

This data is generally collected by the state's department of transportation or the state's highway patrol. You can collect this data from either of these government websites.

b

The state collected the data and organized it into the graph shown. They concluded, "After passing the Seatbelt safety first law, the number of deaths related to vehicle accidents decreased."

A line graph about vehicular accident deaths from 2014 to 2022. The y-axis is inverted. Ask your teacher for more information.

Explain why this is misleading.

Worked Solution
Create a strategy

When looking at the graph, it appears that the number of deaths does decrease after 2017. However, we should analyze the values on the vertical and horizontal scales carefully.

Apply the idea

The vertical axis on this graph does not increase from 0. Instead, the scale has been flipped, making an increase in deaths appear to be a decrease in deaths.

Reflect and check

The correct graph would be flipped vertically, as shown.

A line graph showing vehicular accidents from 2014 to 2022. Ask your teacher for more information.

With this correct graph, we can see that the number of deaths related to vehicle accidents actually increased after the law was passed.

Idea summary

Some of the ways in which graphs can be used to mislead include:

  • Manipulating the scale, such as not starting the scale at zero or using a scale that is not uniform

  • Manipulating intervals that could exaggerate the distance between data points

  • Omitting important information in titles and labels

  • Omitting certain data points, such as outliers or values that do not align with the desired conclusion

  • Choosing a graphical display that does not best represent the data

Outcomes

8.PS.2

The student will apply the data cycle (formulate questions; collect or acquire data; organize and represent data; and analyze data and communicate results) with a focus on boxplots.

8.PS.2j

Identify components of graphical displays that can be misleading.
