9.03 Interpreting data distributions

Lesson

Concept summary

When we describe the shape of data sets, we focus on how the scores are distributed and whether the shape is symmetrical or not.

Symmetric

The data set is distributed around the center with similar frequency on the left and right.

In symmetrical distributions the \text{mean}\approx \text{median}.

Left skew

The majority of the data points have higher values, with some data points at lower values.

In distributions that are skewed left the \text{mean} < \text{median}.

Right skew

The majority of the data points have lower values, with some data points at higher values.

In distributions that are skewed right the \text{mean} > \text{median}.

Uniform

The data set is evenly distributed across all values.

In uniform distributions the \text{mean} \approx \text{median}.

When describing skewed distributions, it's better to use the median and interquartile range as measures of center and spread because they are more resistant to extreme data points. When describing symmetrical or uniform distributions, it's better to use the mean and range as measures of center and spread because they take the values of all data points into account.
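A minimal sketch of what "resistant" means here, using Python's standard `statistics` module. The scores are hypothetical values invented for illustration and do not come from the lesson.

```python
from statistics import mean, median

# Hypothetical scores, invented for illustration (not from the lesson).
symmetric_scores = [4, 5, 5, 6, 6, 6, 7, 7, 8]

# The same scores with one extreme high value added, which skews the data right.
skewed_scores = symmetric_scores + [30]

for label, scores in [("symmetric", symmetric_scores), ("right skew", skewed_scores)]:
    # The mean uses every value, so the extreme point pulls it upward;
    # the median depends only on the middle of the ordered data.
    print(label, "mean =", round(mean(scores), 2), "median =", median(scores))
```

Adding the single extreme value moves the mean from 6 to 8.4 while the median stays at 6, which matches the pattern \text{mean} > \text{median} for a right-skewed distribution.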

If we want to compare two distributions visually, it is important to check that the axes and scales are the same on both displays.

Data displays can be compared by showing them in parallel or back-to-back:

Parallel box plots
Back-to-back stem plot

Worked examples

Example 1

Use measures of shape, center, and spread to analyze and interpret the given dot plot.

Approach

To analyze the data set we need to look at:

  • Shape: skewed or symmetric
  • Center: mean or median
  • Spread: range or IQR
  • Extreme data points

Solution

Because of the extreme data point at 24, this distribution is skewed right.

The mean of the distribution is \frac{12+12+13+13+13+13+13+14+14+24}{10}=\frac{141}{10}=14.1 and the median is 13. The median is a better measure to describe the center of the distribution because the data set is skewed.

The range of the data set is 24-12=12 and the interquartile range is 14-13=1. The interquartile range is a better measure to describe the spread because the data set has an extreme data point that greatly impacts the range.

In summary, the typical age for this data set is 13 years old and the middle half of the ages vary by 1 year. There is an extreme data point at 24 years old making the data set skewed to the right.
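The calculations in this solution can be checked with a short sketch. The helper `quartiles` is a hypothetical name introduced here; it assumes quartiles are found as the medians of the lower and upper halves of the ordered data, which reproduces the values used above. Other quartile conventions (including some library defaults) can give slightly different results.

```python
from statistics import mean, median

# Ages read from the dot plot, as listed in the mean calculation above.
ages = [12, 12, 13, 13, 13, 13, 13, 14, 14, 24]

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: when the count is odd, the middle value is left out of
    both halves. Requires at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])

q1, q3 = quartiles(ages)
data_range = max(ages) - min(ages)  # sensitive to the extreme point at 24
iqr = q3 - q1                       # spread of the middle half of the data

print("mean:", mean(ages))
print("median:", median(ages))
print("range:", data_range)
print("IQR:", iqr)
```

The mean comes out above the median, as expected for a right-skewed data set, and the range is far larger than the interquartile range because it depends on the single extreme value.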

Reflection

Notice that the majority of the data points in the dot plot are between 12 and 14 years old, but the mean is 14.1, which is bigger than almost all of the points. This is an example of why the mean shouldn't be used for a skewed data set.

Similarly, if we describe the data set by saying the range is 12 years, it gives the false impression that the data set is spread out over a 12-year span, when in reality most of the data points are within 1-2 years of each other. This is why it's better to choose the interquartile range to describe the spread of a skewed data set.

Example 2

The parallel box plots show the weight of the 2021 Chicago Bulls NBA team and the 2021 Chicago Sky WNBA team.

Interpret the differences in the shape, center, and spread of the weights for each team.

[Figure: parallel box plots titled "Chicago Bulls Player Weights (lbs)" and "Chicago Sky Player Weights (lbs)", each drawn on an axis from 130 to 270 lbs in steps of 20.]

Approach

To compare these data sets we need to individually identify the:

  • Shapes: symmetric or skewed
  • Centers: mean and median
  • Spread: range and IQR

Both distributions have extreme values that are higher than the rest. Excluding the effect of these outliers, the Chicago Bulls' distribution appears to be skewed right and the Chicago Sky's distribution appears to be symmetric.

The Chicago Bulls have a median weight of about 210 lbs and the Chicago Sky have a median weight of about 165 lbs.

The Chicago Bulls have an interquartile range of about 220-190=30\text{lbs}. The range with the extreme data points is about 260-185=75\text{lbs}, but if the extreme values are excluded it's closer to 250-185=65\text{lbs}. The Chicago Sky have an interquartile range of about 185-145=40\text{lbs}. The range with the extreme data point is about 235-135=100\text{lbs}, but the range without the outlier is only about 190-135=55\text{lbs}.
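The arithmetic in the approach above can be laid out as a small sketch. The numbers are the approximate readings from the box plots quoted in the text; the dictionary keys and the `spread` helper are names chosen for this illustration only.

```python
# Approximate values (in lbs) read from the parallel box plots, as quoted
# in the approach above. "high" is the largest value once extreme points
# are excluded; "extreme_high" is the largest value overall.
bulls = {"low": 185, "q1": 190, "median": 210, "q3": 220,
         "high": 250, "extreme_high": 260}
sky = {"low": 135, "q1": 145, "median": 165, "q3": 185,
       "high": 190, "extreme_high": 235}

def spread(summary):
    """Return the IQR and the range with and without extreme values."""
    return {
        "iqr": summary["q3"] - summary["q1"],
        "range_with_extremes": summary["extreme_high"] - summary["low"],
        "range_without_extremes": summary["high"] - summary["low"],
    }

bulls_spread = spread(bulls)
sky_spread = spread(sky)

# Difference between the two centers, using the medians.
median_gap = bulls["median"] - sky["median"]

print("Bulls:", bulls_spread)
print("Sky:", sky_spread)
print("Median difference (lbs):", median_gap)
```

Keeping the range with and without extreme values side by side makes it easier to see how much a single extreme point changes the picture for each team.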

Solution

Both teams appear to have extreme data values, which makes the median and interquartile range more appropriate summary statistics. Excluding the effect of the extreme values, the weights of players on the Chicago Bulls are skewed right, which tells us that most players have similar weights and a few players weigh much more than the rest of the team. The weights of players on the Chicago Sky are symmetrically distributed, which tells us that most players weigh around average with a few players weighing less and a few players weighing more.

A typical player on the Chicago Bulls weighs about 45 lbs more than a typical player on the Chicago Sky, which is consistent with the knowledge that men, on average, weigh more than women. The weights of the players on the Chicago Sky are less consistent than the weights of the players on the Chicago Bulls because both the interquartile range and the range including extreme values are larger.

Reflection

When describing and comparing data sets be sure to include a sentence on the shape, center, and spread. Include an analysis of any extreme data points and always include the context of the problem.

Outcomes

A1.N.Q.A.1

Use units as a way to understand real-world problems.*

A1.N.Q.A.1.A

Choose and interpret the scale and the origin in graphs and data displays.*

A1.S.ID.A.2

Use statistics appropriate to the shape of the data distribution to compare center (mean, median, and/or mode) and spread (range, interquartile range) of two or more different data sets.*

A1.S.ID.A.3

Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points.*

A1.MP1

Make sense of problems and persevere in solving them.

A1.MP2

Reason abstractly and quantitatively.

A1.MP3

Construct viable arguments and critique the reasoning of others.

A1.MP4

Model with mathematics.

A1.MP5

Use appropriate tools strategically.

A1.MP6

Attend to precision.

A1.MP7

Look for and make use of structure.
