
6.02 Measures of center

Introduction

We first saw the measures of center (mean, median, and mode) in 6th grade and used them to draw conclusions about sets of data. Here we will compare these measures across different data sets and determine how outliers influence the ways in which we compare data sets.

Measures of center

A measure of center, sometimes called a measure of central tendency, describes the center, or typical value, of a data set. Measures of center include the mean, median, and mode.

Mean

The sum of the values in a data set divided by the number of values

Median

The middle value of an ordered data set. For a set with an even number of data points, the median is the average of the two middle values.

Mode

The most frequently occurring data value. A data set could have more than one mode or no mode.
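The three definitions above translate directly into code. Below is a minimal Python sketch; the function names `mean`, `median`, and `modes` are our own (hypothetical), and we assume a data set has "no mode" only when every value occurs exactly once.

```python
from collections import Counter


def mean(data):
    """Sum of the values divided by the number of values."""
    return sum(data) / len(data)


def median(data):
    """Middle value of the ordered data; for an even count, the average of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(data):
    """Every value tied for the highest frequency; an empty list means no mode.

    Assumption: a set has no mode only when each value occurs exactly once.
    """
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


sample = [2, 4, 4, 5]
print(mean(sample), median(sample), modes(sample))  # 3.75 4.0 [4]
```

Returning a list from `modes` lets the same function report one mode, several modes, or no mode.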

When comparing sets of data, we should note any differences in the measures of center and interpret what these differences tell us about the data.

Examples

Example 1

The dot plot shows the air quality index (AQI) rating for 20 of the world's most polluted countries measured in micrograms per cubic centimeter (µ\text{g}/\text{cm}^3).

A dot plot titled 2020 AQI Rating in micrograms per cubic centimeter, ranging from 28 to 77. The number of dots is as follows: at 29, 2; at 30, 1; at 31, 2; at 34, 1; at 35, 1; at 38, 1; at 39, 1; at 40, 1; at 41, 2; at 44, 3; at 47, 2; at 52, 1; at 59, 1; at 77, 1.
a

Interpret the meaning of the mean in the context of the data set.

Worked Solution
Create a strategy

The measures of center describe a "typical" value. The mean describes an average value.

Apply the idea

The average air quality index rating of these 20 polluted countries is 41.6 \,µ\text{g}/\text{cm}^3.

Reflect and check

All statements about the mean and median should be sentences phrased in terms of the variable of interest and include the correct units of measurement.

b

Explain and interpret the meaning of the median in the context of the data set.

Worked Solution
Create a strategy

Remember, about half the data points will be more than the median, and about half the data points will be less.

Apply the idea

Half of these 20 polluted countries have an air quality rating less than 40.5\, µ\text{g}/\text{cm}^3 and half of them have an air quality rating greater than 40.5\, µ\text{g}/\text{cm}^3.

c

Interpret the meaning of the mode in the context of the data set.

Worked Solution
Create a strategy

The mode is the most common value.

Apply the idea

The most common air quality index rating among these countries is 44 \,µ\text{g}/\text{cm}^3, shared by three countries.
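To double-check the three answers, we can rebuild the 20 data values from the dot plot and use Python's standard `statistics` module. The `counts` dictionary below is our own transcription of the dot plot (rating: number of dots).

```python
import statistics

# Dot plot transcription: AQI rating -> number of countries (dots)
counts = {29: 2, 30: 1, 31: 2, 34: 1, 35: 1, 38: 1, 39: 1, 40: 1,
          41: 2, 44: 3, 47: 2, 52: 1, 59: 1, 77: 1}

# Expand the counts into the full list of 20 values
aqi = [value for value, n in counts.items() for _ in range(n)]

print(len(aqi))                   # 20
print(statistics.mean(aqi))       # 41.6
print(statistics.median(aqi))     # 40.5
print(statistics.multimode(aqi))  # [44]
```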

Example 2

Compare the mean and median of the data sets.

a

The following are the results of the same quiz taken by two different classes:

Two dot plots. The left dot plot is titled Class A, with the quiz scores ranging from 1 to 11. The number of dots is as follows: at 2, 1; at 3, 1; at 5, 1; at 6, 2; at 7, 6; at 8, 3; at 9, 2; at 10, 4. The right dot plot is titled Class B, with the quiz scores ranging from 1 to 11. The number of dots is as follows: at 1, 1; at 2, 2; at 3, 2; at 5, 3; at 6, 5; at 7, 2; at 8, 1; at 10, 2.
Worked Solution
Create a strategy

Calculate the mean and median of each data set.

Apply the idea

The mean of Class A's quiz scores is 7.3, while the mean of Class B's quiz scores is about 5.44. Class A has a higher average score than Class B.

The median of Class A's quiz scores is 7, while the median of Class B's quiz scores is 6. At least half of Class A scored 7 or higher, and at least half of Class B scored 6 or higher on the quiz.

In general, Class A scored higher on average, and more students in Class A than in Class B scored above 7 on the quiz.

Reflect and check

By looking at the dot plots, we can estimate where the mean and median of each class's scores are located and where most of the quiz scores lie.
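The same check works for the two quiz dot plots. The dictionaries below are our own transcription of the dot plots, and `expand` is a hypothetical helper name. The last column counts how many students in each class scored above 7, the comparison made in the summary sentence.

```python
import statistics

# Dot plot transcriptions: quiz score -> number of students
class_a = {2: 1, 3: 1, 5: 1, 6: 2, 7: 6, 8: 3, 9: 2, 10: 4}
class_b = {1: 1, 2: 2, 3: 2, 5: 3, 6: 5, 7: 2, 8: 1, 10: 2}


def expand(counts):
    """Turn {value: count} into a flat list of values."""
    return [value for value, n in counts.items() for _ in range(n)]


# Columns: class, number of students, mean (2 d.p.), median, count scoring above 7
for name, counts in (("Class A", class_a), ("Class B", class_b)):
    scores = expand(counts)
    above_7 = sum(1 for s in scores if s > 7)
    print(name, len(scores), round(statistics.mean(scores), 2),
          statistics.median(scores), above_7)
# Class A 20 7.3 7.0 9
# Class B 18 5.44 6.0 3
```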

b

The following are the typing speeds, in words per minute, of students from two different classes:

\text{Class A: }\{26, 28, 28, 29, 29, 29, 30, 31, 31, 31, 31, 32, 32, 32, 32, 33, 35, 35, 36\}

Class B typing speed
2 \vert 5\ 5\ 7\ 7\ 7\ 8
3 \vert 0\ 0\ 0\ 1\ 1\ 1\ 1\ 2\ 2\ 2\ 2\ 4\ 4\ 4\ 5\ 6\ 8\ 8\ 8\ 9
4 \vert 0\ 0\ 0\ 0\ 1\ 1\ 2\ 2\ 2

Key: 1\vert 4=14 words per minute

Worked Solution
Apply the idea

The mean of Class A's typing speeds is 31.05 words per minute and the mean of Class B's typing speeds is 34.14 words per minute.

The median of Class A's typing speeds is 31 words per minute, and the median of Class B's typing speeds is 34 words per minute.

In general, Class B types faster on average. Because Class B's median (34) is higher than Class A's (31), at least half of the students in Class B type faster than at least half of the students in Class A.

Reflect and check

The mean and median being close in each data set suggests that the data is roughly symmetric, so the mean is a reasonable way to describe the center of each set.
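Reading a stem-and-leaf plot is mechanical: each stem is the tens digit and each leaf is the ones digit. In the sketch below, the `stem_leaf` dictionary is our own transcription of the Class B plot and `from_stem_and_leaf` is a hypothetical helper name.

```python
import statistics

# Class B plot: stem (tens digit) -> leaves (ones digits)
stem_leaf = {
    2: [5, 5, 7, 7, 7, 8],
    3: [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 4, 4, 4, 5, 6, 8, 8, 8, 9],
    4: [0, 0, 0, 0, 1, 1, 2, 2, 2],
}
class_a = [26, 28, 28, 29, 29, 29, 30, 31, 31, 31, 31,
           32, 32, 32, 32, 33, 35, 35, 36]


def from_stem_and_leaf(plot):
    """Rebuild the data values: value = 10 * stem + leaf."""
    return [10 * stem + leaf for stem, leaves in plot.items() for leaf in leaves]


class_b = from_stem_and_leaf(stem_leaf)

# Columns: class, number of students, mean (2 d.p.), median
for name, speeds in (("Class A", class_a), ("Class B", class_b)):
    print(name, len(speeds), round(statistics.mean(speeds), 2),
          statistics.median(speeds))
# Class A 19 31.05 31
# Class B 35 34.14 34
```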

Idea summary

The mean and median both describe the middle of a data set, while the mode simply identifies the most frequent value. When the mean and median are noticeably different, the distribution is likely skewed.

Outliers

Exploration

Explore the applet by dragging Point P and clicking the button for a new set of data.

  1. Drag Point P closer to the other points in the data set. Then, move Point P further away from the data set. What happens to the mean as you move Point P to the position of an outlier?

  2. What happens to the median as you move Point P to the position of an outlier?

  3. What can you conclude about an outlier's impact on measures of center?

An outlier can pull the mean of a data set strongly toward it, while the median changes very little. So when a data set has outliers, it is usually best to describe its center with the median rather than the mean.
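We can imitate the applet by appending a Point P to a small data set and sliding P farther away. The data values below are made up purely for illustration and are not from the applet.

```python
import statistics

base = [4, 5, 5, 6, 7, 8]  # hypothetical data set, for illustration only

# Slide Point P from inside the data out to an extreme position
for p in (6, 10, 20, 40):
    data = base + [p]
    print(f"P = {p:>2}: mean = {statistics.mean(data):.2f}, "
          f"median = {statistics.median(data)}")
```

Every version of the set has 7 values, so the median is always the 4th ordered value, which stays at 6, while the mean climbs from about 5.86 to about 10.71 as P moves out.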

We can use the five-number summary (minimum, lower quartile, median, upper quartile, and maximum) and the interquartile range to determine whether a data point should be considered an outlier.

Interquartile range (IQR)

The difference between the upper and lower quartiles, \text{IQR} = Q_3 - Q_1, which measures the spread of the middle 50 \% of the data set

To do this, we calculate the upper and lower bounds for outliers. Any data value above the upper bound or below the lower bound is considered an outlier.

\text{Lower outliers: } \lt Q_1 - 1.5 \times \text{IQR}
\text{Upper outliers: } \gt Q_3 + 1.5 \times \text{IQR}

That is, a data point is an outlier if it is less than the lower quartile minus 1.5 times the IQR, or if it is greater than the upper quartile plus 1.5 times the IQR.
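The two inequalities can be wrapped in a small function. This is a sketch: `outlier_bounds` and `is_outlier` are hypothetical names, the quartiles are passed in already computed, and the example quartiles Q_1 = 10 and Q_3 = 20 are made up.

```python
def outlier_bounds(q1, q3):
    """Return (lower, upper) bounds: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(x, q1, q3):
    """A value is an outlier if it is strictly below the lower bound or strictly above the upper bound."""
    lower, upper = outlier_bounds(q1, q3)
    return x < lower or x > upper


print(outlier_bounds(10, 20))   # (-5.0, 35.0)
print(is_outlier(36, 10, 20))   # True
print(is_outlier(35, 10, 20))   # False: equal to the bound, not beyond it
```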

Examples

Example 3

On the first three tests of the semester Kobe scored 77, 72, and 83 out of 100 points.

a

Determine the score out of 100 that Kobe needs on the next test to have an average of 80 over the four tests.

Worked Solution
Create a strategy

We can let x represent Kobe's score on the next test and find the average as the sum of all four scores divided by 4.

Apply the idea
\begin{aligned} \dfrac{77+72+83+x}{4} &= 80 && \text{Equation for the average of Kobe's test scores} \\ \dfrac{232+x}{4} &= 80 && \text{Evaluate the addition} \\ 232+x &= 320 && \text{Multiplication property of equality} \\ x &= 88 && \text{Subtraction property of equality} \end{aligned}

Kobe needs to score an 88 on his next test to have a test average of 80 over the four tests.
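The same steps work for any target average: multiply the target by the total number of tests, then subtract the sum of the scores already earned. A sketch with a hypothetical helper `score_needed`:

```python
def score_needed(scores, target_mean):
    """Score needed on one more test so the mean of all tests equals target_mean.

    Solves (sum(scores) + x) / (len(scores) + 1) = target_mean for x.
    """
    total_tests = len(scores) + 1
    return target_mean * total_tests - sum(scores)


print(score_needed([77, 72, 83], 80))  # 88
```

If the result is above 100, the target average cannot be reached on a 100-point test.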

b

Compare what would happen to Kobe's mean and median test score if he scored a 50 out of 100 points on his fourth test.

Worked Solution
Create a strategy

We first find the current mean and median of Kobe's test scores. Then, we calculate the new mean and median with the fourth test score of 50 included and compare them.

Apply the idea

Before the fourth test:

The mean is \dfrac{77+72+83}{3}=\dfrac{232}{3}\approx 77.3.

To find the median, we first put the scores in order: \{72,77,83\}. Then, we can identify the middle data point.

The median is 77.

After the fourth test:

The mean is \dfrac{77+72+83+50}{4}=\dfrac{282}{4}= 70.5.

The scores in order are now \{50, 72, 77, 83\}, and the median is halfway between 72 and 77.

The median is \dfrac{72+77}{2}=74.5.

The mean test score drops from 77.3 to 70.5, and the median score drops from 77 to 74.5. Both the mean and median are lowered by the low score of 50, but the mean drops by about 6.8 points while the median drops by only 2.5 points.
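A quick check of these numbers with Python's standard `statistics` module:

```python
import statistics

before = [77, 72, 83]
after = before + [50]

# Mean (rounded to 1 d.p.) and median before and after the score of 50
for label, scores in (("before", before), ("after", after)):
    print(label, round(statistics.mean(scores), 1), statistics.median(scores))
# before 77.3 77
# after 70.5 74.5

mean_drop = statistics.mean(before) - statistics.mean(after)
median_drop = statistics.median(before) - statistics.median(after)
print(round(mean_drop, 1), median_drop)  # 6.8 2.5
```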

Example 4

Consider the two sets of data below, displayed as a box plot and a dot plot:

\text{Car fuel efficiency: } \{15, 17, 18, 22, 22, 22, 23, 25, 26, 31, 35, 50\}

A box plot titled Car fuel efficiency in miles per gallon, with a horizontal axis from 10 to 60.
A dot plot titled Truck fuel efficiency in miles per gallon, ranging from 12 to 27. The number of dots is as follows: at 12, 1; at 13, 2; at 14, 1; at 15, 3; at 16, 2; at 17, 1; at 19, 1; at 27, 1.
a

Determine whether the highest fuel efficiency in each data set, for cars and for trucks, is an outlier.

Worked Solution
Create a strategy

Calculate the five-number summary for each data set. Then, use the formula for the upper bound for outliers to check whether the highest value in each data set falls above it.

Apply the idea
Minimum: 15
Lower quartile: 20
Median: 22.5
Upper quartile: 28.5
Maximum: 50

For the car fuel efficiency, we need to determine if the car that gets 50 \text{ mpg} is an outlier. The IQR is 28.5 - 20 = 8.5, so we have: 50 \gt 28.5 + \left(1.5 \times 8.5 \right) \to 50 \gt 41.25. Since 50 falls above the upper bound for outliers, we can state that it is an outlier in the data set.

Minimum: 12
Lower quartile: 13.5
Median: 15
Upper quartile: 16.5
Maximum: 27

For the truck fuel efficiency, we need to determine if the truck that gets 27 \text{ mpg} is an outlier. We have: 27 \gt 16.5 + \left(1.5 \times 3 \right) \to 27 \gt 21. Since 27 falls above the upper bound for outliers, we can state that it is an outlier in the data set.
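The whole procedure, from five-number summary to outlier check, can be sketched as below. There are several conventions for computing quartiles; this sketch assumes the median-of-halves convention, where Q_1 and Q_3 are the medians of the lower and upper halves of the ordered data, leaving out the middle value when the count is odd. `five_number_summary` is a hypothetical helper name.

```python
import statistics


def five_number_summary(data):
    """Min, Q1, median, Q3, max, with Q1 and Q3 taken as medians of the lower and upper halves.

    Assumption: for an odd count, the middle value is left out of both halves.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + n % 2:]
    return (ordered[0], statistics.median(lower), statistics.median(ordered),
            statistics.median(upper), ordered[-1])


car = [15, 17, 18, 22, 22, 22, 23, 25, 26, 31, 35, 50]
truck = [12, 13, 13, 14, 15, 15, 15, 16, 16, 17, 19, 27]

for name, data in (("car", car), ("truck", truck)):
    lo, q1, med, q3, hi = five_number_summary(data)
    upper_bound = q3 + 1.5 * (q3 - q1)
    # Columns: name, five-number summary, upper bound, is the maximum an outlier?
    print(name, (lo, q1, med, q3, hi), upper_bound, hi > upper_bound)
# car (15, 20.0, 22.5, 28.5, 50) 41.25 True
# truck (12, 13.5, 15.0, 16.5, 27) 21.0 True
```

Interpolation-based quartile conventions, such as those in many spreadsheets, shift the quartiles slightly; for these two data sets the highest values still fall above the upper bound.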

b

Select a measure of center from both data sets to compare the fuel efficiency of cars versus trucks.

Worked Solution
Create a strategy

Since we know that outliers influence the mean of a data set, it's best to compare the medians of the two data sets.

Apply the idea

The median fuel efficiency is 22.5 miles per gallon for a car and 15 miles per gallon for a truck. That means a car typically gets 7.5 more miles per gallon than a truck.

Reflect and check

Since both data sets have a high outlier, we can expect the mean of each set to be higher than its median, so the mean does not represent a typical value as well as the median does.
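Computing both measures confirms this expectation for the two data sets:

```python
import statistics

car = [15, 17, 18, 22, 22, 22, 23, 25, 26, 31, 35, 50]
truck = [12, 13, 13, 14, 15, 15, 15, 16, 16, 17, 19, 27]

# Each mean should sit above its median because of the high outlier
for name, mpg in (("car", car), ("truck", truck)):
    print(name, statistics.mean(mpg), statistics.median(mpg))
# car 25.5 22.5
# truck 16 15.0

# Difference in typical fuel efficiency, using medians
print(statistics.median(car) - statistics.median(truck))  # 7.5
```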

Idea summary

When an outlier is greater than the rest of the data, it increases the value of the mean. When an outlier is less than the rest of the data, it decreases the value of the mean. An outlier usually has little effect on the median, and that effect tends to be smaller in larger data sets.

Outcomes

S.ID.A.1

Represent data with plots on the real number line (dot plots, histograms, and box plots).

S.ID.A.2

Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (interquartile range, standard deviation) of two or more different data sets.

S.ID.A.3

Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).
