In 6th grade, we used the measures of center (mean, median, and mode) to draw conclusions about sets of data. Here we will compare the same measures of center across different data sets and determine how outliers influence those comparisons.
A measure of center, sometimes called a measure of central tendency, describes the center, or typical, value in a data set. Measures of center include the mean, median, and mode.
When comparing sets of data we should note any differences in the measures of center and interpret what these differences tell us about the data.
The dot plot shows the air quality index (AQI) rating for 20 of the world's most polluted countries measured in micrograms per cubic centimeter (µ\text{g}/\text{cm}^3).
Interpret the meaning of the mean in the context of the data set.
Explain and interpret the meaning of the median in the context of the data set.
Interpret the meaning of the mode in the context of the data set.
Compare the mean and median of the data sets.
The following results from the same quiz taken by two different classes:
The following typing speeds, in words per minute, of students from two different classes:
\text{Class A: }\{26, 28, 28, 29, 29, 29, 30, 31, 31, 31, 31, 32, 32, 32, 32, 33, 35, 35, 36\}
2 | 5\ 5\ 7\ 7\ 7\ 8 |
3 | 0\ 0\ 0\ 1\ 1\ 1\ 1\ 2\ 2\ 2\ 2\ 4\ 4\ 4\ 5\ 6\ 8\ 8\ 8\ 9 |
4 | 0\ 0\ 0\ 0\ 1\ 1\ 2\ 2\ 2 |
Key: 1\vert 4=14 words per minute
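As a rough check on the typing-speed comparison, the measures of center can be computed with Python's standard `statistics` module. This is a sketch under two assumptions: the stem-and-leaf plot shows the second class (called `class_b` here, a label that does not appear in the original), and each stem is a tens digit with each leaf a ones digit, as the key indicates.

```python
from statistics import mean, median, multimode

class_a = [26, 28, 28, 29, 29, 29, 30, 31, 31, 31, 31,
           32, 32, 32, 32, 33, 35, 35, 36]

# Stem-and-leaf plot: stem = tens digit, each leaf = ones digit.
# Assumption: this plot holds the second class's typing speeds.
stem_and_leaf = {
    2: [5, 5, 7, 7, 7, 8],
    3: [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 4, 4, 4, 5, 6, 8, 8, 8, 9],
    4: [0, 0, 0, 0, 1, 1, 2, 2, 2],
}
class_b = [10 * stem + leaf
           for stem, leaves in stem_and_leaf.items()
           for leaf in leaves]

for name, data in [("Class A", class_a), ("Class B", class_b)]:
    print(name,
          "mean:", round(mean(data), 2),
          "median:", median(data),
          "mode(s):", multimode(data))  # multimode lists every most-frequent value
```

`multimode` is used rather than `mode` because a data set can have more than one most-frequent value, which happens in Class A.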
The mean and median can be used to describe the middle of a set of data, while the mode simply indicates the value that is most frequent. When the mean and median are noticeably different, it tells us that the distribution is likely skewed.
Explore the applet by dragging Point P and clicking the button for a new set of data.
Drag Point P closer to the other points in the data set. Then, move Point P further away from the data set. What happens to the mean as you move Point P to the position of an outlier?
What happens to the median as you move Point P to the position of an outlier?
What can you conclude about an outlier's impact on measures of center?
The mean of a data set is impacted by the inclusion of an outlier. When analyzing sets of data, if the data has outliers, it is best to use the median to describe the data since the median is less changed by outliers.
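The effect described above can be seen numerically with a small sketch. The values below are hypothetical, chosen only for illustration; they are not taken from the applet.

```python
from statistics import mean, median

# Hypothetical data, chosen only for illustration.
data = [10, 12, 13, 14, 16]
with_outlier = data + [60]  # add one value far above the rest

# How much each measure of center moves once the outlier is included.
mean_shift = mean(with_outlier) - mean(data)
median_shift = median(with_outlier) - median(data)

print("mean:  ", mean(data), "->", round(mean(with_outlier), 2))
print("median:", median(data), "->", median(with_outlier))
```

In this example the mean moves by several units while the median moves by only half a unit, which is why the median is usually the safer summary when outliers are present.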
We can use the information from a five number summary and the interquartile range to determine whether a data point can be considered an outlier.
To do this, we calculate the upper and lower bounds for outliers. Any data point that is above the upper bound or below the lower bound is considered an outlier.
\text{Lower outliers: } \lt Q_1 - 1.5 \times \text{IQR} \\ \text{Upper outliers: } \gt Q_3 + 1.5 \times \text{IQR}
That is, a data point is an outlier if it is less than the lower quartile minus 1.5 times the IQR, or greater than the upper quartile plus 1.5 times the IQR.
On the first three tests of the semester Kobe scored 77, 72, and 83 out of 100 points.
Determine the score out of 100 that Kobe needs on the next test to have an average of 80 over the four tests.
Compare what would happen to Kobe's mean and median test score if he scored a 50 out of 100 points on his fourth test.
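One way to check the arithmetic for Kobe's scores is sketched below. For a mean of 80 over four tests the scores must total 4 × 80, so the needed score is that total minus the sum of the first three. The variable names are illustrative only.

```python
from statistics import mean, median

scores = [77, 72, 83]   # Kobe's first three test scores
target_mean = 80
n_tests = len(scores) + 1

# Total points required for the target mean, minus points already earned.
needed = target_mean * n_tests - sum(scores)
print("score needed on test 4:", needed)

# Effect of scoring 50 on the fourth test instead.
with_low = scores + [50]
print("mean:  ", round(mean(scores), 2), "->", mean(with_low))
print("median:", median(scores), "->", median(with_low))
```

The low score lowers the mean by more than it lowers the median, since the mean uses every value while the median depends only on the middle of the ordered list.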
Consider the two sets of data, displayed as a box plot and a dot plot below:
\text{Car fuel efficiency: } \{15, 17, 18, 22, 22, 22, 23, 25, 26, 31, 35, 50\}
Determine whether the highest fuel efficiency in each data set (car and truck) is an outlier.
Select a measure of center from both data sets to compare the fuel efficiency of cars versus trucks.
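For the car data, which is the only set listed as numbers in the text, the outlier check and the two candidate measures of center can be sketched as follows. The quartiles here are taken as the medians of the lower and upper halves of the sorted data, which is an assumption about the intended method; the truck data would be handled the same way once read from its plot.

```python
from statistics import mean, median

# Car fuel efficiency values from the text (already in ascending order).
cars = [15, 17, 18, 22, 22, 22, 23, 25, 26, 31, 35, 50]

half = len(cars) // 2
q1 = median(cars[:half])   # median of the lower six values
q3 = median(cars[half:])   # median of the upper six values
iqr = q3 - q1
upper_bound = q3 + 1.5 * iqr

highest_is_outlier = max(cars) > upper_bound
print("Q1:", q1, "Q3:", q3, "IQR:", iqr, "upper bound:", upper_bound)
print("highest value is an outlier:", highest_is_outlier)
print("mean:", mean(cars), "median:", median(cars))
```

Because the highest value lies above the upper bound, the median is the more representative measure of center for the car data.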
When an outlier is greater than the rest of the data, it increases the value of the mean. When an outlier is less than the rest of the data, it decreases the value of the mean. Outliers have little to no impact on the median, depending on the size of the data set.