9.07 Analyzing lines of fit using residuals

Lesson

Concept summary

From a scatter plot and a line of fit, we can further analyze an association between two variables by comparing the data points to the line of fit.

The difference between the observed value (the data point) and the predicted value (the value of the line of fit at that x-value) is called a residual.
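In symbols, this definition can be written as shown below. The notation e_i for the residual and \hat{y}_i for the predicted value is an assumption made here for illustration; the lesson itself describes the residual in words.

```latex
% Notation assumed for illustration: (x_i, y_i) is a data point and
% \hat{y}_i is the value of the line of fit at x_i.
e_i = y_i - \hat{y}_i
% e_i > 0: the data point lies above the line of fit
% e_i < 0: the data point lies below the line of fit
% e_i = 0: the data point lies on the line of fit
```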

[Figure: scatter plot with a line of fit; x-axis from 18 to 22, y-axis from 55 to 80]

By taking the residuals of each point in the data set and plotting them at their corresponding x-values, we form a residual plot for the data.

By observing the original scatter plot and the residual plot, we can analyze the data to see if there is a correlation between the two variables.

We can also use the scatter plot and residual plot to look for outliers in the data set - data points which stand out by being far away from their predicted value.

Below are some example scatter plots with a line of best fit and residuals, together with their corresponding residual plots.

Scatter plot and residual plot of weak positive linear association with no outliers:

[Figure: Scatter plot with residuals; x-axis from 16 to 24, y-axis from 45 to 80]
[Figure: Residual plot; x-axis from 16 to 24, residual axis from -8 to 12]

Scatter plot and residual plot of strong negative linear association with an outlier:

[Figure: Scatter plot with residuals; x-axis from 0.1 to 0.7, y-axis from 1 to 10]
[Figure: Residual plot; x-axis from 0.1 to 0.7, residual axis from -1 to 3]

Scatter plot and residual plot of non-linear association:

[Figure: Scatter plot with residuals; x-axis from 1 to 9, y-axis from 5 to 25]
[Figure: Residual plot; x-axis from 1 to 9, residual axis from -4 to 4]

Worked examples

Example 1

Describe the strength and direction of the correlation for the scatter plot shown.

[Figure: scatter plot with a line of fit; x-axis from 10 to 30, y-axis from -5 to 15]

Approach

The slope of the line of fit can be used to determine the direction of the correlation and the residuals can be used to describe the strength.

Solution

Since the slope of the line of fit is negative, the correlation is also negative. Most of the data points lie close to the line, with relatively small residuals, so we describe this as a strong negative correlation.

Example 2

Consider the data set shown in the table of values and scatter plot below, with the line of fit y=-0.57x+16.74.

x | 10 | 11 | 13 | 14 | 18 | 19 | 21 | 23 | 25 | 28 | 29 | 31
y | 12 | 13 |  9 |  2 |  8 |  7 |  7 |  4 |  2 |  3 | -1 | -2
[Figure: scatter plot of the data with the line of fit; x-axis from 10 to 30, y-axis from -5 to 15]
a

Create a residual plot for this data set.

Approach

The residual for each point is its signed vertical distance from the line of fit: positive when the point lies above the line and negative when it lies below. We want to find this for each point and plot it at the same x-value as the original point.

Solution

A residual can be calculated by finding the difference between the actual output of each data point and the output predicted by the line of fit. This can be written as the formula: \text{residual}=\text{actual}-\text{predicted}

For example, the point at \left(10,12\right) has a predicted value of:

\begin{aligned}
y &= -0.57x+16.74 && \text{Line of fit} \\
  &= -0.57(10)+16.74 && \text{Substitute into the line of fit} \\
  &= -5.7+16.74 && \text{Simplify the product} \\
  &= 11.04 && \text{Add}
\end{aligned}

To find the residual we take \text{residual}=\text{actual}-\text{predicted}=12-11.04=0.96.
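Repeating this calculation for every point in the table gives the coordinates of the residual plot. The following is a minimal Python sketch of that process, not part of the original lesson; the names `predicted`, `residual`, and `residual_points` are chosen here for illustration, and values are rounded to two decimal places.

```python
# Data from the table of values in Example 2.
xs = [10, 11, 13, 14, 18, 19, 21, 23, 25, 28, 29, 31]
ys = [12, 13, 9, 2, 8, 7, 7, 4, 2, 3, -1, -2]

# Coefficients of the given line of fit y = -0.57x + 16.74.
SLOPE = -0.57
INTERCEPT = 16.74


def predicted(x):
    """Output predicted by the line of fit at x."""
    return SLOPE * x + INTERCEPT


def residual(x, y):
    """residual = actual - predicted."""
    return y - predicted(x)


# Each pair (x, residual) is one point of the residual plot.
residual_points = [(x, round(residual(x, y), 2)) for x, y in zip(xs, ys)]

for x, r in residual_points:
    print(f"x = {x:>2}, residual = {r:+.2f}")
```

The first pair produced corresponds to the point \left(10,12\right) worked through above, with a residual of 0.96.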

The residual can also be found by estimating the vertical distance between each data point and the line of fit.

[Figure: residual plot; x-axis from 10 to 30, residual axis from -6 to 6]
b

Identify a possible outlier in the scatter plot and calculate its residual.

Approach

We can identify possible outliers by looking for data points whose residuals are much larger in magnitude than those of the other points.

Solution

Looking at the residual plot, the point at x = 14 has a residual that is much larger in magnitude than those of all the other points. So the point \left(14, 2\right) is a possible outlier.

To calculate the residual, start by finding the output predicted by the line of fit.

\begin{aligned}
y &= -0.57x+16.74 && \text{Line of fit} \\
  &= -0.57(14)+16.74 && \text{Substitute into the line of fit} \\
  &= -7.98+16.74 && \text{Simplify the product} \\
  &= 8.76 && \text{Add}
\end{aligned}

To find the residual we take \text{residual}=\text{actual}-\text{predicted}=2-8.76=-6.76.
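The visual check can also be approximated numerically. The sketch below picks the data point whose residual is largest in absolute value. This is a simple heuristic assumed here for illustration, not a formal outlier test and not a method stated in the lesson; the function and variable names are likewise hypothetical.

```python
# Data from the table of values in Example 2.
xs = [10, 11, 13, 14, 18, 19, 21, 23, 25, 28, 29, 31]
ys = [12, 13, 9, 2, 8, 7, 7, 4, 2, 3, -1, -2]


def residual(x, y, slope=-0.57, intercept=16.74):
    """residual = actual - predicted, using the given line of fit."""
    return y - (slope * x + intercept)


# Heuristic (an assumption): treat the point with the largest
# absolute residual as the candidate outlier.
outlier_x, outlier_y = max(zip(xs, ys), key=lambda p: abs(residual(*p)))
outlier_residual = round(residual(outlier_x, outlier_y), 2)

print((outlier_x, outlier_y), outlier_residual)
```

For this data set the sketch selects the same point as the residual plot, \left(14, 2\right), with a residual of -6.76. A large residual only flags a point for a closer look; whether it is treated as an outlier remains a judgment about the data.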

Outcomes

MA.912.DP.2.4

Fit a linear function to bivariate numerical data that suggests a linear association and interpret the slope and y-intercept of the model. Use the model to solve real-world problems in terms of the context of the data.

MA.912.DP.2.5

Given a scatter plot that represents bivariate numerical data, assess the fit of a given linear function by plotting and analyzing residuals.

MA.912.DP.2.6

Given a scatter plot with a line of fit and residuals, determine the strength and direction of the correlation. Interpret strength and direction within a real-world context.
