
7.03 Analyzing fitted functions

Introduction

In lesson 7.01 Scatter plots and lines of fit and lesson 7.02 Fitting functions to data, we analyzed the association between two variables using the correlation coefficient. In this lesson, we'll use another calculation, the residual, to analyze the association between two variables.

Analyzing fitted functions

Exploration

Consider the scatter plot of data relating the number of guests at a restaurant to the cost of the meal, along with the residual plot of the data:

[Figure: Cost of meal vs number of guests. Scatter plot with x-axis "Number of guests x" (1 to 10) and y-axis "Cost (in dollars) y" (10 to 120). Line of best fit: y=12.07x+0.04]

[Figure: Residual plot. Plot of residuals for the 'Cost of meal vs number of guests' graph, with x-axis x (1 to 10) and y-axis "Residual y" (-12 to 12).]
  1. Compare the points on the scatter plot with the points on the residual graph. What do you notice about the relationship of the points?

From a scatter plot and a line of fit, we can further analyze an association between two variables by examining the residuals of the model.

Residual

The residual of a data point is the difference between its actual output value (its y-value) and the output value predicted for its x-value by the line of best fit.

\text{residual}=\text{actual}-\text{predicted}
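The formula translates directly into a couple of lines of code. The minimal Python sketch below is only an illustration: the helper names `predict` and `residual` are our own, and it borrows the line of fit y=0.255x-81.49 and the point (930, 150) from Example 1 later in this lesson.

```python
def predict(x, slope, intercept):
    """Output predicted by the line of fit y = slope * x + intercept."""
    return slope * x + intercept


def residual(actual, predicted):
    """residual = actual - predicted"""
    return actual - predicted


# Line of fit y = 0.255x - 81.49 and data point (930, 150) from Example 1
predicted = predict(930, 0.255, -81.49)
r = residual(150, predicted)
print(round(predicted, 2), round(r, 2))  # 155.66 -5.66
```

A negative result means the actual value is below the predicted value.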

[Figure: graph with x-axis x (18 to 22) and y-axis y (55 to 80).]

By calculating the residual of each point in the data set and plotting it at the point's x-value, we form a residual plot for the data.

A residual plot uses the same x-axis scale and x-coordinates as the original scatter plot, with the residual values plotted as the y-coordinates.

A residual plot can be used to decide whether a straight line is an appropriate model for the data. It also indicates the strength of the relationship by showing how much the model over-predicts (negative residual) or under-predicts (positive residual) the actual data. Looking for unusually large residuals can help us identify outliers in the data set.
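One informal way to look for unusually large residuals is sketched below. The cutoff, flagging any residual whose size is more than twice the standard deviation of all the residuals, is a common rule of thumb that we are assuming for illustration; it is not a rule from this lesson. The function name `flag_large_residuals` and the residual values are hypothetical.

```python
import statistics


def flag_large_residuals(residuals, k=2.0):
    """Return the positions of residuals whose absolute value is more than
    k times the (population) standard deviation of all the residuals."""
    spread = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > k * spread]


# Hypothetical residuals; the fifth one is much larger than the rest
residuals = [1.2, -0.8, 0.5, -1.1, 9.0, 0.3, -0.6]
print(flag_large_residuals(residuals))  # [4]
```

Because an outlier also inflates the standard deviation, a check like this can miss outliers in very small data sets, so it is best used alongside a look at the residual plot rather than instead of it.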

Two key features will help provide evidence about whether or not a linear model is appropriate, and indicate the strength of the relationship:

  • Pattern - if the fitted linear model is appropriate, the points on the residual plot should be randomly scattered about the x-axis with no noticeable pattern.

  • Size of residuals - residuals that are small relative to the values being predicted indicate a stronger association. Large residuals indicate that the model significantly under- or over-predicts the actual data.

The following examples show scatter plots with a line of best fit and residuals, along with their corresponding residual plots.

Scatter plot and residual plot of weak positive linear association:

[Figure: Scatter plot with residuals, x-axis x (16 to 24), y-axis y (45 to 80).]
[Figure: Residual plot, x-axis x (16 to 24), y-axis "Residual y" (-8 to 12).]

From the scatter plot, we can see the association is positive. The residual plot has no obvious pattern, suggesting a linear model is appropriate. The residuals are relatively large, indicating a weak relationship.

Scatter plot and residual plot of strong negative linear association with an outlier:

[Figure: Scatter plot with residuals, x-axis x (0.1 to 0.7), y-axis y (1 to 10).]
[Figure: Residual plot, x-axis x (0.1 to 0.7), y-axis "Residual y" (-1 to 3).]

From the scatter plot, we can see the association is negative. Other than the outlier, the residuals are relatively small, indicating a strong relationship. The outlier in the scatter plot stands out in the residual plot. Including the outlier pulls the line of best fit toward it, so most of the other data points are over-predicted by the line.

Scatter plot and residual plot of non-linear association:

[Figure: Scatter plot with residuals, x-axis x (1 to 9), y-axis y (5 to 25).]
[Figure: Residual plot, x-axis x (1 to 9), y-axis "Residual y" (-4 to 4).]

The residual plot displays a clear pattern, indicating that a linear model is not appropriate for this data set.
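To see where such a pattern comes from, the Python sketch below fits a straight line to hypothetical data that follow the curve y=x^2 and prints the sign of each residual. It assumes the line of best fit is the least-squares line; the data and the helper name `least_squares_line` are illustrative only.

```python
# Hypothetical data that follow a curve (y = x^2), not a straight line
xs = list(range(1, 10))
ys = [x * x for x in xs]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# One character per point: "+" for a positive residual, "-" for a negative one
print("".join("+" if r > 0 else "-" for r in residuals))  # ++-----++
```

The residuals are positive at both ends and negative in the middle, which is the kind of curved pattern that signals a linear model is not appropriate.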

Examples

Example 1

The scatter plot shows the relationship between the electricity usage of a household and the cost of their monthly utility bill.

[Figure: Cost of energy usage. Scatter plot with x-axis "Energy Usage (in kWh)" (200 to 1400) and y-axis "Amount (in dollars)" (25 to 275).]

The equation of the line of best fit is y=0.255x-81.49.

The residual plot of the data is shown:

[Figure: Residual plot, x-axis x (200 to 1400), y-axis "Residual y" (-20 to 20).]
a

Interpret the strength and linear association of the data using the line of best fit and residual plot.

Worked Solution
Apply the idea

The association between the data is a strong positive linear association.

Since the slope of the line of fit is positive, the correlation is also positive. The data points mostly appear close to the line with relatively small residuals, so we will describe this correlation as a strong positive correlation.

The residual plot has no obvious pattern suggesting a linear model is appropriate for the data.

Reflect and check

The points on this residual plot are not as close to the x-axis as in the lesson's example of a strong linear association. However, the residuals show that the predicted and actual values are mostly within \$10 of each other, which is relatively close for a prediction involving bills of up to \$250.

It is important to consider the size of the residuals relative to what you are predicting. This analysis leaves some room for interpretation, depending on how precise our predictions need to be.

b

Find and interpret the residual for the point \left(930, 150\right).

Worked Solution
Create a strategy

The residual for each point is its vertical distance from the line of fit.

Apply the idea

A residual can be calculated by finding the difference between the actual output of the data point and the output predicted by the line of fit. This can be written as the formula: \text{residual}=\text{actual}-\text{predicted}

The point \left(930, 150\right) tells us that an x-value of 930 has an actual y-value of 150. We can find the predicted value using the equation of the line of fit.

\begin{aligned}
y &= 0.255x-81.49 && \text{Line of fit} \\
&= 0.255(930)-81.49 && \text{Substitute } x=930 \\
&= 237.15-81.49 && \text{Evaluate the multiplication} \\
&= 155.66 && \text{Evaluate the subtraction}
\end{aligned}

Find the residual:

\begin{aligned}
\text{residual} &= \text{actual}-\text{predicted} && \text{Residual formula} \\
&= 150-155.66 && \text{Substitute the actual and predicted } y\text{-values} \\
&= -5.66 && \text{Evaluate the subtraction}
\end{aligned}

This means that at a usage of 930 \text{ kWh}, the line of fit over-predicts the actual bill of \$150 by \$5.66.

Reflect and check

The sign of the residual tells us whether the line of fit over- or under-predicts the actual data:

  • A residual is positive when the data point lies above the line, so the line under-predicts the value.

  • A residual is negative when the data point lies below the line, so the line over-predicts the value.
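These sign rules can be written as a small Python function. The name `interpret` is our own; the sketch only checks whether a residual is positive, negative, or zero, and applies the rules to the residual of -5.66 found in part (b).

```python
def interpret(residual):
    """Describe what the sign of a residual says about the line of fit."""
    if residual > 0:
        return "point lies above the line, so the line under-predicts"
    if residual < 0:
        return "point lies below the line, so the line over-predicts"
    return "point lies on the line, so the prediction is exact"


print(interpret(-5.66))  # point lies below the line, so the line over-predicts
```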

We can visually check our answer by looking at the residual plot to see whether the residual at x=930 is close to -5.66.

Example 2

Consider the following data set and scatter plot with a line of fit.

x: 10, 11, 13, 18, 19, 21, 23, 25, 28, 29, 31
y: 12, 13, 9, 8, 7, 7, 4, 2, 3, -1, -2

[Figure: Scatter plot with line of fit, x-axis x (10 to 30), y-axis y (-5 to 15).]
a

Create a residual plot for the data.

Worked Solution
Create a strategy

The residual for each point is its vertical distance from the line of fit. We want to find this for each point, and plot it against the same x-axis scale.

Apply the idea

A residual is calculated by finding the difference between the actual output of each data point and the output predicted by the line of fit.

For example, the point \left(10,12\right) tells us that an x-value of 10 has an actual y-value of 12. We can find the predicted value using the equation of the line of fit.

\begin{aligned}
y &= -0.653x+19.17 && \text{Line of fit} \\
&= -0.653(10)+19.17 && \text{Substitute } x=10 \\
&= -6.53+19.17 && \text{Evaluate the multiplication} \\
&= 12.64 && \text{Evaluate the addition}
\end{aligned}

Find the residual:

\begin{aligned}
\text{residual} &= \text{actual}-\text{predicted} && \text{Residual formula} \\
&= 12-12.64 && \text{Substitute the actual and predicted } y\text{-values} \\
&= -0.64 && \text{Evaluate the subtraction}
\end{aligned}

On the residual plot, this gives the point (10, -0.64). We can repeat this process for the remaining x-values in the table to find the rest of the points on the residual plot.
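Repeating the calculation for every row is easy to automate. The Python sketch below uses the table and the line of fit y=-0.653x+19.17 from this example; it is only an illustration of the residual formula, with each residual rounded to two decimal places.

```python
# Data from the table and the line of fit y = -0.653x + 19.17
xs = [10, 11, 13, 18, 19, 21, 23, 25, 28, 29, 31]
ys = [12, 13, 9, 8, 7, 7, 4, 2, 3, -1, -2]
slope, intercept = -0.653, 19.17

# Each residual-plot point keeps the original x-value and uses
# actual - predicted as its y-value
residual_points = [(x, round(y - (slope * x + intercept), 2)) for x, y in zip(xs, ys)]

for point in residual_points:
    print(point)
```

The first point printed is (10, -0.64), matching the calculation above. The remaining residuals are all smaller than 2.2 in size, and their signs switch back and forth with no obvious pattern.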

[Figure: Residual plot, x-axis x (10 to 30), y-axis "Residual y" (-6 to 6).]

A rough sketch of the residual plot can be created by estimating the vertical distance between each data point and the line of fit.

b

Determine if a linear model is an appropriate choice for the data.

Worked Solution
Create a strategy

The residual plot can be used to determine if a linear model is an appropriate choice for the data.

Apply the idea

Since the data points are randomly dispersed around the x-axis on the residual plot of the data, a linear model appears to be appropriate for the data.

Idea summary

A residual plot shows the strength of the correlation between two variables. The closer the points on a residual plot are to the x-axis, the stronger the correlation between the variables. A model is considered strong when the residuals are small relative to the values being predicted. Calculate the residuals for a residual plot using the formula: \text{residual}=\text{actual}-\text{predicted}

In general, a residual plot with points randomly dispersed about the x-axis indicates that the model is appropriate for the data.

Outcomes

S.ID.B.6.A

Fit a function to the data; use functions fitted to data to solve problems in the context of the data. Use given functions or choose a function suggested by the context. Emphasize linear, quadratic, and exponential models.

S.ID.B.6.B

Informally assess the fit of a function by plotting and analyzing residuals.

S.ID.C.7

Interpret the slope (rate of change) and the intercept (constant term) of a linear model in the context of the data.
