Australia, VIC
VCE 11 General 2023

7.04 Line of good fit

Lesson

Line of good fit

A line of good fit (or "trend" line) is a straight line that represents the data on a scatter plot well. This line may pass through some of the points, none of the points, or all of the points. However, it always represents the general trend of the points, which indicates whether there is a positive, negative or no linear relationship between the two variables.

Lines of good fit are really handy as they help determine whether there is a relationship between two variables, which can then be used to make predictions.

To draw a line of good fit, we want to minimise the vertical distances from the points to the line. This will roughly create a line that passes through the centre of the points.
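To make the idea of "minimising the vertical distances" concrete, here is a minimal Python sketch. The data points and the two candidate lines are hypothetical (they are not taken from this lesson's scatter plots), and summing the absolute vertical distances is only one possible way to measure closeness.

```python
# Hypothetical data points (x, y); NOT taken from the lesson's scatter plots.
points = [(1, 2.1), (2, 2.9), (3, 4.2), (4, 4.8), (5, 6.1)]


def total_vertical_distance(slope, intercept, data):
    """Add up the vertical distance |observed y - line y| for every point."""
    return sum(abs(y - (slope * x + intercept)) for x, y in data)


# Two hypothetical candidate lines drawn "by eye".
close_fit = total_vertical_distance(1.0, 1.0, points)  # y = x + 1
poor_fit = total_vertical_distance(0.5, 1.0, points)   # y = 0.5x + 1

# The line with the smaller total sits closer to the points overall.
print(close_fit < poor_fit)  # True
```

A line drawn by eye will rarely give the smallest possible total, but comparing totals like this shows why a line through the centre of the points is preferred.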

Examples

Example 1

The following scatter plot shows the data for two variables, x and y.

A scatter plot with the x-axis and y-axis each ranging from 1 to 10.

a

Draw a line of good fit for the data.

Worked Solution
Create a strategy

Draw a line that follows the trend of the points and has roughly the same number of points above and below it.

Apply the idea
A scatter plot with the x-axis and y-axis each ranging from 1 to 10, with a line of good fit drawn through the points.

Here is an example of a line of good fit that follows the trend of the data and has the same number of points above and below the line.

b

Use the line of good fit to estimate the value of y when x=4.5

A scatter plot with the x-axis and y-axis each ranging from 1 to 10, showing the line of good fit.

A
4.5
B
5
C
5.5
D
6
Worked Solution
Create a strategy

Refer to the graph and find the y-coordinate when the x-coordinate is 4.5 using the line of good fit.

Apply the idea

Using the line of good fit when x=4.5, the y-value is between 4 and 5. So, the correct answer is option A.

c

Use the line of good fit to estimate the value of y when x=9

A
6.5
B
7
C
8.4
D
9.5
Worked Solution
Create a strategy

Refer to the graph and find the y-coordinate when the x-coordinate is 9 using the line of good fit.

Apply the idea

Using the line of good fit when x=9, the y-value is between 8 and 9. The only option in that range is 8.4, so the correct answer is option C.

Idea summary

Lines of good fit are really handy as they help determine whether there is a relationship between two variables, which can then be used to make predictions.

Using a linear model to make predictions

Given a set of data relating two variables x and y, it may be possible to form a linear model. This model can then be used to understand the relationship between the variables and make predictions about other possible ordered pairs that fit this relationship.

A scatter plot with the x and y-axis ranging from 0 to 14. The graph shows points loosely sloping upward from the bottom left to the upper right of the graph.

Say we gathered several measurements on the height of a plant h over an 8 week period, where t is time measured in weeks. We can then plot the data on the xy-plane as shown.

A scatter plot with the x and y-axis ranging from 0 to 14. The graph shows points loosely sloping upward from the bottom left to the upper right of the graph with line that fits the model.

We can fit a model through the observed data to make predictions about the height at certain times after planting.

A scatter plot with the x and y-axis ranging from 0 to 14. The graph shows points loosely sloping upward from the bottom left to the upper right of the graph with point x=2 pointing to the corresponding y-value in the line.

To make a prediction of the height two weeks after planting, we first identify the point on the line when t=2. Then we find the corresponding value of h. As you can see below, the model predicts that two weeks after planting, the height of the plant was roughly 4.6 cm.

A scatter plot with the x and y-axis ranging from 0 to 14. The graph shows points loosely sloping upward from the bottom left to the upper right of the graph with point x=9 pointing to the corresponding y-value in the line.

A prediction which is made within the observed data set is called an interpolation. Roughly speaking, we've gathered data between t=0.8 and t=8.2, so a prediction at t=2 would be classified as an interpolation.

If we predict the height 9 weeks after planting, we find that it is roughly 12.9 cm. A prediction outside the observed data set such as this one is called an extrapolation.

How reliable are these predictions? A model that fits the observed data well will generally make reliable interpolations, since the model roughly passes through the centre of the data points. We can say that the model follows the trend of the observed data.

However, extrapolations are generally unreliable since we make assumptions about how the relationship continues outside the collected data. Sometimes an extrapolation can be made more reliable if we have additional information about the relationship.
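The distinction between the two kinds of prediction can be sketched in a few lines of Python. The function name `classify_prediction` is a hypothetical helper introduced only for illustration; the range t=0.8 to t=8.2 and the values t=2 and t=9 come from the plant example above.

```python
def classify_prediction(value, observed_min, observed_max):
    """Label a prediction by whether it falls inside the observed range."""
    if observed_min <= value <= observed_max:
        return "interpolation"
    return "extrapolation"


# The plant data was gathered roughly between t = 0.8 and t = 8.2 weeks.
print(classify_prediction(2, 0.8, 8.2))  # interpolation
print(classify_prediction(9, 0.8, 8.2))  # extrapolation
```

This check only says which kind of prediction is being made. Whether an extrapolation can be trusted still depends on what else is known about the relationship.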

Examples

Example 2

Several cars underwent a brake test and their age, x, was measured against their stopping distance, y. The scatter plot shows the results and a line of good fit that approximates the positive correlation:

A scatter plot with a line of good fit. The horizontal axis shows the age of cars (years) from 1 to 12, and the vertical axis shows the stopping distance (m) from 10 to 50.

a

According to the line, what is the stopping distance of a car that is 2 years old?

Worked Solution
Create a strategy

Refer to the graph and find the y-coordinate when the x-coordinate is 2 using the line of good fit.

Apply the idea

Using the line of good fit when x=2, the y-value is equal to 34 m.

b

Using the two marked points on the line, determine the slope of the line of good fit.

Worked Solution
Create a strategy

The slope of the line is the value of the ratio of change in y to change in x which is the same between any two points on a line.

Apply the idea

Identify two points on the line and use the formula:

m=\dfrac{y_2-y_1}{x_2-x_1}, where m is the slope of the line.

We can use the coordinates of the marked points, (6,42) and the point (10, 50) to find the slope of the line.

\begin{aligned} m &= \dfrac{50-42}{10-6} && \text{Substitute the coordinates of the points} \\ &= 2 && \text{Evaluate} \end{aligned}

The slope, m=2.

c

Determine the value of the vertical intercept of the line.

Worked Solution
Create a strategy

Substitute the slope found in part b and one of the points into y=mx+b to determine the y-intercept, b.

Apply the idea

Substitute m=2 and the point (10, 50) into the formula y=mx+b.

\begin{aligned} 50 &= 2(10)+b && \text{Substitute the slope and the coordinates of the point} \\ 30 &= b && \text{Evaluate} \end{aligned}

The y-intercept is b=30.

d

Use the line of good fit to estimate the stopping distance of a car that is 6.5 years old.

Worked Solution
Create a strategy

Substitute x=6.5 into the equation of the line of good fit.

Apply the idea

Substitute x=6.5 into y=2x+30 to solve for y.

\begin{aligned} y &= 2(6.5)+30 && \text{Substitute } x=6.5 \\ &= 43 && \text{Evaluate} \end{aligned}

The estimated stopping distance of a car that is 6.5 years old is 43 metres.
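Parts (b) to (d) can be checked with a short Python sketch. The helper name `line_through` is hypothetical and used only for illustration; the two points and the age of 6.5 years come from the example.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)   # change in y over change in x
    intercept = y1 - slope * x1     # rearranged from y1 = slope * x1 + b
    return slope, intercept


# The two marked points on the line of good fit in Example 2.
m, b = line_through((6, 42), (10, 50))
print(m, b)  # 2.0 30.0

# Estimated stopping distance (in metres) for a car that is 6.5 years old.
print(m * 6.5 + b)  # 43.0
```

Either marked point gives the same intercept, because both lie on the line.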

Example 3

The table shows the number of people who went to watch a movie x weeks after it was released.

Weeks (x)            | 1  | 2  | 3  | 4  | 5  | 6  | 7
Number of people (y) | 37 | 37 | 33 | 33 | 29 | 29 | 25

a

Plot the points from the table.

Worked Solution
Create a strategy

Plot each x-value along with its corresponding y-value.

Apply the idea
A scatter plot with the x-axis ranging from 1 to 8 and the y-axis ranging from 25 to 40, showing the points from the table.

The points from the table have the coordinates (1,37), (2,37), (3,33), (4,33), (5,29), (6,29) and (7,25).

b

If a line of good fit were drawn to approximate the relationship, which of the following could be its equation?

A
y=-2x+40
B
y=2x+40
C
y=-2x
D
y=2x
Worked Solution
Create a strategy

Check the trend in the scatterplot.

Apply the idea

We can see that the trend in the scatterplot is decreasing, which means the line has a negative gradient. So options B and D are incorrect. Option C is also incorrect because it implies that the y-intercept is zero, whereas the plotted points lie far above a decreasing line through the origin.

So the correct answer is option A.
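As a numerical cross-check of this reasoning, the sketch below adds up the vertical distances from the table's points to each candidate line. Using the sum of absolute vertical distances is an assumption made for this illustration; the lesson itself chooses the line by inspecting the trend.

```python
# Data from the table in Example 3.
weeks = [1, 2, 3, 4, 5, 6, 7]
people = [37, 37, 33, 33, 29, 29, 25]

# Each option as (slope, intercept).
options = {"A": (-2, 40), "B": (2, 40), "C": (-2, 0), "D": (2, 0)}


def total_distance(slope, intercept):
    """Sum of vertical distances from the data points to y = slope*x + intercept."""
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(weeks, people))


totals = {label: total_distance(m, b) for label, (m, b) in options.items()}
print(totals)                        # {'A': 7, 'B': 113, 'C': 279, 'D': 167}
print(min(totals, key=totals.get))   # A
```

Option A has by far the smallest total, which agrees with the answer found from the trend and the intercept.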

c

Graph the line of good fit whose equation is given by y=-2x+40.

Worked Solution
Create a strategy

To graph the line, identify any two points that satisfy the equation. One point may be the y-intercept.

Apply the idea

By substituting x=0 into the equation, we have: \begin{aligned} y&=-2(0)+40 \\ y&=40 \end{aligned}

For the next point, substituting x=2, we have: \begin{aligned} y&=-2(2)+40 \\ y&=36 \end{aligned}

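A scatter plot with the x-axis ranging from 1 to 8 and the y-axis ranging from 25 to 40, showing the points from the table and the line y=-2x+40.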

Here is the scatterplot with its line of good fit.

Reflect and check

We can see that the line of good fit follows the trend of the data and has roughly the same number of points above the line (three) as below it (four).
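The balance of points around the line can be counted directly. This is a minimal sketch using the table data and the line y=-2x+40 from this example; the variable names are chosen only for illustration.

```python
# Data from the table in Example 3.
weeks = [1, 2, 3, 4, 5, 6, 7]
people = [37, 37, 33, 33, 29, 29, 25]


def line(x):
    """The line of good fit from part (c): y = -2x + 40."""
    return -2 * x + 40


above = sum(1 for x, y in zip(weeks, people) if y > line(x))
below = sum(1 for x, y in zip(weeks, people) if y < line(x))
print(above, below)  # 3 4
```

None of the seven points lies exactly on the line, so an exact split is not possible here. Counts of three and four are as balanced as this data allows.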

d

Use the equation of the line of good fit to find the number of people who went to watch the movie 12 weeks after it was released.

Worked Solution
Create a strategy

Substitute x=12 into the equation.

Apply the idea
\begin{aligned} \text{Number of people} &= -2(12)+40 && \text{Substitute } x=12 \\ &= -24+40 && \text{Perform the multiplication} \\ &= 16 && \text{Evaluate} \end{aligned}

Example 4

A car company looked at the relationship between how much it had spent on advertising and the amount of sales each month over several months. The data has been plotted on the scatter graph and a line of good fit drawn. Two points on the line are \left(3200, 300\right) and \left(5600, 450\right).

A scatter plot with a line of good fit. The horizontal axis shows A from 1000 to 8000, and the vertical axis shows S from 100 to 800.

a

Using the two given points, what is the slope of the line of good fit?

Worked Solution
Create a strategy

The slope of the line is the value of the ratio of change in y to change in x which is the same between any two points on a line.

Apply the idea

Identify two points on the line and use the formula:

m=\dfrac{y_2-y_1}{x_2-x_1}, where m is the slope of the line.

We can use the coordinates of the two given points, (3200,300) and (5600,450), to find the slope of the line.

\begin{aligned} m &= \dfrac{450-300}{5600-3200} && \text{Substitute the coordinates of the points} \\ &= \dfrac{1}{16} && \text{Simplify} \end{aligned}

The slope, m=\dfrac{1}{16}.

b

The line of good fit can be written in the form S = \dfrac{1}{16} A + b, where S is the amount of sales in thousands of dollars, and A is the advertising cost in dollars.

Determine the value of b, the vertical intercept of the line.

Worked Solution
Create a strategy

Substitute the slope found in part (a) and one of the points into y=mx+b to determine the y-intercept, b.

Apply the idea

Substitute m=\dfrac{1}{16} and the point (3200, 300) into the formula y=mx+b.

\begin{aligned} 300 &= \dfrac{1}{16}(3200)+b && \text{Substitute the slope and the coordinates of the point} \\ 100 &= b && \text{Evaluate} \end{aligned}

The y-intercept is b=100.

c

Use the line of good fit to estimate the number of sales next month if \$4800 is to be spent on advertising.

Worked Solution
Create a strategy

Substitute A=4800 into the equation of the line of good fit.

Apply the idea

Substitute A=4800 into S=\dfrac{1}{16}A+100. Because S is measured in thousands of dollars, multiply the result by 1000 to express the sales in dollars.

\begin{aligned} \text{Sales in dollars} &= \left(\dfrac{1}{16}(4800)+100\right)\times 1000 && \text{Substitute } A=4800 \\ &= 400\,000 && \text{Evaluate} \end{aligned}

If \$4800 is spent on advertising, the estimated sales next month are \$400\,000.
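The arithmetic in parts (a) to (c) can be checked with exact fractions. This is a minimal sketch using Python's standard `fractions` module; the variable names are chosen only for illustration, and the numbers come from the example.

```python
from fractions import Fraction

# The two given points on the line of good fit, as (A, S).
a1, s1 = 3200, 300
a2, s2 = 5600, 450

slope = Fraction(s2 - s1, a2 - a1)   # 150/2400, which simplifies to 1/16
intercept = s1 - slope * a1          # 300 - 200 = 100

# S is in thousands of dollars, so multiply by 1000 to convert to dollars.
sales_thousands = slope * 4800 + intercept
sales_dollars = sales_thousands * 1000

print(slope, intercept, sales_dollars)  # 1/16 100 400000
```

Working with `Fraction` keeps the slope as exactly 1/16 instead of the decimal 0.0625, which matches the form used in the worked solution.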

d

Which of the following is true about the prediction in part (c)?

A
Reliable as the prediction made was within the original data set.
B
Reliable as the prediction made was outside of the original data set.
C
Unreliable as the prediction made was within the original data set.
D
Unreliable as the prediction made was outside of the original data set.
Worked Solution
Create a strategy

Check if the prediction made is within or outside the observed data set.

Apply the idea

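An advertising cost of \$4800 lies between the two given points on the line, at A=3200 and A=5600, and within the range of the plotted data, so the prediction is an interpolation. The line of good fit approximates the relationship between sales and advertising costs using the given data. It may not generalise well to predictions far from that data, but it is generally reliable within it. The correct answer is option A.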

Idea summary

A prediction made within the observed data is called an interpolation.

A prediction made outside the observed data is called an extrapolation.

Generally, extrapolation is less reliable than interpolation since the model makes assumptions about the relationship outside the observed data set.

Outcomes

U2.AoS1.4

the equation of a line of good fit

U2.AoS1.6

identify the explanatory variable and use the equation of a line of good fit by eye to the data to model an observed linear association

U2.AoS1.7

calculate the intercept and slope, and interpret the slope and intercept of the model in the context of data

U2.AoS1.8

use a linear model to make predictions, including the issues of interpolation and extrapolation
