Australia VIC
VCE 11 General 2023

7.04 Line of good fit

Lesson

Line of good fit

A line of good fit (or "trend" line) is a straight line that best represents the data on a scatter plot. This line may pass through some of the points, none of the points, or all of the points. However, it always follows the general trend of the points, which indicates whether there is a positive, negative or no linear relationship between the two variables.

Lines of good fit are really handy as they help determine whether there is a relationship between two variables, which can then be used to make predictions.

To draw a line of good fit, we want to minimise the vertical distances from the points to the line. This will roughly create a line that passes through the centre of the points.
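One way to make "minimise the vertical distances" concrete is to add up the vertical gaps between each point and a candidate line, then compare candidates. The Python sketch below does this. The data points and both candidate lines are invented for illustration, and `total_vertical_distance` is a hypothetical helper name rather than anything defined in this lesson.

```python
def total_vertical_distance(points, slope, intercept):
    """Sum of the vertical gaps between each point and the line y = slope*x + intercept."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points)


# Hypothetical data, invented for this illustration only.
points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]

# Two candidate lines drawn "by eye".
line_a = total_vertical_distance(points, slope=1, intercept=1)    # y = x + 1
line_b = total_vertical_distance(points, slope=0.5, intercept=1)  # y = 0.5x + 1

print(line_a, line_b)
```

The candidate with the smaller total sits closer to the points overall, so of these two it is the better line of good fit. Fitting by eye is an informal version of the same comparison.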

Practice questions

Question 1

The following scatter plot shows the data for two variables, $x$ and $y$.

  1. Determine which of the following graphs contains the line of good fit.

    A

    B

    C

    D
  2. Use the line of good fit to estimate the value of $y$ when $x=4.5$.

    A. $5.5$

    B. $6$

    C. $4.5$

    D. $5$

  3. Use the line of good fit to estimate the value of $y$ when $x=9$.

    A. $8.4$

    B. $7$

    C. $9.5$

    D. $6.5$

Using a linear model to make predictions

Given a set of data relating two variables $x$ and $y$, it may be possible to form a linear model. This model can then be used to make predictions about other possible ordered pairs that fit this relationship.

Exploration

Say we gathered several measurements of the height $h$ of a plant over an $8$-week period, where $t$ is time measured in weeks. We can then plot the data on the $xy$-plane as shown below.

Height of a plant measured at several instances.

 

We can fit a model through the observed data to make predictions about the height at certain times after planting.

Linear graph modelling height of a plant over time.

 

To predict the height two weeks after planting, we first identify the point on the line where $t=2$. Then we find the corresponding value of $h$. As you can see below, the model predicts that two weeks after planting, the height of the plant is roughly $4.6$ cm.

A predicted height of $4.6$ cm when $t=2$.

 

A prediction made within the range of the observed data is called an interpolation. Roughly speaking, we've gathered data between $t=0.8$ and $t=8.2$, so a prediction at $t=2$ is an interpolation.

If we predict the height $9$ weeks after planting, we find that it is roughly $12.9$ cm. A prediction outside the range of the observed data, such as this one, is called an extrapolation.

A predicted height of $12.9$ cm when $t=9$.

 

How reliable are these predictions? A model that fits the observed data well will generally make reliable interpolations, since the model roughly passes through the centre of the data points. We can say that the model follows the trend of the observed data.

However, extrapolations are generally unreliable, since we have to assume that the relationship continues in the same way outside the collected data. Extrapolation can sometimes be made more reliable if we have additional information about the relationship.

 

Remember!

A prediction made within the observed data is called an interpolation.

A prediction made outside the observed data is called an extrapolation.

Generally, extrapolation is less reliable than interpolation since the model makes assumptions about the relationship outside the observed data set.
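The distinction can be expressed as a simple range check. The sketch below uses the observed range from the plant exploration ($t=0.8$ to $t=8.2$); the function name `classify_prediction` is a hypothetical choice made for this illustration.

```python
def classify_prediction(t, t_min, t_max):
    """Label a prediction at t as an interpolation or an extrapolation."""
    if t_min <= t <= t_max:
        return "interpolation"
    return "extrapolation"


# Observed range of the plant data, as stated in the exploration.
T_MIN, T_MAX = 0.8, 8.2

print(classify_prediction(2, T_MIN, T_MAX))  # interpolation
print(classify_prediction(9, T_MIN, T_MAX))  # extrapolation
```

The check only says which kind of prediction is being made. How far to trust an extrapolation still depends on what else is known about the relationship.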

Practice questions

Question 2

Several cars underwent a brake test and their age was measured against their stopping distance. The scatter plot shows the results and a line of good fit that approximates the positive correlation.


  1. According to the line, what is the stopping distance of a car that is $2$ years old?

  2. Using the two marked points on the line, determine the slope of the line of good fit.

  3. Assuming the line of good fit is in the form $y=mx+b$, determine the value of $b$, the vertical intercept of the line.

  4. Use the line of good fit to estimate the stopping distance of a car that is $6.5$ years old.

Question 3

The table shows the number of people who went to watch a movie $x$x weeks after it was released.

Weeks ($x$) | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ | $7$
Number of people ($y$) | $29$ | $25$ | $25$ | $21$ | $21$ | $17$ | $17$

  1. Plot the points from the table.


  2. If a line of good fit were drawn to approximate the relationship, which of the following could be its equation?

    A. $y=2x$

    B. $y=2x+30$

    C. $y=-2x$

    D. $y=-2x+30$

  3. Graph the line of good fit whose equation is given by $y=-2x+30$.


  4. Use the equation of the line of good fit to find the number of people who went to watch the movie $9$ weeks after it was released.

Question 4

A car company looked at the relationship between how much it had spent on advertising and the amount of sales each month over several months. The data has been plotted on the scatter graph and a line of good fit drawn. Two points on the line are $\left(2000,300\right)$ and $\left(3500,450\right)$.


  1. Using the two given points, what is the slope of the line of good fit?

  2. The line of good fit can be written in the form $S=\frac{1}{10}A+b$, where $S$ is the value of sales in thousands of dollars and $A$ is advertising expenditure.

    Determine the value of $b$, the vertical intercept of the line.

  3. Use the line of good fit to estimate the value of sales next month (in dollars) if $\$3900$ is to be spent on advertising.

  4. Which of the following is true?

    The prediction in part 3 is:

    A. Reliable as the prediction made was within the original data set.

    B. Reliable as the prediction made was outside of the original data set.

    C. Unreliable as the prediction made was within the original data set.

    D. Unreliable as the prediction made was outside of the original data set.
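
If you want to check your working for parts 1 to 3 of Question 4, the calculation can be sketched in Python as below. It uses only the two points given in the question; the variable names and the helper `predict_sales` are hypothetical choices for this illustration, and exact fractions are used so the slope appears as $\frac{1}{10}$ rather than a rounded decimal.

```python
from fractions import Fraction

# Two points on the line of good fit, as given in Question 4:
# (advertising expenditure in dollars, sales in thousands of dollars).
a1, s1 = 2000, 300
a2, s2 = 3500, 450

# Slope = rise over run between the two points.
slope = Fraction(s2 - s1, a2 - a1)

# Substitute one point into S = slope * A + b and solve for b.
intercept = s1 - slope * a1


def predict_sales(advertising):
    """Predicted sales, in thousands of dollars, for a given advertising spend."""
    return slope * advertising + intercept


print(slope, intercept, predict_sales(3900))
```

The value returned by `predict_sales` is in thousands of dollars, so multiply it by $1000$ to answer in dollars. Whether that prediction is reliable depends on whether $\$3900$ lies inside the range of advertising values shown on the scatter graph.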

Outcomes

U2.AoS1.4

the equation of a line of good fit

U2.AoS1.6

identify the explanatory variable and use the equation of a line of good fit by eye to the data to model an observed linear association

U2.AoS1.7

calculate the intercept and slope, and interpret the slope and intercept of the model in the context of data

U2.AoS1.8

use a linear model to make predictions, including the issues of interpolation and extrapolation
