iGCSE (2021 Edition)

18.13 Lines of best fit

Lesson

When we display bivariate data that appears to have a linear relationship, we usually want to find a line that best models the relationship so we can see the trend and make predictions. We call this the line of best fit.

 

Exploration

We want to draw a line of best fit for the following scatter diagram:

Let's try drawing three lines across the data and consider which is most appropriate.

  • We can tell straight away that $A$ is not the right line. This data appears to have a positive linear relationship, but $A$ has a negative gradient.
  • $B$ has the correct sign for its gradient, and it passes through three points! However, there are many more points above the line than below it, and we should try to make sure the line of best fit passes through the centre of all the points.
  • That means that line $C$ is the best fit for this data: approximately equal numbers of data points lie above and below it, and the distances of the points from the line are as small as possible.
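
The comparison above can be sketched in code. The scatter diagram is not reproduced here, so the points and the three lines below are hypothetical values chosen only to mimic the situation described. Measuring the vertical distance from each point to the line is also an assumption about what "distance" means.

```python
# Hypothetical data points (x, y); NOT the points from the lesson's diagram.
points = [(1, 2), (2, 2), (3, 4), (4, 4), (5, 6), (6, 6), (7, 8), (8, 8)]

# Hypothetical candidate lines, each stored as (gradient m, vertical intercept c).
lines = {
    "A": (-1, 9),     # negative gradient
    "B": (0.5, 1),    # positive gradient, but sits too low
    "C": (1, 0.5),    # positive gradient, through the middle of the points
}

def summarise(points, m, c):
    """Return (points above, points below, total vertical distance) for y = mx + c."""
    above = below = 0
    total_distance = 0.0
    for x, y in points:
        residual = y - (m * x + c)  # positive when the point is above the line
        if residual > 0:
            above += 1
        elif residual < 0:
            below += 1
        total_distance += abs(residual)
    return above, below, total_distance

for name, (m, c) in lines.items():
    print(name, summarise(points, m, c))
```

With this made-up data, line A happens to split the points evenly but has by far the largest total distance, line B has almost every point above it, and line C has both an even split and the smallest total distance. Counting points above and below is therefore only part of the check.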

Practice questions

Question 1

The average monthly temperature and the average wind speed in a particular location were plotted over several months. The graph shows the points for each month’s data and their line of best fit.

 

A scatter plot is displayed with temperature (°C) on the horizontal axis, ranging from 0 to 10, and wind speed (knots) on the vertical axis, ranging from 0 to 8. A series of black dots represent individual data points plotted on the graph. The data points show a decreasing trend from the top left to the bottom right. A straight line descends diagonally across the plot, suggesting a linear relationship as it passes through the scattering of points. The line passes through the points $\left(0,8\right)$ and $\left(10,2\right)$, but its equation is not explicitly given in the image.

 

  1. Use the line of best fit to approximate the wind speed on a day when the temperature is $5$°C.

Equation of a line of best fit

To calculate the equation of a given line of best fit, we need to be able to calculate:

  • gradient: $m=\frac{y_2-y_1}{x_2-x_1}$

  • vertical intercept: this is the $c$ term in $y=mx+c$

Once you can identify these features, you can use them to make conclusions and predictions about the data.
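
As a minimal sketch of these two calculations, the snippet below finds the gradient and vertical intercept of a line from two points on it. The function name `line_through` is hypothetical, and the two points are the ones the graph description in Question 1 says the line passes through. Exact fractions are used instead of decimals so the result matches what you would get by hand.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Return (m, c) for the line y = mx + c through two points with integer coordinates.

    Raises ZeroDivisionError if the points share the same x value (a vertical line).
    """
    (x1, y1), (x2, y2) = p1, p2
    m = Fraction(y2 - y1, x2 - x1)  # gradient: rise over run
    c = y1 - m * x1                 # substitute one point into y = mx + c and solve for c
    return m, c

m, c = line_through((0, 8), (10, 2))
print("gradient:", m)            # gradient: -3/5
print("vertical intercept:", c)  # vertical intercept: 8
```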

Remember!

There are different ways to write the equation of a straight line. These include:

  • gradient-intercept form: $y=mx+c$, where $m$ is the gradient and $c$ is the vertical intercept
  • point-gradient form: $y-y_1=m\left(x-x_1\right)$
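
The two forms describe the same line. As a short worked step, expanding the point-gradient form and collecting the constant terms gives the gradient-intercept form, which also shows how the vertical intercept $c$ can be found from the gradient and any one point $\left(x_1,y_1\right)$ on the line:

```latex
\begin{align*}
y - y_1 &= m\left(x - x_1\right) \\
y &= mx - mx_1 + y_1 \\
y &= mx + c, \quad \text{where } c = y_1 - mx_1
\end{align*}
```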

Lines of best fit are used in real statistical analyses, and because they are straight lines, we graph them in the same way as the other linear functions that we looked at earlier.

Practice questions

Question 3

A car company looked at the relationship between how much it had spent on advertising and the amount of sales each month over several months. The data has been plotted on the scatter graph and a line of best fit drawn. Two points on the line are $\left(3200,300\right)$ and $\left(5600,450\right)$.

  1. Using the two given points, what is the gradient of the line of best fit?

  2. The line of best fit can be written in the form $S=\frac{1}{16}A+b$, where $S$ is the value of Sales in thousands of dollars and $A$ is advertising expenditure.

    Determine the value of $b$, the vertical intercept of the line.

  3. Use the line of best fit to estimate the value of sales next month (in dollars) if $\$4800$ is to be spent on advertising.

Question 4

The table shows the number of people who went to watch a movie $x$ weeks after it was released.

Weeks ($x$)            | $1$  | $2$  | $3$  | $4$  | $5$ | $6$ | $7$
Number of people ($y$) | $17$ | $17$ | $13$ | $13$ | $9$ | $9$ | $5$

  1. Plot the points from the table.

  2. If a line of best fit were drawn to approximate the relationship, which of the following could be its equation?

    A. $y=-2x+20$
    B. $y=2x+20$
    C. $y=-2x$
    D. $y=2x$

  3. Graph the line of best fit whose equation is given by $y=-2x+20$.

  4. Use the equation of the line of best fit to find the number of people who went to watch the movie $10$ weeks after it was released.
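
One way to check a candidate equation against the table, rather than judging it by eye, is to add up the squared vertical gaps between each data point and the line. This is a sketch only: the sum of squared residuals is a common measure of fit but it is an assumption here, not a method the lesson itself uses, and the names below are hypothetical. The data and the four candidate equations come from the table and options above.

```python
weeks = [1, 2, 3, 4, 5, 6, 7]
people = [17, 17, 13, 13, 9, 9, 5]

# Each option stored as (gradient m, vertical intercept c) for y = mx + c.
candidates = {"A": (-2, 20), "B": (2, 20), "C": (-2, 0), "D": (2, 0)}

def sum_squared_residuals(xs, ys, m, c):
    """Add up (actual y - predicted y)^2 over all data points."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

scores = {
    label: sum_squared_residuals(weeks, people, m, c)
    for label, (m, c) in candidates.items()
}
best = min(scores, key=scores.get)  # label with the smallest total

print(scores)  # {'A': 7, 'B': 2279, 'C': 2767, 'D': 559}
print(best)    # A
```

The smallest total belongs to $y=-2x+20$, which agrees with the equation used in part 3.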

 

Making predictions

Once we have our line of best fit, we're ready to start making predictions. Since our line is the best possible fit for the data we have, we can use it as a model: we substitute a chosen value of the independent variable into the equation to predict the likely value of the dependent variable.

Interpolation means you have used an $x$ value in your prediction that is within the available range of data that you were working with. For example, if the $x$ values range between $35$ and $98$, then any $x$ value you choose within this range would be considered an interpolation.

Extrapolation means you have used an $x$ value in your prediction that is outside the available range of data. For example, if the $x$ values range between $35$ and $98$, then anything below $35$ or above $98$ would be considered an extrapolation.

It is important to recognise that there are limitations to interpolating and extrapolating, depending on the context. Predictions made a fair way outside the range of the data are unreliable, because we have no data showing that the trend continues there. It is therefore important to consider the context of the variables and whether the prediction is reasonable or realistic.
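
The distinction between the two kinds of prediction can be written as a simple range check. This is a minimal sketch using the example range of $35$ to $98$ from above. The function name `classify_prediction` is hypothetical, and treating the end values $35$ and $98$ themselves as interpolation is an assumption based on the wording "below $35$ or above $98$".

```python
def classify_prediction(x, x_min, x_max):
    """Label a prediction by whether x lies inside the range of the observed data."""
    if x_min <= x <= x_max:
        return "interpolation"
    return "extrapolation"

# The x values in the data range between 35 and 98.
print(classify_prediction(50, 35, 98))   # interpolation
print(classify_prediction(120, 35, 98))  # extrapolation
```

A check like this only says whether a prediction lies inside the data range. Whether the predicted value is realistic still depends on the context of the variables.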

 

 

Outcomes

0607C11.8B

Straight line of best fit (by eye) through the mean on a scatter diagram.

0607E11.8B

Straight line of best fit (by eye) through the mean on a scatter diagram.
