7.04 Lines of best fit

Lesson

When we display bivariate data that appears to have a linear relationship, we usually want to find a line that best models the relationship so we can see the trend and make predictions. We call this the line of best fit.

Exploration

We want to draw a line of best fit for the following scatterplot:

Let's try drawing three lines across the data and consider which is most appropriate.

We can tell straight away that $A$ is not the right line. This data appears to have a positive linear relationship, but $A$ has a negative gradient. $B$ has the correct sign for its gradient, and it passes through three points! However, there are many more points above the line than below it, and we should try to make sure the line of best fit passes through the centre of all the points. This means that line $C$ is the best fit for this data.

Least squares regression

Technology makes finding a line of best fit quick and reliable. The technique we will discuss is called least squares regression. In practice, you enter the bivariate data as a set of coordinate pairs, and the line of best fit is calculated automatically. But what calculations is the calculator or computer performing?

First, it calculates the vertical distance from each data point to a candidate line. These distances are called residuals. Then it squares every residual and adds the squares together to find a total. As the line changes, this total will increase or decrease. The line of best fit is the one that gives the smallest total.
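The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `sum_of_squared_residuals` and the four data points are made up for this example and do not come from the lesson's scatterplots.

```python
def sum_of_squared_residuals(points, gradient, intercept):
    """Add up the squared vertical distances from each point
    to the line y = gradient * x + intercept."""
    total = 0.0
    for x, y in points:
        predicted = gradient * x + intercept  # height of the line at this x
        residual = y - predicted              # vertical distance (can be negative)
        total += residual ** 2                # squaring makes every term non-negative
    return total


# Made-up data, for illustration only.
points = [(1, 2), (2, 3), (3, 5), (4, 4)]

# Compare two candidate lines: y = x + 1 and y = 0.8x + 1.5.
total_a = sum_of_squared_residuals(points, 1.0, 1.0)
total_b = sum_of_squared_residuals(points, 0.8, 1.5)

print(f"Line y = x + 1 gives a total of {total_a:.2f}")
print(f"Line y = 0.8x + 1.5 gives a total of {total_b:.2f}")
```

The second candidate line gives the smaller total, so under the least squares criterion it fits this made-up data better than the first.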

Here is a set of bivariate data:

Here is a line that we will apply least squares regression to:

We find the residuals, square them, and add them together to get a total:

This total can be lowered by moving the line. The lowest total possible comes from this line, the line of best fit for the data:
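Rather than moving the line by trial and error, the gradient and intercept that give the lowest total can be computed directly. The sketch below uses the standard closed-form result for least squares with one variable. The lesson does not say which method a particular calculator uses, so treat this as one possible approach. The function name `least_squares_line` and the data are assumptions made up for illustration.

```python
def least_squares_line(points):
    """Return (gradient, intercept) of the line that minimises
    the sum of squared residuals for the given (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # How x and y vary together, and how x varies on its own.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)

    gradient = s_xy / s_xx                  # requires at least two different x-values
    intercept = mean_y - gradient * mean_x  # the line passes through (mean_x, mean_y)
    return gradient, intercept


# Made-up data, for illustration only.
points = [(1, 2), (2, 3), (3, 5), (4, 4)]
gradient, intercept = least_squares_line(points)
print(f"y = {gradient:.2f}x + {intercept:.2f}")  # prints "y = 0.80x + 1.50"
```

Note that the resulting line always passes through the point whose coordinates are the mean of the $x$-values and the mean of the $y$-values, which matches the idea of a line running through the centre of the data.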

Regression methods

Least squares regression is only one way of finding the line of best fit. Statisticians sometimes use other methods, depending on factors such as the shape of the data and the presence of outliers.

Summary

Line of best fit - The line which most closely models a set of bivariate data.

Residual - The vertical distance between a data point and a line modelling it.

Least squares regression - A technique for finding the line of best fit involving minimising the sum of the squares of the residuals.

Practice questions

Question 1

Draw an approximate line of best fit by hand for the scatterplot below.

Question 2

Use technology to find the line of best fit for the data below. Write the equation with the coefficient and constant term rounded to two decimal places.

$x$ | $24$ | $37$ | $19$ | $31$ | $32$ | $22$ | $14$ | $30$ | $23$ | $40$
$y$ | $-7$ | $-8$ | $-3$ | $-6$ | $-9$ | $-8$ | $-2$ | $-8$ | $-8$ | $-12$

Question 3

Engineers measure the positions of various projectiles ($x$) in metres at different times ($t$) in seconds.

They calculate the line of best fit to be $x=5.06t+48.93$.

  1. What was the average velocity of the projectiles?

  2. What was the average starting position of the projectiles?

Outcomes

MS2-12-2

analyses representations of data in order to make inferences, predictions and draw conclusions

MS2-12-7

solves problems requiring statistical processes, including the use of the normal distribution and the correlation of bivariate data
