When we display bivariate data that appears to have a linear relationship, we usually want to find a line that best models the relationship so we can see the trend and make predictions. We call this the line of best fit.
We want to draw a line of best fit for the following scatter diagram:
Let's try drawing three lines across the data and consider which is most appropriate.
The average monthly temperature and the average wind speed in a particular location was plotted over several months. The graph shows the points for each month’s data and their line of best fit.
Use the line of best fit to approximate the wind speed on a day when the temperature is $5$5°C.
To calculate the equation of a given line of best fit, we need to be able to calculate:
gradient: $m=\frac{y_2-y_1}{x_2-x_1}$m=y2−y1x2−x1
vertical intercept: this is the $c$c term in $y=mx+c$y=mx+c
Once you can identify these features, you can use them to make conclusions and predictions about the data.
There are different ways to calculate the equation of a straight line. These include:
Since lines of best fit are used in real statistical analyses, graphing them is similar to other linear functions that we looked at earlier.
A car company looked at the relationship between how much it had spent on advertising and the amount of sales each month over several months. The data has been plotted on the scatter graph and a line of best fit drawn. Two points on the line are $\left(3200,300\right)$(3200,300) and $\left(5600,450\right)$(5600,450).
Using the two given points, what is the gradient of the line of best fit?
The line of best fit can be written in the form $S=\frac{1}{16}A+b$S=116A+b, where $S$S is the value of Sales in thousands of dollars and $A$A is advertising expenditure.
Determine the value of $b$b, the vertical intercept of the line.
Use the line of best fit to estimate the number of sales next month (in dollars) if $\$4800$$4800 is to be spent on advertising.
The table shows the number of people who went to watch a movie $x$x weeks after it was released.
Weeks ($x$x) | $1$1 | $2$2 | $3$3 | $4$4 | $5$5 | $6$6 | $7$7 |
Number of people ($y$y) | $17$17 | $17$17 | $13$13 | $13$13 | $9$9 | $9$9 | $5$5 |
Plot the points from the table.
If a line of best fit were drawn to approximate the relationship, which of the following could be its equation?
$y=-2x+20$y=−2x+20
$y=2x+20$y=2x+20
$y=-2x$y=−2x
$y=2x$y=2x
Graph the line of best fit whose equation is given by $y=-2x+20$y=−2x+20.
Use the equation of the line of best fit to find the number of people who went to watch the movie $10$10 weeks after it was released.
Once we have our line of best fit, we're ready to start making predictions. Since our line is the best possible fit for the data we have, we can use it as a model to predict the likely value for the dependent variable based on a value for the independent variable that we'd like to predict for.
Interpolation means you have used an $x$x value in your prediction that is within the available range of data that you were working with. Suppose the $x$x values range between $35$35 and $98$98, so any $x$x value you choose within this range would be considered an interpolation.
Extrapolation means you have used an $x$x value in your prediction that is outside the available range of data. Suppose the $x$xvalues range between $35$35 and $98$98, then anything below $35$35 or above $98$98 would be considered an extrapolation.
It is important to recognise that there are limitations to interpolating and extrapolating depending on the context. It is dangerous to make predictions that are a fair way outside the range of data. Therefore, it is important that you consider the context of the variables and whether it is reasonable or realistic.