When we make predictions using a line of best fit, we first consider whether the points lie close to the line. If there is a strong linear relationship, we conclude that a relationship probably exists and that it is appropriate to use the line of best fit for predictions.
For example, if in the experiment the smallest value of the independent variable (the $x$-values) was $10$ and the largest was $85$, then it would be considered interpolation, and therefore reliable, to predict using values from $10$ to $85$ (this can also be written as $[10,85]$). It would be considered unwise to try to predict what the response would be when the independent variable was smaller than $10$ or larger than $85$, as this would be extrapolation.
The bivariate data set on the right has generated a line of best fit, and the range of the $x$-values has been highlighted. Making predictions within this range is interpolation, and making predictions outside this range is extrapolation.
Interpolation means you have used an $x$-value in your prediction that is within the range of $x$-values in the data that you were working with. It is considered safe or reliable.
Extrapolation means you have used an $x$-value in your prediction that is outside the range of $x$-values in the data. It is considered unsafe or unreliable.
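The two definitions above amount to a simple range check. The following is a minimal Python sketch of that check, using the $[10,85]$ range from the earlier example. The function name `classify_prediction` is made up for illustration, and the sketch assumes that the endpoints of the range count as interpolation.

```python
def classify_prediction(x, x_min, x_max):
    """Classify a prediction at x given the smallest and largest observed x-values.

    Assumption: the endpoints x_min and x_max themselves count as interpolation.
    """
    if x_min <= x <= x_max:
        return "interpolation"
    return "extrapolation"


# Observed x-values ran from 10 to 85 in the example above.
for x in (10, 42, 85, 5, 90):
    print(x, classify_prediction(x, 10, 85))
```

Only the smallest and largest observed $x$-values are needed, not the whole data set.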
The scatter plots below are annotated to show examples of interpolation (left) and extrapolation (right) from the line of best fit.
The data points illustrated in the graph below show the sale price of an item of goods measured against the age of the item. The green line represents the line of best fit.
(a) Predict the value of the goods at $23$ months.
Think: Draw a line up from $23$ months to the line of best fit, then across to the vertical axis.
Do: Using the horizontal red line we can predict that at $23$ months the value of the goods will be approximately $\$1250$.
(b) Comment on whether this predicted value can be considered reliable.
Think: Interpolated results are reliable, extrapolated results are not.
Do: As $23$ lies within the range of the independent variable (the values on the horizontal axis go from approximately $6$ to $33$), the prediction is an interpolation and can be considered reliable.
A prediction for the $y$-value when $x=5$ is made from the data set below.
Is the prediction an extrapolation or an interpolation?
$x$ | $4$ | $7$ | $8$ | $11$ | $12$ | $13$ | $17$ | $18$ | $19$ | $20$ |
---|---|---|---|---|---|---|---|---|---|---|
$y$ | $0$ | $2$ | $4$ | $7$ | $6$ | $4$ | $8$ | $8$ | $11$ | $8$ |
Extrapolation
Interpolation
One litre of gas is raised to various temperatures and its pressure is measured.
The data has been graphed below with a line of best fit.
Temperature (K) | $300$ | $302$ | $304$ | $308$ | $310$ | $312$ | $314$ | $316$ | $318$ | $320$ |
---|---|---|---|---|---|---|---|---|---|---|
Pressure (Pa) | $2400$ | $2416$ | $2434$ | $2462$ | $2478$ | $2496$ | $2512$ | $2526$ | $2546$ | $2562$ |
The pressure was not recorded when the temperature was $306$ K.
Is it reasonable to use the line of best fit to predict the pressure?
Yes
No
Predict the pressure when the temperature is $306$ K.
Within which range of temperatures is it reasonable to use the line of best fit to predict pressure?
$\left[300,320\right]$
$\left[300,600\right]$
$\left[0,320\right]$
$\left[280,340\right]$
It's possible to find the equation of the line of best fit. We call the dependent variable $y$ and the independent variable $x$. We can then use the equation of the line of best fit (sometimes called a linear regression line) to predict $y$-values from given $x$-values.
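The lesson does not say how the equation is found. One common method is least-squares regression, and the Python sketch below assumes that method. The function name `fit_line` is made up for illustration, and the data are the $x$ and $y$ values from the table in the earlier interpolation question.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data taken from the x-y table earlier in the lesson.
xs = [4, 7, 8, 11, 12, 13, 17, 18, 19, 20]
ys = [0, 2, 4, 7, 6, 4, 8, 8, 11, 8]
m, b = fit_line(xs, ys)
print(f"y = {m:.2f}x + {b:.2f}")
```

In practice a calculator or statistics package would normally do this step. The sketch only shows what such a tool might compute.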
A hobby store records the age of their customers, $x$, along with the amount of money the customer spends during their visit, $y$, and generates a bivariate data set. The line of best fit for the data set is found to be
$y=2x-30$
According to this model, how much money should they expect a $25$-year old to spend in a single visit?
Think: We need to find a $y$-value using $x=25$, so we will substitute $x=25$ into the equation and solve for $y$.
Do: When $x=25$, $y=2\times25-30=20$, so the store should expect a $25$-year old to spend $\$20$.
The owner and operator of an online store selling custom computer keyboards keeps track of the number of keyboards she makes, $x$, and how much profit she makes in dollars, $y$, each week for several months. The fewest keyboards she made in any of the weeks was $2$, and the most she made was $5$. The bivariate data set has a line of best fit given by $y=400x-650$.
(a) If she makes $3$ keyboards next week, how much profit does this model predict she should expect? How confident should she be in this prediction?
Solution: When $x=3$, $y=400\times3-650=550$, so the model predicts a profit of $\$550$.
Since $3$ is between $2$ and $5$, this prediction is an interpolation and she should be confident in this prediction.
(b) If she makes $20$ keyboards the week after, how much profit does this model predict she should expect? How confident should she be in this prediction?
Solution: When $x=20$, $y=400\times20-650=7350$, so the model predicts a profit of $\$7350$.
Since $20$ is not between $2$ and $5$, this prediction is an extrapolation and she should not be confident about this prediction.
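Both parts of this example follow the same two steps: substitute into the equation, then check whether the $x$-value lies within the observed range. A minimal Python sketch that bundles the two steps is given below. The class name `LineOfBestFit` and its fields are made up for illustration, and the sketch again assumes that the endpoints of the range count as interpolation.

```python
from dataclasses import dataclass


@dataclass
class LineOfBestFit:
    slope: float
    intercept: float
    x_min: float  # smallest x-value in the data
    x_max: float  # largest x-value in the data

    def predict(self, x):
        """Substitute x into y = slope*x + intercept."""
        return self.slope * x + self.intercept

    def is_interpolation(self, x):
        """True when x lies within the observed range (endpoints included)."""
        return self.x_min <= x <= self.x_max


# Keyboard example: y = 400x - 650, with between 2 and 5 keyboards made per week.
keyboards = LineOfBestFit(slope=400, intercept=-650, x_min=2, x_max=5)
for n in (3, 20):
    kind = "interpolation" if keyboards.is_interpolation(n) else "extrapolation"
    print(n, keyboards.predict(n), kind)
```

Storing the observed range alongside the equation means every prediction can be labelled as reliable or unreliable at the moment it is made.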
A bivariate data set has a line of best fit with equation $y=-8.71x+6.79$.
Predict the value of $y$ when $x=3.49$.
The number of fish in a river is measured over a five year period.
The results are shown in the following table and plotted below.
Time in years ($t$) | $0$ | $1$ | $2$ | $3$ | $4$ | $5$ |
---|---|---|---|---|---|---|
Number of fish ($F$) | $1903$ | $1998$ | $1900$ | $1517$ | $1693$ | $1408$ |
Which line best approximates the data?
Use this line to predict the number of years until there are no fish left in the river.
Now predict the number of fish remaining in the river after $7$ years.
Predict how long it will be before there are $900$ fish left in the river.
A car company looked at the relationship between how much it had spent on advertising and the amount of sales each month over several months. The data has been plotted on the scatter graph and a line of best fit drawn. Two points on the line are $\left(3200,300\right)$ and $\left(5600,450\right)$.
Using the two given points, what is the gradient of the line of best fit?
The line of best fit can be written in the form $S=\frac{1}{16}A+b$, where $S$ is the value of sales in thousands of dollars and $A$ is advertising expenditure.
Determine the value of $b$, the vertical intercept of the line.
Use the line of best fit to estimate the value of sales next month (in dollars) if $\$4800$ is to be spent on advertising.
Several cars underwent a brake test and their age was measured against their stopping distance. The scatter plot shows the results and a line of best fit that approximates the positive correlation.
According to the line, what is the stopping distance of a car that is $6$ years old?
Using the two points that lie on the line, determine the gradient of the line of best fit.
Assuming the line of best fit is in the form $y=mx+b$, determine the value of $b$, the vertical intercept of the line.
Use the line of best fit to estimate the stopping distance of a car that is $7.5$ years old.
Is the estimation in the previous part an example of interpolation or extrapolation?
Interpolation
Extrapolation
Is the predicted value in part (d) reliable or unreliable?
Reliable
Unreliable