
INVESTIGATION: Errors in extrapolation

Lesson

To extrapolate means to make predictions about results that would occur beyond those that can be justified by the experimental evidence.

Consider a statistical experiment in which the values of the response variable depend on the values of the explanatory variable. A least squares regression line has been fitted to the data, and the correlation between the two variables was found to be strong. We would feel justified in making predictions about what will occur in further trials, provided the explanatory variable takes values within the range already tested. Such predictions are called interpolations.

However, it is unsafe to make a prediction about what would occur in response to a value of the explanatory variable that is outside the experimentally verified range. Such a prediction would be called an extrapolation.

EXAMPLE 1

A gardener plants dwarf beans and observes that after germination they seem to grow at a steady rate. Conducting an experiment, the gardener again plants some seeds and after the first shoots appear, measures the heights of the plants daily over a period of eight days. The average height of the plants after each day is recorded to the nearest millimetre. The results are given in the table below.

 

Day                    1   2   3   4   5   6   7   8
Average height (mm)    4   9  15  21  27  32  38  42

 

 

 

Displayed graphically, with the least squares regression line added, we have:

[Graph: average height (mm) against day number, with the least squares line]

According to the linear regression, the height h (in mm) as a function of the day number d is approximately h = -1.6 + 5.6d.
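The coefficients can be checked by hand or with a short script. The following is a minimal sketch in Python using only the table values; the function name least_squares_line is an illustrative choice, not part of the lesson.

```python
# Average heights (mm) for days 1 to 8, taken from the table in Example 1.
days = [1, 2, 3, 4, 5, 6, 7, 8]
heights = [4, 9, 15, 21, 27, 32, 38, 42]


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = least_squares_line(days, heights)
print(f"h = {intercept:.1f} + {slope:.1f}d")  # prints: h = -1.6 + 5.6d
```

Rounded to one decimal place, the result agrees with the formula quoted above.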

The formula predicts that after 14 days the dwarf bean plants should reach a height of 77 mm, and after 28 days they would reach 155 mm. These results, although they are extrapolations, are not hard to believe. But, if we go further from the experimental data and use the formula to predict the height of the plants after, say, 6 months, we would expect the plants to be over one metre tall, which is unlikely to be the case given the type of plant and the actual length of the growing season.
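These predictions can be reproduced with a few lines of Python. This is only a sketch: it uses the rounded formula h = -1.6 + 5.6d, and it takes 6 months to be roughly 182 days, which is an assumption made for illustration.

```python
def predicted_height(day):
    """Height (mm) predicted by the rounded regression line h = -1.6 + 5.6d."""
    return -1.6 + 5.6 * day


# The experiment covered days 1 to 8, so anything outside that range
# is an extrapolation. 182 days is an assumed length for 6 months.
for day in (14, 28, 182):
    inside_tested_range = 1 <= day <= 8
    label = "interpolation" if inside_tested_range else "extrapolation"
    print(f"day {day}: {predicted_height(day):.0f} mm ({label})")
```

All three values are extrapolations. The further the day number lies from the tested range of 1 to 8, the less the prediction can be trusted.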

1) What are the reasons for not being confident in the formula to predict the height of the plants after 6 months?

2) If the maximum height of a dwarf bean plant is around 450 mm, what would the graph look like over a period of 6 months?

EXAMPLE 2 

In the wild, a female rabbit can have around 20 offspring a year. Suppose roughly 5 of these are females who themselves survive to maturity and begin to reproduce. Imagine that a breeding pair is introduced to a small island previously uninhabited by rabbits. Ecologists then observe the above breeding pattern over a period of four years.

The number of female rabbits from year to year is summarised by the sequence {1, 5, 25, 125, ...}, which in turn may be summarised by the formula N = 5^y, where y is the number of years after the introduction and N is the number of female rabbits y years later.

If you were to use this formula to predict the number of female rabbits that were living on the island after 3 years, you could be confident of your answer. However, if you wished to make a prediction about the number of female rabbits that would be living on the island after 10 years, the formula would give an impossible answer of close to ten million female rabbits (5^10 = 9,765,625).

Clearly, other factors must intervene to limit the rabbit population in the years beyond the four years in which the observation was carried out. The formula was correct for the period for which evidence was obtained but a more sophisticated formula is needed to model what is observed in subsequent years.
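The values given by the formula N = 5^y can be evaluated directly. The short Python sketch below does this for the observed years and for year 10; the function name female_rabbits is an illustrative choice.

```python
def female_rabbits(years):
    """Number of female rabbits predicted by the formula N = 5^y."""
    return 5 ** years


# Years 0 to 3 match the observed sequence; year 10 is an extrapolation.
for y in (0, 1, 2, 3, 10):
    print(f"y = {y}: N = {female_rabbits(y):,}")
```

The first four values reproduce the observed sequence, while the value for y = 10 lies far outside anything the evidence supports.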

 

Question 1

Rather than using an exponential formula, plot the number of female rabbits each year on a graph and fit a least squares regression line. Use this line to predict the number of female rabbits living on the island after 10 years. 

Question 2

Is this a better prediction of the number of female rabbits living on the island after 10 years? Why/Why not?

Outcomes

U3.AoS1.26

calculate the coefficient of determination, 𝑟^2, and interpret in the context of the association being modelled and use the model to make predictions, being aware of the problem of extrapolation

U3.AoS1.25

use the least squares line of best fit to model and analyse the linear association between two numerical variables and interpret the model in the context of the association being modelled
