Regression Analysis

NZ Level 7 (NZC) Level 2 (NCEA)

Errors in Extrapolations (Investigation)


To extrapolate means to make predictions about results that would occur beyond those that can be justified by the experimental evidence.

Consider a statistical experiment in which the variable being studied is assumed to depend on an independent variable. Suppose a number of trials of the experiment have been carried out. We would feel justified in making predictions about further trials provided the independent variable takes values within the range already tested. Such predictions are called *interpolations*.

However, it is unsafe to make a prediction about what would occur in response to a value of the independent variable that is outside the experimentally verified range. Such a prediction would be called an *extrapolation*.
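The distinction between interpolation and extrapolation can be sketched in a few lines of Python. The function `classify_prediction` is a hypothetical helper, not part of any library, and it assumes the "experimentally verified range" is simply the interval from the smallest to the largest value of the independent variable already tested. The tested values `1` to `8` are illustrative only.

```python
def classify_prediction(x, tested_values):
    """Label a prediction at x as an interpolation or an extrapolation.

    Assumption: the verified range is [min(tested_values), max(tested_values)].
    """
    low, high = min(tested_values), max(tested_values)
    if low <= x <= high:
        return "interpolation"
    return "extrapolation"


# Illustrative values of the independent variable already tested.
tested = [1, 2, 3, 4, 5, 6, 7, 8]
print(classify_prediction(5.5, tested))  # inside the tested range
print(classify_prediction(14, tested))   # outside the tested range
```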

A gardener plants dwarf beans and observes that, after germination, they seem to grow at a steady rate. To investigate, the gardener plants more seeds and, once the first shoots appear, measures the heights of the plants daily over a period of eight days. The average height of the plants on each day is recorded to the nearest millimetre. The results are given in the table below.

| Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Average height (mm) | 4 | 9 | 15 | 21 | 27 | 32 | 38 | 42 |

When the average heights are plotted against the day number, the points lie close to a straight line, so a linear trend-line can be fitted to them.

According to the linear regression, the height $h$ (in mm) as a function of the day number $d$ is approximately $h = 5.6d - 1.6$.
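The regression coefficients can be reproduced with an ordinary least-squares fit. This is a minimal sketch, assuming the eight measurements correspond to day numbers $1$ to $8$; `fit_line` is a hypothetical helper written out by hand so each step is visible, using $\text{slope} = S_{xy}/S_{xx}$ and $\text{intercept} = \bar{y} - \text{slope}\cdot\bar{x}$.

```python
# Assumed day numbers 1..8 and the recorded average heights (mm).
days = [1, 2, 3, 4, 5, 6, 7, 8]
heights = [4, 9, 15, 21, 27, 32, 38, 42]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(days, heights)
print(round(slope, 1), round(intercept, 1))  # 5.6 -1.6
```

Unrounded, the slope is $234/42 \approx 5.571$ and the intercept is about $-1.571$, which round to the coefficients quoted above.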

The formula predicts that after $14$ days the dwarf bean plants should reach a height of about $77$ mm ($5.6 \times 14 - 1.6 = 76.8$), and after $28$ days about $155$ mm ($5.6 \times 28 - 1.6 = 155.2$). Although these results are extrapolations, they are not hard to believe. But if we move further from the experimental data and use the formula to predict the height of the plants after, say, $6$ months (about $182$ days), it gives a height of over one metre ($5.6 \times 182 - 1.6 \approx 1018$ mm), which is unlikely given the type of plant and the actual length of the growing season.
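The predictions can be checked by evaluating the fitted formula directly and flagging whether each one lies inside the observed days. This is a sketch only: the name `predicted_height`, the observed range of days $1$ to $8$, and the use of $182$ days for "6 months" are assumptions made for illustration.

```python
def predicted_height(d):
    """Height in mm predicted by the fitted model h = 5.6d - 1.6."""
    return 5.6 * d - 1.6


FIRST_DAY, LAST_DAY = 1, 8  # assumed range of day numbers actually observed

for d in (14, 28, 182):  # 182 days is roughly 6 months
    kind = "interpolation" if FIRST_DAY <= d <= LAST_DAY else "extrapolation"
    print(f"day {d}: about {predicted_height(d):.0f} mm ({kind})")
```

All three inputs are flagged as extrapolations; the formula itself gives no warning that the third one is far outside the data.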

In the wild, a female rabbit lives for about a year on average and in that time can have around $20$ offspring. Suppose roughly $5$ of these are females who survive to maturity and begin to reproduce. Imagine that a breeding pair is introduced to a small island previously uninhabited by rabbits, and that ecologists observe the breeding pattern described above over a period of four years.

Because each female is replaced, a year later, by about $5$ breeding daughters, the number of female rabbits is multiplied by $5$ each year. The year-to-year numbers are summarised by the sequence $\{1, 5, 25, 125, \ldots\}$, which in turn may be summarised by the formula $N = 5^y$, where $y$ is the number of years after the introduction and $N$ is the number of female rabbits $y$ years later.
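A quick check that $N = 5^y$ reproduces the sequence, assuming the first term corresponds to $y = 0$ (the single female introduced). `female_rabbits` is a hypothetical name used only for this sketch.

```python
def female_rabbits(y):
    """Number of female rabbits y years after introduction, using N = 5**y."""
    return 5 ** y


observed = [1, 5, 25, 125]  # the sequence above, assumed to start at y = 0
model = [female_rabbits(y) for y in range(len(observed))]
print(model)              # [1, 5, 25, 125]
print(model == observed)  # True
```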

If you used this formula to predict the number of female rabbits living on the island after $30$ months ($y = 2.5$, inside the four years observed), you could be reasonably confident of your answer, since it is an interpolation. However, if you wished to predict the number of female rabbits living on the island after $10$ years, the formula would give an impossible answer of $5^{10} = 9\,765\,625$, nearly ten million female rabbits.
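Evaluating the model at both times makes the contrast concrete. This is a minimal sketch, assuming a four-year observation window starting at $y = 0$ and converting months to years by dividing by $12$; the names `female_rabbits` and `OBSERVED_YEARS` are hypothetical.

```python
def female_rabbits(y):
    """Female rabbits predicted by N = 5**y, y years after introduction."""
    return 5 ** y


OBSERVED_YEARS = 4  # assumed length of the ecologists' observation period

for months in (30, 120):
    y = months / 12
    kind = "interpolation" if 0 <= y <= OBSERVED_YEARS else "extrapolation"
    print(f"{months} months: about {female_rabbits(y):,.0f} females ({kind})")
```

With these inputs the sketch reports about 56 females after 30 months and 9,765,625 after 10 years.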

Clearly, other factors must intervene to limit the rabbit population in the years beyond the four-year observation period. The formula fits the evidence for the period in which it was obtained, but a more sophisticated formula is needed to model what is observed in subsequent years.

S7-2 Make inferences from surveys and experiments:

- A: making informal predictions, interpolations, and extrapolations
- B: using sample statistics to make point estimates of population parameters
- C: recognising the effect of sample size on the variability of an estimate

Use statistical methods to make an inference