8. Two Variable Data Analysis

Ontario Grade 9 (MTH1W) 2021 Edition

Investigation: Errors in extrapolation


To extrapolate means to make predictions about results beyond the range that the experimental evidence can justify.

Consider a statistical experiment in which the values of the dependent variable depend on the values of the independent variable. Suppose a linear regression between these variables has been calculated and shows a strong correlation. We would then feel justified in predicting the outcomes of further trials, provided the independent variable takes values within the range of values already tested. Such predictions are called *interpolations*.

However, it is unsafe to make a prediction about what would occur in response to a value of the independent variable that is outside the experimentally verified range. Such a prediction would be called an *extrapolation*.
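A small sketch of the distinction, using hypothetical tested values of the independent variable (2, 4, 6, 8 and 10, chosen only for illustration): a prediction counts as an interpolation when the new value lies between the smallest and largest values already tested, and as an extrapolation otherwise.

```python
def classify_prediction(x, tested_values):
    """Label a prediction at x as an interpolation or an extrapolation.

    A prediction is an interpolation when x lies within the range of values
    already tested, and an extrapolation otherwise.
    """
    low, high = min(tested_values), max(tested_values)
    return "interpolation" if low <= x <= high else "extrapolation"


# Hypothetical values of the independent variable that were tested
tested = [2, 4, 6, 8, 10]

print(classify_prediction(5, tested))   # interpolation
print(classify_prediction(15, tested))  # extrapolation
```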

A gardener plants dwarf beans and observes that, after germination, they seem to grow at a steady rate. To investigate, the gardener plants more seeds and, once the first shoots appear, measures the heights of the plants daily for eight days. The average height of the plants each day is recorded to the nearest millimetre. The results are given in the table below.

| Day ($d$) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Average height, $h$ (mm) | 4 | 9 | 15 | 21 | 27 | 32 | 38 | 42 |

Displayed graphically with a linear least squares line, the data lie almost exactly on a straight line, showing a strong positive linear correlation.

According to the linear regression, the height $h$ as a function of the day number $d$ is approximately $h=-1.6+5.6d$.
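The regression coefficients can be reproduced with the usual least squares formulas, slope $= S_{xy}/S_{xx}$ and intercept $= \bar{h} - \text{slope}\cdot\bar{d}$. This is a minimal sketch in plain Python, assuming the eight measurements were taken on days 1 to 8 (an assumption that is consistent with the stated line).

```python
# Assumed day numbers 1 to 8, with the average heights (mm) from the table
days = [1, 2, 3, 4, 5, 6, 7, 8]
heights = [4, 9, 15, 21, 27, 32, 38, 42]


def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


a, b = least_squares(days, heights)
print(f"h = {a:.1f} + {b:.1f}d")  # h = -1.6 + 5.6d
```

The unrounded values are a slope of about $5.571$ and an intercept of about $-1.571$, which round to the coefficients quoted above.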

The formula predicts that after $14$ days the dwarf bean plants should reach a height of $77$ mm, and after $28$ days they would reach $155$ mm. These results, although they are extrapolations, are not hard to believe. But if we go further from the experimental data and use the formula to predict the height of the plants after, say, $6$ months (roughly $180$ days), we would expect the plants to be over one metre tall, which is unlikely given the type of plant and the actual length of the growing season.
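The predictions above can be checked by evaluating $h=-1.6+5.6d$ directly. This sketch assumes the measured days are 1 to 8 and treats 6 months as about 180 days; it also labels each prediction as an interpolation or an extrapolation.

```python
def predicted_height(d):
    """Height in mm predicted by the regression line h = -1.6 + 5.6d."""
    return -1.6 + 5.6 * d


TESTED_DAYS = range(1, 9)  # assumed: measurements taken on days 1 to 8

for d in [14, 28, 180]:  # 180 days is roughly 6 months
    inside = min(TESTED_DAYS) <= d <= max(TESTED_DAYS)
    kind = "interpolation" if inside else "extrapolation"
    print(f"day {d}: {predicted_height(d):.0f} mm ({kind})")

# day 14: 77 mm (extrapolation)
# day 28: 155 mm (extrapolation)
# day 180: 1006 mm (extrapolation)
```

All three values lie outside the tested range, and the further they are from day 8, the less reason there is to trust them.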

1) What are the reasons for not being confident in the formula to predict the height of the plants after $6$ months?

2) If the maximum height of a dwarf bean plant is around $450$ mm, what would the graph look like over a period of $6$ months?

In the wild, a female rabbit can have around $20$ offspring a year. Suppose roughly $5$ of these are females who survive to maturity and begin to reproduce themselves. Imagine that a breeding pair is introduced to a small island previously uninhabited by rabbits. Ecologists then observe this breeding pattern over a period of four years.

The number of female rabbits from year to year is summarised by the sequence $\{1, 5, 25, 125, \ldots\}$, which, in turn, may be summarised by the formula $N=5^y$, where $y$ is the number of years after the introduction and $N$ is the number of female rabbits $y$ years later.

If you were to use this formula to predict the number of female rabbits living on the island after $3$ years, you could be confident of your answer. However, if you wished to predict the number of female rabbits living on the island after $10$ years, the formula would give the impossible answer of nearly ten million female rabbits ($5^{10} = 9\,765\,625$).
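The arithmetic behind these two predictions is a direct evaluation of $N=5^y$. This sketch uses a hypothetical helper name, `female_rabbits`, for the formula.

```python
def female_rabbits(y):
    """Number of female rabbits y years after introduction, using N = 5**y."""
    return 5 ** y


print(female_rabbits(3))   # 125, consistent with the observed sequence
print(female_rabbits(10))  # 9765625, far beyond what a small island can support
```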

Clearly, other factors must intervene to limit the rabbit population in the years beyond the four years in which the observation was carried out. The formula was correct for the period for which evidence was obtained but a more sophisticated formula is needed to model what is observed in subsequent years.

Rather than using an exponential formula, plot the number of female rabbits each year on a graph and fit a least squares regression line. Use this line to predict the number of female rabbits living on the island after 10 years.

Is this a better prediction of the number of female rabbits living on the island after 10 years? Why/Why not?

Create a scatter plot to represent the relationship between two variables, determine the correlation between these variables by testing different regression models using technology, and use a model to make predictions when appropriate.

Describe the value of mathematical modelling and how it is used in real life to inform decisions.

Report how the model can be used to answer the question of interest, how well the model fits the context, potential limitations of the model, and what predictions can be made based on the model.