Regression Analysis

Lesson

Many statistical experiments compare measurements of pairs of variables. The aim is to see whether the two variables are related.

Examples of pairs of variables that might be studied include:

- time: number of fish in a river
- percentage of children vaccinated: number of disease cases recorded
- age of cars sold: prices obtained

Data collected as pairs of measurements can be displayed in a scatterplot. One variable is taken to be the *independent* or *explanatory* variable and its levels are measured along the horizontal axis. The other variable is called the *dependent* or *response* variable and its levels are measured along the vertical axis. The data points plotted on the plane defined by the axes may or may not show a trend or a relationship between the variables.

If the points are scattered without any clear pattern we conclude that no relationship has been detected between the variables.

If the points appear to lie close to a line, we conclude that a relationship probably exists. We insert the line that best fits the data and use it to make predictions about possible further observations.

The scatterplot above indicates a strong negative correlation between the variables. This means the dependent variable tends to decrease in a predictable way with increases in the independent variable.

In a well-designed experiment, a researcher is careful not to use the fitted line to predict the response for values of the independent variable that lie outside the range of values used in the experiment. For example, if the smallest value of the independent variable in the experiment was 10 and the largest was 85, then it would be unwise to try to predict the response when the independent variable is smaller than 10 or larger than 85.

Making predictions beyond the range of the data is called *extrapolation* and is considered unsafe.
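The range check described above can be sketched in a few lines of Python. The function name `classify_prediction` is hypothetical, chosen only for this illustration, and the limits 10 and 85 are the ones from the example in the text.

```python
def classify_prediction(x, x_min, x_max):
    """Label a prediction at x as interpolation or extrapolation.

    x_min and x_max are the smallest and largest values of the
    independent variable that were actually observed.
    """
    if x_min <= x <= x_max:
        return "interpolation"
    return "extrapolation"


# Observed range from the example in the text: smallest 10, largest 85.
for x in (5, 10, 50, 85, 90):
    print(x, classify_prediction(x, 10, 85))
```

Values at the ends of the observed range (10 and 85 here) are treated as inside the range, since they were part of the experiment.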

The usual mathematical method for fitting a line to a data set accurately is called *least squares regression*. A spreadsheet or a statistical application will do the calculations for this automatically. It is always possible to fit a line to a scatterplot, even when there is no genuine relationship between the variables. However, when making a judgment about whether or not a relationship exists we consider how *close* the data points are to the fitted line.
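For readers who want to see what the spreadsheet is doing, here is a minimal sketch of a least squares fit in plain Python. The helper names `least_squares_line` and `correlation` are hypothetical. The sample values are the dam volumes tabulated later in this lesson. The correlation coefficient $r$ is used here as one common measure of how close the points lie to the fitted line; the lesson itself does not prescribe a particular measure.

```python
import math


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def correlation(xs, ys):
    """Return the correlation coefficient r, a value between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Dam data from this lesson: month number and volume in billions of litres.
months = [1, 2, 3, 4]
volumes = [120, 108, 106, 86]

slope, intercept = least_squares_line(months, volumes)
r = correlation(months, volumes)
print(f"volume = {intercept:.1f} + ({slope:.1f}) * month, r = {r:.2f}")
```

For these four points the sketch gives a slope of about $-10.4$ and an intercept of about $131$, with $r$ close to $-0.95$. A value of $r$ near $-1$ or $1$ means the points lie close to the line; a value near $0$ means they do not.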

The data points in the graph above are based on the sale price of an item of goods measured against the age of the item.

The regression line can be used to predict that at 23 months the value of the goods will be approximately $\$1250$. Considering the amounts by which the data points are above and below the regression line, it could easily happen that the estimated value of the goods at 23 months is $\$100$ too low or too high.

The volume of a dam that supplies water to a neighbouring town was recorded over a number of months.

Month | $1$ | $2$ | $3$ | $4$ |
---|---|---|---|---|
Volume (billions of litres) | $120$ | $108$ | $106$ | $86$ |

Given the data in the table, plot the points on the number plane.

Which of the following lines best fits the data?

[Graphs A, B, C and D: candidate lines of best fit]

The number of fish in a river is measured over a five-year period.

The results are shown in the following table and plotted below.

Time in years ($t$) | $0$ | $1$ | $2$ | $3$ | $4$ | $5$ |
---|---|---|---|---|---|---|
Number of fish ($F$) | $1903$ | $1994$ | $1995$ | $1602$ | $1695$ | $1311$ |

[Graph: number of fish $F$ against time $t$ in years]

Which line best fits the data?

[Graphs A, B, C and D: candidate lines of best fit]

Predict the number of years until there are no fish left in the river.

Predict the number of fish remaining in the river after $7$ years.

According to the line of best fit, how many years are there until there are $900$ fish left in the river?
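One way to check answers to these fish questions is to fit a line numerically. The sketch below assumes the line of best fit is the least squares line through the six tabulated points; the line drawn in the exercise's graph may differ slightly, so treat the results as approximate. The helper names `least_squares_line` and `years_until` are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Fish data from the table: time in years and number of fish.
years = [0, 1, 2, 3, 4, 5]
fish = [1903, 1994, 1995, 1602, 1695, 1311]

slope, intercept = least_squares_line(years, fish)


def years_until(target):
    """Solve intercept + slope * t = target for t."""
    return (target - intercept) / slope


fish_at_7 = intercept + slope * 7
years_to_zero = years_until(0)
years_to_900 = years_until(900)

print(f"F = {intercept:.1f} + ({slope:.1f}) t")
print(f"fish after 7 years: {fish_at_7:.0f}")
print(f"years until no fish: {years_to_zero:.1f}")
print(f"years until 900 fish: {years_to_900:.1f}")
```

Under that assumption the fitted line falls by roughly 121 fish per year, giving about 1204 fish after 7 years, about 16.9 years until no fish remain, and 9.5 years until 900 remain. All three values lie beyond the five years of observations, so they are extrapolations and should be read with the caution described earlier in the lesson.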

One litre of gas is raised to various temperatures and its pressure is measured.

The data has been graphed below with a line of best fit.

Temperature (K) | $300$ | $302$ | $304$ | $308$ | $310$ |
---|---|---|---|---|---|
Pressure (Pa) | $2400$ | $2416$ | $2434$ | $2462$ | $2478$ |

Temperature (K) | $312$ | $314$ | $316$ | $318$ | $320$ |
---|---|---|---|---|---|
Pressure (Pa) | $2496$ | $2512$ | $2526$ | $2546$ | $2562$ |

[Graph: pressure (Pa) against temperature (K), with line of best fit]

The pressure was not recorded when the temperature was $306$ K.

Is it reasonable to use the line of best fit to predict the pressure?

- A: Yes
- B: No

Predict the pressure when the temperature is $306$ K.

Within which range of temperatures is it reasonable to use the line of best fit to predict pressure?

- A: $\left[300,320\right]$
- B: $\left[300,600\right]$
- C: $\left[0,320\right]$
- D: $\left[280,340\right]$
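The missing reading at $306$ K can be estimated numerically. The sketch below assumes the line of best fit is the least squares line through the ten tabulated points, and it refuses to predict outside the measured temperatures. The names `least_squares_line` and `predict_pressure` are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Gas data from the tables: temperature in kelvin and pressure in pascals.
temps_k = [300, 302, 304, 308, 310, 312, 314, 316, 318, 320]
pressures_pa = [2400, 2416, 2434, 2462, 2478, 2496, 2512, 2526, 2546, 2562]

slope, intercept = least_squares_line(temps_k, pressures_pa)


def predict_pressure(temp_k):
    """Predict pressure from the fitted line, for measured temperatures only."""
    if not min(temps_k) <= temp_k <= max(temps_k):
        raise ValueError("temperature is outside the measured range")
    return intercept + slope * temp_k


predicted_306 = predict_pressure(306)
print(f"predicted pressure at 306 K: {predicted_306:.0f} Pa")
```

With these data the estimate is about $2448$ Pa, which sits between the recorded pressures at $304$ K and $308$ K as expected for an interpolated value. Calling `predict_pressure(600)` raises an error because $600$ K is far outside the measured temperatures.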

S7-2 Make inferences from surveys and experiments:

- A: making informal predictions, interpolations, and extrapolations
- B: using sample statistics to make point estimates of population parameters
- C: recognising the effect of sample size on the variability of an estimate

Use statistical methods to make an inference