Regression Analysis

Lesson

In the sciences, and particularly in the biological sciences, researchers examine data to see whether values of one variable are related to values of another variable.

We speak of an *explanatory* or *treatment* variable and a corresponding *response* variable.

When the value of a response variable can be predicted (at least approximately) from the value of the explanatory variable, we say there is an *association* between the two variables. An association is weak if the predicted response is likely to be only a very rough estimate, and strong if the explanatory variable is expected to predict the response accurately.

When data consisting of pairs of numbers (values of the treatment and response variables) are displayed in a scatterplot using a spreadsheet program or other software, the program can be asked to find the line of best fit, also called the regression line. The program gives the formula for the regression line and also calculates what is called the $R^2$-number, which indicates the strength of the linear association between the two variables.
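To show roughly what the software is doing, here is a minimal Python sketch. It assumes the line of best fit is found by ordinary least squares (the standard method for a regression line); the function name `fit_line` is hypothetical and not part of any spreadsheet or library.

```python
# Sketch assuming ordinary least squares, the usual method for a line of best fit.
# fit_line is a hypothetical helper, not a spreadsheet or library function.

def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line y = slope*x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs in lists of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations (and cross-products) about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("the x values and the y values must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / syy
    return slope, intercept, r_squared


# Example with invented pairs that lie exactly on y = 2x + 1
slope, intercept, r2 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {slope}x + {intercept}, R^2 = {r2}")  # prints: y = 2.0x + 1.0, R^2 = 1.0
```

Because every invented point lies on the line, the residual sum of squares is zero and the $R^2$-number is exactly 1.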

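The $R^2$-number can be anywhere between $0$ and $1$. An $R^2$-number near $0$ means the linear association is weak or absent. When the $R^2$-number is near $1$, the linear association is strong. (A strong but curved relationship can still give a low $R^2$-number, because $R^2$ measures how well a straight line fits the data.)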
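A short worked derivation shows why the $R^2$-number always lies between $0$ and $1$, assuming the line is fitted by least squares and includes an intercept term. The symbols are introduced here for illustration: $m$ and $c$ are the slope and intercept of the fitted line, $\hat{y}_i$ is the predicted response for the $i$-th pair, $\bar{y}$ is the mean response and $n$ is the number of data pairs.

$$\hat{y}_i = m x_i + c, \qquad SS_{\text{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad SS_{\text{tot}} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

$$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$$

The horizontal line $y = \bar{y}$ is one of the lines the method could choose, and its residual sum of squares is exactly $SS_{\text{tot}}$. The least-squares line does at least as well, so $0 \le SS_{\text{res}} \le SS_{\text{tot}}$ and therefore $0 \le R^2 \le 1$. $R^2 = 1$ when every point lies on the line, and $R^2 = 0$ when the best-fitting line is horizontal (slope $m = 0$).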

An increase in the explanatory variable can be accompanied by a decrease in the response variable. This is called a *negative* association, and the regression line slopes downward. The following diagram illustrates a negative association.

Data pairs are shown in the columns on the left. The pairs are plotted, and the regression line, its formula and the $R^2$-number are shown.

Careful!

An *association* between two variables does not imply a causal relationship.

Without further evidence, we are not able to say that the explanatory or treatment variable *causes* the variation in the response variable. It may be that both variables are changing in response to some other, hidden factor.

The term *biometric data* is used in the security industry to mean measurements of facial features, fingerprints, iris colour and pattern, and any other personal characteristics that can be used to establish a person's identity.

The term is also used in the fitness industry for measurements of a person's capacity for physical activity.

More broadly, biometrics (also called biometry) is the application of mathematical and statistical methods to measurements in biology, human or otherwise, and biometric data are the measurements it analyses. Such data therefore appear throughout the biological sciences, including medical, veterinary and agricultural science.

To complete this learning activity, you should design and carry out your own investigation into a possible association between two biometric variables or between a biometric variable and a non-biometric variable.

There are many numerical variables you could choose to study. Here are a few:

- Time taken for an activity
- Time taken for a physical change
- Height or weight of a person
- Length of a limb or digit
- Blood sugar level
- Blood alcohol level
- Heart rate
- Respiration rate
- Blood pressure
- Hearing acuity
- Growth rate of plants, animals or other living things
- Nutrient level
- Density of air pollution
- Length of daylight
- Temperature
- Ionising radiation level
- Altitude
- Rainfall

Some of these would need specialised techniques and measuring equipment for their collection but others could be studied more conveniently. You should add to this list if necessary and choose variables that are of interest to you.

S7-2 Make inferences from surveys and experiments:

- A: making informal predictions, interpolations, and extrapolations
- B: using sample statistics to make point estimates of population parameters
- C: recognising the effect of sample size on the variability of an estimate

Use statistical methods to make an inference
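Objectives B and C can be explored with a small simulation, sketched here in Python under stated assumptions: the "population" is simulated from a normal distribution with a made-up mean of 170 and standard deviation of 10, rather than real measurements. Each sample mean is a point estimate of the population mean, and the spread (standard deviation) of many such estimates is compared for several sample sizes.

```python
# Hypothetical simulation: how sample size affects the variability of a point estimate.
# The population is simulated (normal, mean 170, standard deviation 10), not real data.
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is reproducible
POP_MEAN, POP_SD = 170, 10

def sample_means(n, repeats=1000):
    """Draw `repeats` samples of size n and return the sample mean of each."""
    return [
        statistics.fmean([rng.gauss(POP_MEAN, POP_SD) for _ in range(n)])
        for _ in range(repeats)
    ]

spread = {}
for n in (5, 20, 80):
    means = sample_means(n)
    spread[n] = statistics.stdev(means)
    # Each individual sample mean is a point estimate of POP_MEAN
    print(f"n = {n:3d}: first estimate {means[0]:.1f}, spread of estimates {spread[n]:.2f}")
```

Theory predicts a spread of about $\sigma/\sqrt{n}$, so the printed spreads should be close to 4.5, 2.2 and 1.1: quadrupling the sample size roughly halves the variability of the estimate.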