Bivariate Data

Lesson

Bivariate data is the technical name for numerical data consisting of pairs of values where the question of interest is whether there is a simple relation between the numbers in each pair.

We may, for example, conduct an experiment in which an input quantity is varied and the level of a response is recorded for each input level. The experimenter would then check the results to see whether the level of the input is a predictor of the level of the output.

The paired values in a bivariate data set are called the *explanatory variable* and the *response variable*. They may also be called, in more mathematical language, the in*dependent variable * and the *dependent variable. *In this context, the word *independent * means that the variable can be varied freely. The *dependent *variable changes in response to the chosen level of the independent variable.

It is usual when displaying bivariate data graphically, to plot the data points with the value of the independent variable on the horizontal axis and the value of the dependent variable on the vertical axis.

A single data point in a bivariate data set might be written in the form $\left(a,b\right)$(`a`,`b`) and it would be understood that $a$`a` is the explanatory variable and $b$`b` is the response variable. It should be recognised, however, that there may not be a real causal relationship between the two variables.

The experiment or observational study that produced the bivariate data set may have been designed with the aim of looking for a correspondence between two variables but it is not valid to conclude that changes in the value of $a$`a` *cause *$b$`b` to change or that the value of $b$`b` causes a corresponding value of $a$`a` even when a relation is apparent. It may be that both $a$`a` and $b$`b` depend on some other hidden variable.

A plot of a bivariate data set may reveal a linear relation between two variables or a non-linear relation. The plot may show that the relation is strong, if the data points lie close to a line or simple curve, or weak otherwise. The plot may also show the absence of a relation between the two variables if the data points appear to be scattered in an unsystematic way.

Mathematical techniques exist for finding the line or curve that best fits a bivariate data set. If the relation is strong, then the line or curve of best fit can be used as a model of the behaviour of the system under investigation.

Consider the following variables:

- Number of ice cream cones sold
- Temperature (°C)

Which of the following statements makes sense?

The temperature affects the number of ice cream cones sold.

AThe number of ice cream cones sold affects the temperature.

BWhich is the dependent variable and which is the independent variable?

The independent variable is the temperature and the dependent variable is the number of ice cream cones sold.

AThe independent variable is the number of ice cream cones sold and the dependent variable is temperature.

B

For the following sets of axes, which have the variables placed in the correct position?

- ABCDE

The scatter plot shows the relationship between sea temperature and the amount of healthy coral.

Which variable is the dependent variable?

Sea temperature

ALevel of healthy coral

BWhich variable is the independent variable?

Sea temperature

ALevel of healthy coral

B