Bivariate data is the technical name for numerical data consisting of pairs of values where the question of interest is whether there is a simple relation between the numbers in each pair.
We may, for example, conduct an experiment in which an input quantity is varied and the level of a response is recorded for each input level. The experimenter would then check the results to see whether the level of the input is a predictor of the level of the output.
The paired values in a bivariate data set are called the explanatory variable and the response variable. They may also be called, in more mathematical language, the independent variable and the dependent variable. In this context, the word independent means that the variable can be varied freely. The dependent variable changes in response to the chosen level of the independent variable.
It is usual when displaying bivariate data graphically, to plot the data points with the value of the independent variable on the horizontal axis and the value of the dependent variable on the vertical axis.
A single data point in a bivariate data set might be written in the form $\left(a,b\right)$(a,b) and it would be understood that $a$a is the explanatory variable and $b$b is the response variable. It should be recognised, however, that there may not be a real causal relationship between the two variables.
The experiment or observational study that produced the bivariate data set may have been designed with the aim of looking for a correspondence between two variables but it is not valid to conclude that changes in the value of $a$a cause $b$b to change or that the value of $b$b causes a corresponding value of $a$a even when a relation is apparent. It may be that both $a$a and $b$b depend on some other hidden variable.
A plot of a bivariate data set may reveal a linear relation between two variables or a non-linear relation. The plot may show that the relation is strong, if the data points lie close to a line or simple curve, or weak otherwise. The plot may also show the absence of a relation between the two variables if the data points appear to be scattered in an unsystematic way.
Mathematical techniques exist for finding the line or curve that best fits a bivariate data set. If the relation is strong, then the line or curve of best fit can be used as a model of the behaviour of the system under investigation.
Consider the following variables:
Which of the following statements makes sense?
Which is the dependent variable and which is the independent variable?
For the following sets of axes, which have the variables placed in the correct position? Select all the correct options.
The scatter plot shows the relationship between sea temperature and the amount of healthy coral.
Which variable is the dependent variable?
Which variable is the independent variable?
Plan and conduct investigations using the statistical enquiry cycle: A justifying the variables and measures used B managing sources of variation, including through the use of random sampling C identifying and communicating features in context (trends, relationships between variables, and differences within and between distributions), using multiple displays D making informal inferences about populations from sample data E justifying findings, using displays and measures.
Investigate bivariate numerical data using the statistical enquiry cycle