Bivariate data is the name for numerical data consisting of pairs of values. We generate these pairs to find out whether there is a simple relation between the numbers in each pair.
For example, we may conduct an experiment on a group of people where each person’s bone density is measured against their age. Their age is the input quantity and this could be any value. Their bone density is the level of response that is recorded against their age.
Then each person’s age and bone density make a pair of values in the bivariate data set. So the two variables are age and bone density, which is why it is bivariate data rather than univariate data where there is just one variable that we are investigating. For instance, if we were just recording people's ages then there would only be one variable and the data would be univariate data.
The two variables that make up each pair in a bivariate data set are called the independent variable and the dependent variable. They may also be called the explanatory variable and the response variable. In the above context, the independent variable is the person’s age and the dependent variable is their bone density. We could then check whether age is a good predictor for bone density. In other words, we could determine whether bone density depends on a person’s age.
A single data point in a bivariate data set is written in the form $\left(x,y\right)$, with the first number $x$ being the independent variable and the second number $y$ being the dependent variable. We display bivariate data graphically by plotting the data points with the value of the independent variable on the horizontal axis and the value of the dependent variable on the vertical axis. This is known as a scatterplot.
Scientists want to see how quickly a plant grows under controlled conditions. They start with ten seedlings of the same height and give each a different measure of weekly fertiliser. They then measure the height of the plants after $6$ weeks and record the data in the table below.
Weekly amount of fertiliser (in cups, $1$ cup $=250$ ml) | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ | $7$ | $8$ | $9$ | $10$ |
---|---|---|---|---|---|---|---|---|---|---|
Height (cm) | $1.55$ | $2.32$ | $3.32$ | $4.51$ | $5.75$ | $6.91$ | $7.86$ | $8.58$ | $9.09$ | $9.43$ |
Create a scatterplot and describe the relationship between the two variables.
Think: We are interested in what happens to the height as the number of cups of fertiliser increases. In other words, the fertiliser explains the change in height. So fertiliser is the independent variable (plotted on the $x$-axis) and height is the dependent variable (plotted on the $y$-axis).
We can write these data points as ordered pairs, $\left(1,1.55\right),\left(2,2.32\right),\dots$
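As a minimal sketch of this pairing step, the two rows of the table can be combined into ordered pairs in Python. The variable names `fertiliser_cups`, `height_cm` and `data_points` are chosen here for illustration and are not part of the lesson.

```python
# Values copied from the table in example one.
fertiliser_cups = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
height_cm = [1.55, 2.32, 3.32, 4.51, 5.75, 6.91, 7.86, 8.58, 9.09, 9.43]

# zip pairs the i-th fertiliser amount with the i-th height, giving
# (independent value, dependent value) ordered pairs.
data_points = list(zip(fertiliser_cups, height_cm))

print(data_points[:2])  # [(1, 1.55), (2, 2.32)]
```

Putting the independent variable first in each pair matches the $\left(x,y\right)$ convention used for plotting.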
Do: Writing the data points as ordered pairs doubles as writing them as coordinates on a scatterplot. To make a scatterplot we plot each of the data points on a number plane.
For example, to plot the first data point, $\left(1,1.55\right)$, we plot the point where $x=1$ and $y=1.55$.
We do this for every data point and we have our finished scatterplot with our horizontal axis labelled with the independent variable and the vertical axis labelled with the dependent variable.
By creating the scatterplot, we can more easily see the relationship between the number of cups of fertiliser and the height of the plant.
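The plotting steps above can be sketched in Python using only text characters. This is a rough illustration rather than a proper graphing tool: the function name `text_scatterplot` and the choice of ten character rows are assumptions made here, and the sketch assumes whole-number $x$ values and at least two different $y$ values.

```python
def text_scatterplot(points, rows=10):
    """Return a rough text scatterplot of (x, y) pairs with whole-number x values."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    y_min, y_max = min(ys), max(ys)
    # One list of characters per row; row 0 is the top of the plot.
    grid = [[" "] * (max(xs) + 1) for _ in range(rows)]
    for x, y in points:
        # Scale y to a row index: the largest y lands on the top row,
        # the smallest y on the bottom row.
        row = round((y_max - y) / (y_max - y_min) * (rows - 1))
        grid[row][x] = "*"
    lines = ["|" + "".join(row) for row in grid]   # vertical axis
    lines.append("+" + "-" * (max(xs) + 1))        # horizontal axis
    return "\n".join(lines)


# Data from example one: (cups of fertiliser, height in cm).
data_points = [(1, 1.55), (2, 2.32), (3, 3.32), (4, 4.51), (5, 5.75),
               (6, 6.91), (7, 7.86), (8, 8.58), (9, 9.09), (10, 9.43)]

print(text_scatterplot(data_points))
```

Each `*` sits in the column for its number of cups, and higher plants appear nearer the top, so the printed picture follows the same layout as the scatterplot described above.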
When we have bivariate data, we want to determine what sort of relationship the two variables have. Broadly speaking, there are three possibilities.
If the data points move from the bottom-left to the top-right of the scatterplot, then the dependent variable is increasing as the independent variable increases.
If the data points move from the top-left to the bottom-right, then the dependent variable is decreasing as the independent variable increases.
If the data points show no overall direction, then there is no clear relationship between the two variables.
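One rough way to check the direction of a data set without drawing it is to sort the points by the independent variable and count how often the dependent variable rises or falls from one point to the next. The sketch below is only a heuristic and is not a method from this lesson: the function name `describe_direction` and the 80% cut-off are assumptions, and it expects at least two data points.

```python
def describe_direction(points):
    """Roughly classify how y changes as x increases in a list of (x, y) pairs."""
    ordered = sorted(points)            # sort by x, the independent variable
    ys = [y for _, y in ordered]
    rises = sum(1 for a, b in zip(ys, ys[1:]) if b > a)
    falls = sum(1 for a, b in zip(ys, ys[1:]) if b < a)
    steps = len(ys) - 1
    # Assumed cut-off: report a direction only if at least 80% of steps agree.
    if rises >= 0.8 * steps:
        return "increasing"
    if falls >= 0.8 * steps:
        return "decreasing"
    return "no clear direction"


# Data from example one: (cups of fertiliser, height in cm).
cups = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
heights = [1.55, 2.32, 3.32, 4.51, 5.75, 6.91, 7.86, 8.58, 9.09, 9.43]
fertiliser_data = list(zip(cups, heights))

# Illustrative data only (not measurements): the same heights in reverse
# order, and a small made-up set that jumps up and down.
reversed_data = list(zip(cups, reversed(heights)))
mixed_data = [(1, 3), (2, 1), (3, 4), (4, 1), (5, 5)]

print(describe_direction(fertiliser_data))  # increasing
print(describe_direction(reversed_data))    # decreasing
print(describe_direction(mixed_data))       # no clear direction
```

A scatterplot is still the better tool for judging a relationship, since it shows the overall shape of the data rather than just step-by-step changes.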
Even when two variables have a relationship, it may not be a causal relationship. We cannot say for sure that a change in the value of $x$ causes $y$ to change or that the value of $y$ causes a corresponding value of $x$ even when a relation is apparent. It may be that both $x$ and $y$ have a relationship with some other hidden variable, which creates an indirect relationship between $x$ and $y$.
Bivariate data - Data consisting of ordered pairs of two variables
Independent variable - A variable which is not determined by another variable. Also called an explanatory variable
Dependent variable - A variable which is determined by some other variable. Also called a response variable
Data point - A value or ordered pair taken from a data set
Scatterplot - A visualisation of bivariate data where ordered pairs are plotted on a number plane
Using the scatterplot created in example one, describe the relationship between the height of a plant and the amount of fertiliser given.
Think: Looking at the scatterplot, the data points move from the bottom-left to the top-right. So as the independent variable is increasing the dependent variable is also increasing.
Do: As the number of cups of fertiliser increases, the height of the plant also increases.
Consider the following variables: the temperature and the number of ice cream cones sold.
Which of the following statements makes sense?
The temperature affects the number of ice cream cones sold.
The number of ice cream cones sold affects the temperature.
Which is the dependent variable and which is the independent variable?
The independent variable is the temperature and the dependent variable is the number of ice cream cones sold.
The independent variable is the number of ice cream cones sold and the dependent variable is temperature.
A student was performing an experiment to study the relationship between the current through a resistor and the voltage across it. He noted his results in the following table.
Current ($x$) | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ | $7$ | $8$ | $9$ | $10$ |
---|---|---|---|---|---|---|---|---|---|---|
Voltage ($y$) | $5$ | $14$ | $15$ | $26$ | $33$ | $34$ | $44$ | $47$ | $57$ | $57$ |
Plot the data from the table on the graph below.
A cafe records the number of soups sold and the daily maximum temperature.
Describe the relationship between temperature and soups sold in the data.
As temperature increases, soups sold decreases.
As temperature increases, soups sold increases.
Temperature has no effect on soups sold.
Scientists were looking for a relationship between the number of hours of sleep we get and our motor and process skills. Subjects were asked to sleep for different amounts of time and then all undertook the same driving challenge, in which their reaction time was measured. The table shows the results, which are to be presented as a scatterplot.
Amount of sleep (hours) | Reaction time (seconds) |
---|---|
$9$ | $3$ |
$6$ | $3.3$ |
$4$ | $3.5$ |
$10$ | $3$ |
$3$ | $3.7$ |
$7$ | $3.2$ |
$2$ | $3.85$ |
$5$ | $3.55$ |
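The rows of this table are not in order of amount of sleep, so it can help to sort them before plotting. The sketch below is one way to do that in Python. It treats amount of sleep as the independent variable, on the assumption that the experimenters set it, and the names `observations` and `ordered` are chosen here for illustration.

```python
# (hours of sleep, reaction time in seconds), in the order given in the table.
observations = [(9, 3), (6, 3.3), (4, 3.5), (10, 3),
                (3, 3.7), (7, 3.2), (2, 3.85), (5, 3.55)]

# Sorting by the first value in each pair puts the points in the
# left-to-right order in which they appear along the horizontal axis.
ordered = sorted(observations)

for hours, seconds in ordered:
    print(f"({hours}, {seconds})")
```

Reading down the sorted list gives the points from left to right across the scatterplot.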
By moving the points, create a scatterplot for the observations in the table.
According to the results, which of the following is true of the relationship between amount of sleep and reaction time?
As the amount of sleep decreases, the reaction time decreases.
As sleeping time decreases, reaction time improves.
Sleeping for longer improves reaction time.
The amount of sleep has no effect on the reaction time.