iGCSE (2021 Edition)

18.11 Bivariate data analysis and scatter diagrams

Lesson

Bivariate data is numerical data made up of pairs of values. We collect these pairs to find out whether there is a simple relationship between the two values in each pair.

For example, we may conduct a study on a group of people in which each person’s bone density is measured along with their age. Age is the input quantity, and it can take any value. Bone density is the response that is recorded against each person’s age.

Then each person’s age and bone density make a pair of values in the bivariate data set.

The paired values in a bivariate data set are called the independent variable and the dependent variable. In the above context, the independent variable is the person’s age and the dependent variable is their bone density. We could then check whether age is a good predictor of bone density. In other words, we could determine whether bone density depends on a person’s age.

The strength of the relationship between the two variables is called correlation.

Displaying bivariate data

A single data point in a bivariate data set is written in the form $(x,y)$, with the first number $x$ being the independent variable and the second number $y$ being the dependent variable. We display bivariate data graphically by plotting the data points with the value of the independent variable on the horizontal axis and the value of the dependent variable on the vertical axis. This is known as a scatter diagram.

Worked example

Example 1

Scientists want to see how fertiliser affects how quickly a plant grows under controlled conditions. They start with ten seedlings of the same height and give each one a different weekly amount of fertiliser. After $6$ weeks, they measure the height of each plant and record the data in the table below.

| Weekly amount of fertiliser (cups) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Height (cm) | 1.55 | 2.32 | 3.32 | 4.51 | 5.75 | 6.91 | 7.86 | 8.58 | 9.09 | 9.43 |

Note: $1$ cup $=250$ ml.

Create a scatter diagram and describe the relationship between the two variables.

Think: We are interested in what happens to the height as the number of cups of fertiliser increases. In other words, the fertiliser explains the change in height. So fertiliser is the independent variable (plotted on the $x$-axis) and height is the dependent variable (plotted on the $y$-axis).

We can write these data points as ordered pairs: $(1,1.55), (2,2.32), \dots$
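If the table is stored digitally, forming the ordered pairs is a matter of matching each $x$ value with the $y$ value in the same column. Below is a minimal Python sketch. It assumes the two rows of the table are held in lists, and the names `cups` and `height_cm` are our own, not from the lesson.

```python
# Example 1 data: weekly fertiliser (cups) and plant height after 6 weeks (cm)
cups = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
height_cm = [1.55, 2.32, 3.32, 4.51, 5.75, 6.91, 7.86, 8.58, 9.09, 9.43]

# zip matches the i-th cup amount with the i-th height. This gives (x, y)
# pairs with the independent variable first and the dependent variable second.
points = list(zip(cups, height_cm))

print(points[:2])  # [(1, 1.55), (2, 2.32)]
```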

Do: To make a scatter diagram, we plot each of the data points on a Cartesian plane.

For example, to plot the first data point, $(1,1.55)$, we plot the point where $x=1$ and $y=1.55$.

We do this for every data point to obtain the finished scatter diagram.
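As a rough illustration of the plotting step, the sketch below draws a text-only scatter diagram in Python. It spreads $x$ across the columns (horizontal axis) and $y$ across the rows (vertical axis), with larger $y$ values nearer the top. The function name `text_scatter` and the $10 \times 10$ grid are our own assumptions. In practice a graphing tool or plotting library would normally be used. Two points that land in the same grid cell would show as a single `*`.

```python
# Example 1 data: weekly fertiliser (cups) and plant height after 6 weeks (cm)
cups = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
height_cm = [1.55, 2.32, 3.32, 4.51, 5.75, 6.91, 7.86, 8.58, 9.09, 9.43]


def text_scatter(xs, ys, width=10, height=10):
    """Return the rows of a rough text scatter diagram.

    x values are spread across the columns and y values across the rows.
    Assumes the x values are not all equal and the y values are not all equal.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a whole-number position on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # Row 0 is printed first, so flip the row to put large y at the top.
        grid[height - 1 - row][col] = "*"
    # "|" marks the vertical axis and the last line marks the horizontal axis.
    return ["|" + "".join(line) for line in grid] + ["+" + "-" * width]


rows = text_scatter(cups, height_cm)
print("\n".join(rows))
```

The points climb from the bottom left to the top right, which is the same pattern the scatter diagram shows.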

By creating this scatter diagram, we can more easily see the relationship between the number of cups of fertiliser and the height of the plant. 

When two variables have a relationship, we say they are correlated.

Interpreting bivariate data

When we have bivariate data, we want to determine what sort of relationship the two variables have. Just by looking at the scatter diagram, we may notice the following:

- A simple relationship: the points appear to follow a trend. The trend is linear if the points lie roughly along a straight line, and non-linear if they follow a curve instead.

- Outliers: any data points that are very different from the other data points stand out clearly in a scatter diagram, especially if the rest of the points appear to have a relationship.

- A complex correlation or no correlation: if the points in the scatter diagram do not follow a clear trend, there may be no correlation between the variables, or the relationship may be too complex to see by eye.
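Reading the direction of a trend is easiest when the points are taken in order from left to right. The Python sketch below is one informal way to do this, and it assumes the data are a list of $(x, y)$ tuples. The function name `trend_direction` and its "count the rises versus the falls" rule are our own illustration, not a standard statistical method. It only reports a direction. It does not decide whether a trend is linear or flag outliers, and a small majority can come from noise, so the result should always be checked against the scatter diagram.

```python
def trend_direction(points):
    """Informal check of the direction of a trend in (x, y) data.

    Sorts the points by x and counts how often y rises or falls from one
    point to the next. This is a rough reading aid, not a measure of
    correlation.
    """
    ordered = sorted(points)
    rises = falls = 0
    for (x1, y1), (x2, y2) in zip(ordered, ordered[1:]):
        if x2 == x1:
            continue  # same x value: no left-to-right step to compare
        if y2 > y1:
            rises += 1
        elif y2 < y1:
            falls += 1
    if rises > falls:
        return "increasing"
    if falls > rises:
        return "decreasing"
    return "no clear direction"


# First five points from Example 1
print(trend_direction([(1, 1.55), (2, 2.32), (3, 3.32), (4, 4.51), (5, 5.75)]))
# increasing
```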

We will look at correlation in more detail in the next lesson.

Causal relationships

Even when two variables have a relationship, it may not be a causal relationship. Even when a relationship is apparent, we cannot say for sure that a change in the value of $x$ causes $y$ to change, or that a change in the value of $y$ causes $x$ to change. It may be that both $x$ and $y$ have a relationship with some other hidden variable, which creates an indirect relationship between $x$ and $y$. For example, ice cream sales and cases of sunburn both tend to rise in hot weather, but buying ice cream does not cause sunburn.

Summary

Bivariate data - Data consisting of ordered pairs of values of two variables.

Independent variable - A variable that is not determined by another variable.

Dependent variable - A variable that is determined by some other variable.

Data point - A single value or ordered pair taken from a data set.

Scatter diagram - A visualisation of bivariate data in which the ordered pairs are plotted as points on a number plane.

Worked example

Example 2

Using the scatter diagram from Example 1, describe the relationship between the height of the plant and the number of cups of fertiliser.

Think: In the scatter diagram, the data points rise from the bottom left to the top right. That is, each additional cup of fertiliser corresponds to a taller plant. We could draw an approximate straight line with a positive gradient to show the general trend of the points.

Do: As the number of cups of fertiliser increases, the height of the plant also increases. 
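One simple way to put a number on "a line with a positive gradient" is to take the gradient of the straight line through the first and last data points. This is rise over run: $\frac{9.43 - 1.55}{10 - 1}$. It is our own choice for illustration and is not a line of best fit, which would use all of the points.

```python
# Example 1 data: weekly fertiliser (cups) and plant height after 6 weeks (cm)
cups = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
height_cm = [1.55, 2.32, 3.32, 4.51, 5.75, 6.91, 7.86, 8.58, 9.09, 9.43]

# Gradient of the line joining the first and last points:
# (change in height) / (change in cups) = (9.43 - 1.55) / (10 - 1)
gradient = (height_cm[-1] - height_cm[0]) / (cups[-1] - cups[0])

print(round(gradient, 2))  # 0.88
```

The positive value agrees with the description above. Over this range, the height rises by roughly $0.88$ cm per extra cup on average. Because this uses only two points, it ignores the fact that the increases become smaller for the last few cups.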

 

Practice questions

Question 1

Create a scatter plot for the set of data in the table.

| $x$ | 1 | 3 | 5 | 7 | 9 |
|---|---|---|---|---|---|
| $y$ | 3 | 7 | 11 | 15 | 19 |

Question 2

Scientists were looking for a relationship between the number of hours of sleep we get and its effect on our motor and processing skills. Each subject slept for a different amount of time, and then all subjects took the same driving challenge, in which their reaction time was measured. The table shows the results, which are to be presented as a scatter plot.

| Amount of sleep (hours) | Reaction time (seconds) |
|---|---|
| 9 | 3 |
| 6 | 3.3 |
| 4 | 3.5 |
| 10 | 3 |
| 3 | 3.7 |
| 7 | 3.2 |
| 2 | 3.85 |
| 5 | 3.55 |
  1. Create a scatter plot for the observations in the table.


  2. According to the results, which of the following is true of the relationship between amount of sleep and reaction time?

    A. As the amount of sleep decreases, the reaction time decreases.

    B. As sleeping time decreases, reaction time improves.

    C. Sleeping for longer improves reaction time.

    D. The amount of sleep has no effect on the reaction time.

Question 3

The market price of bananas varies throughout the year. Each month, a consumer group compared the average quantity of bananas supplied by each producer to the average market price (per unit).

| Supply (kg) | Price (dollars) |
|---|---|
| 550 | 15.25 |
| 600 | 14.75 |
| 650 | 14.75 |
| 700 | 14.75 |
| 750 | 14.25 |
| 800 | 14.00 |
| 850 | 13.75 |
| 900 | 13.25 |
| 950 | 13.50 |
| 1000 | 13.25 |
  1. Complete the scatter plot by adding the missing observations from the table.


  2. Which best describes the relationship between the supply quantity and the market price of bananas?

    A. Positive linear

    B. No direct relationship

    C. Negative linear

  3. According to this data, when would a supplier of bananas receive a higher price per banana?

    A. When very few bananas are available to be sold.

    B. When the supply of bananas increases.

Outcomes

0607C11.3E: Scatter diagrams

0607E11.3E: Scatter diagrams
