AustraliaVIC
VCE 12 General 2023

2.05 Correlation and r value

Lesson

Introduction

As we have seen, a correlation is a way of expressing a relationship between two variables - in particular, how strongly pairs of variables are related.

In this chapter we will be looking at linear relationships and measuring the strength of a linear correlation between variables by a quantity called the Pearson correlation coefficient (or just correlation coefficient). This coefficient is given the symbol r, and takes a value between -1 and +1.

Pearson's correlation coefficient

The correlation coefficient is a value that describes the strength and direction of a linear relationship between two variables.

The value of the correlation coefficient varies from -1 to +1, where -1 describes perfect negative correlation and +1 describes perfect positive correlation. Any other type of correlation corresponds to a value between these two extremes, with 0 describing no correlation.

Figure: A number line from -1 to 1 describing correlation.

We further divide up this range of values to indicate other strengths of correlation, using descriptions of weak, moderate and strong (for both positive and negative correlations).

If the correlation coefficient takes a value between 0 and 1, then it describes a positive correlation:

  • A value of r close to +1 indicates a strong positive linear correlation

  • A value of r that is positive but closer to 0 indicates a weak positive linear correlation

If the correlation coefficient takes a value between -1 and 0, then it describes a negative correlation:

  • A value of r close to -1 indicates a strong negative linear correlation

  • A value of r that is negative but closer to 0 indicates a weak negative linear correlation

If the correlation coefficient is 0, or very close to 0, it indicates that there is no linear correlation between the variables. This may be because the variables are unrelated, or it might be that they have a non-linear relation instead.
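
The point about non-linear relations can be checked numerically. The sketch below is illustrative only: the helper name `pearson_r` and the data are made up for this example, and the helper uses the standard definition of r rather than any particular calculator's routine. Here y is completely determined by x (y = x squared), yet the linear correlation is zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: y depends on x exactly, but along a curve, not a line.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curve = pearson_r(xs, ys)
print(r_curve)  # zero: no linear correlation, despite a clear relationship
```

So a value of r near 0 rules out a linear pattern only; a scatterplot is still needed to see whether some other pattern is present.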

Even when two variables have a strong relationship and r is close to 1 or -1, we cannot say that a change in one variable causes a change in the other. If asked "does a change in the explanatory variable cause a change in the response variable?", the correlation alone does not let us answer yes, so we write "No - correlation is not causation".

For example, it has been shown that there is a strong, positive, linear relationship between sunglasses sales and ice-cream cone sales. But we cannot say that sunglasses sales cause ice-cream cone sales. There is a third variable at work here: an increase in temperature causes both sales figures to increase. Temperature is called a confounding variable.

Coincidence is also a plausible reason for an association. It is possible to find unrelated variables with surprisingly strong correlations. For example, per capita consumption of cheese and deaths from being strangled by a bedsheet have been shown to have a strong correlation, but we cannot say that one causes the other. It is a coincidence. Collections of graphs showing such spurious correlations can be found online.

Exploration

Explore this applet to see how the correlation coefficient changes. Move each point. Try having one outlier and see how much that can change the correlation coefficient. Try moving the points so they are in a perfect straight line. What happens to the correlation coefficient value?


The closer the points are to being in a straight line, the closer r is to 1 or -1.

If the points are trending upwards from left to right, the correlation coefficient is positive. If the points are trending downwards from left to right, the correlation coefficient is negative.
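
The same observations can be reproduced without the applet. This is a minimal sketch with made-up points; `pearson_r` is an illustrative helper based on the standard definition of r, not the applet's own code. It compares points on a rising line, points on a falling line, and the rising line with one point moved far away.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5, 6]

ys_up = [2 * x + 1 for x in xs]    # perfect line trending upwards
ys_down = [10 - x for x in xs]     # perfect line trending downwards
ys_outlier = [3, 5, 7, 9, 11, 0]   # same rising line, last point dragged down

r_up = pearson_r(xs, ys_up)
r_down = pearson_r(xs, ys_down)
r_outlier = pearson_r(xs, ys_outlier)

print(round(r_up, 2), round(r_down, 2), round(r_outlier, 2))
```

With only six points, a single outlier is enough to pull r from 1 down to a value close to 0, which is why outliers should be checked on the scatterplot before r is interpreted.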

Examples

Example 1

Identify the correlation between the temperature and the number of heaters sold.

A
A positive correlation
B
A negative correlation
C
No correlation
Worked Solution
Create a strategy

Consider the number of heaters being sold when the temperature increases or decreases.

Apply the idea

When the temperature increases, fewer heaters are likely to be sold. On the other hand, when the temperature decreases, more heaters are likely to be sold.

This means that there is a negative correlation between the temperature and the number of heaters sold as more heaters are likely to be sold when it is cold. So, the correct answer is Option B.

Example 2

A study found a strong correlation between the approximate number of pirates out at sea and the average world temperature.

a

Does this mean that the number of pirates out at sea has an impact on world temperature?

A
Yes
B
No
Worked Solution
Create a strategy

Think of a reasonable argument for the direct relationship between these two variables.

Apply the idea

There is no plausible way for the number of pirates out at sea to affect the average world temperature, so the correlation does not show a direct relationship between the two variables. So, the correct answer is Option B.

b

Which of the following is the most likely explanation for the strong correlation?

A
Contributing variables - there are other causal relationships and variables that come into play and these may lead to an indirect positive association between the two variables.
B
Coincidence - there are no other contributing factors or reasonable arguments to be made for the strong positive association between the two variables.
Worked Solution
Create a strategy

Think of any common factor or variable that may have a direct relationship with both the approximate number of pirates out at sea and the average world temperature.

Apply the idea

The number of pirates out at sea does not affect the world temperature, and there is no obvious common factor that would plausibly drive both variables. With no contributing factors or reasonable arguments to explain the strong correlation, we attribute it to coincidence, and the correct answer is Option B.

c

Which of the following is demonstrated by the strong correlation between the approximate number of pirates out at sea and the average world temperature?

A
If there is correlation between two variables, then there must be causation.
B
If there is correlation between two variables, there isn't necessarily causation.
C
If there is correlation between two variables, then there is no causation.
Worked Solution
Create a strategy

Use the answers found in parts (a) and (b).

Apply the idea

There is a strong correlation between these two variables, but there is no reasonable argument to suggest that one causes the other. This means that we attribute the strong correlation between the number of pirates out at sea and the world temperature to coincidence. So, the correct answer is Option B.

Idea summary

If the correlation coefficient takes a value between 0 and 1, then it describes a positive correlation:

  • A value of r close to +1 indicates a strong positive linear correlation

  • A value of r that is positive but closer to 0 indicates a weak positive linear correlation

If the correlation coefficient takes a value between -1 and 0, then it describes a negative correlation:

  • A value of r close to -1 indicates a strong negative linear correlation

  • A value of r that is negative but closer to 0 indicates a weak negative linear correlation

If the correlation coefficient is 0, or very close to 0, it indicates that there is no linear correlation between the variables.

Calculation of correlation coefficient r

The calculation required to determine r is very tricky to do by hand, but can be easily done using technology. To do so, we enter the raw data into two separate lists, then perform a linear regression analysis. This will calculate a number of values, though the only one we are interested in right now is the r value.
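
For reference, the quantity that the technology computes is given by the standard definition of the Pearson correlation coefficient, written here for n data pairs with means x-bar and y-bar. It is included only to show what r measures; the lesson itself relies on technology rather than on this formula.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when above-average x-values tend to pair with above-average y-values and negative when they pair with below-average y-values, which is where the sign of r comes from. The denominator scales the result so that it always lies between -1 and +1.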

Examples

Example 3

For the graph depicted, choose the correlation coefficient that best represents it.

Figure: Scatterplot of y against x (x-axis from 5 to 20, y-axis from -5 to 20).

A
1
B
0
C
-1
D
-0.64
Worked Solution
Create a strategy

Check if you could fit a straight line to the points plotted.

Apply the idea

Based on the graph, a straight line cannot be fitted to the plotted points. This means that the two variables in the data set are not linearly correlated. So, we can say that the correlation coefficient is 0, and the correct answer is Option B.

Example 4

Given the following data:

x | 1 | 4    | 7    | 10  | 13   | 16   | 19
y | 4 | 4.25 | 4.55 | 4.4 | 4.45 | 4.75 | 4.2

a

Calculate the correlation coefficient and give your answer to two decimal places.

Worked Solution
Create a strategy

Use technology to find the correlation coefficient.

Apply the idea

Using the Statistics mode in your calculator, enter each x-value along with its y-value into a data table on your calculator then find the linear regression.

Look for the correlation coefficient (r): r = 0.47
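
The calculator result can be cross-checked in a few lines of Python. This is a sketch, not the method the lesson prescribes: it assumes the table reads x = 1, 4, 7, 10, 13, 16, 19 and y = 4, 4.25, 4.55, 4.4, 4.45, 4.75, 4.2, and `pearson_r` is an illustrative helper built from the standard definition of r.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Data from Example 4, as read from the table (assumed reading).
x_values = [1, 4, 7, 10, 13, 16, 19]
y_values = [4, 4.25, 4.55, 4.4, 4.45, 4.75, 4.2]

r_example = pearson_r(x_values, y_values)
print(round(r_example, 2))  # 0.47, matching the calculator
```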

b

Choose the best description of this correlation.

A
Moderate negative
B
Strong positive
C
Weak negative
D
Moderate positive
E
Strong negative
F
Weak positive
Worked Solution
Create a strategy

Use the figure below to identify the best description of the correlation:

Figure: A number line showing values and descriptions of correlation from -1 to 1.

Apply the idea

The value of r=0.47 is positive, and between 0 and 0.5. So there is a weak positive correlation between the variables.

The correct answer is F.
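
Reading a description off the number line can be sketched as a small function. The cutoffs used here (0.25, 0.5 and 0.75) are an assumption made for illustration; the lesson's figure is the authority, and the text above only confirms that a value between 0 and 0.5 such as 0.47 counts as weak. The function name `describe_correlation` is likewise hypothetical.

```python
def describe_correlation(r, weak=0.25, moderate=0.5, strong=0.75):
    """Describe r in words, using ASSUMED cutoffs for weak/moderate/strong."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size < weak:
        # Assumed band: values this close to 0 are treated as no linear correlation.
        return "no linear correlation"
    if size < moderate:
        strength = "weak"
    elif size < strong:
        strength = "moderate"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

print(describe_correlation(0.47))   # weak positive
print(describe_correlation(-0.64))  # moderate negative
print(describe_correlation(0.9))    # strong positive
```

Because the cutoffs are passed as parameters, they can be changed to match whichever scale your course or textbook uses.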

Idea summary

The correlation coefficient, r, tells us the strength and direction of the linear correlation between two variables.

If r is negative the direction of the correlation is negative. If r is positive the direction of the correlation is positive.

Outcomes

U3.AoS1.9

correlation coefficient, 𝑟, its interpretation, the issue of correlation and cause and effect

U3.AoS1.22

calculate the correlation coefficient, 𝑟, and interpret it in the context of the data
