NZ Level 8 (NZC) Level 3 (NCEA)

Correlation Coefficient

Lesson

A correlation coefficient is a value that measures the strength and direction of the linear relationship between two variables. It is denoted by the letter $r$. Remember that correlation does not necessarily imply causation.

A perfect positive correlation has a value of $1$. This means that if we graphed the variables on a Cartesian plane, the points would lie exactly on a straight line with a positive slope. A perfect negative correlation has a value of $-1$: the points lie exactly on a straight line with a negative slope. A value of $0$ indicates no correlation, meaning there is no linear relationship between the variables.

So far, so reasonable. But what if I have a correlation coefficient of $0.6$, or $-0.53$? What do these values show?

Think of the full range of possible values, from $-1$ to $1$, as a continuum (a number line).

Right in the middle is $0$, which we call no correlation.

We then divide the rest of the line into sections with descriptions such as weak, moderate and strong (positive or negative).

Exactly where we place these divisions is somewhat arbitrary. What matters is that the larger $|r|$ is, the closer the relationship is to a perfect linear one, and the closer $r$ is to $0$, the closer it is to no correlation.

A weak correlation indicates that there is some relationship, but it is not very strong. Values of $|r|$ less than $0.5$ are generally considered weak.

A strong correlation indicates that the relationship between the variables is close to linear. The exact value where 'strong' begins differs slightly between sources and parts of the world: some say values of $|r|$ larger than $0.7$ are strong, others say larger than $0.8$. Ultimately, what really matters is the idea that the larger $|r|$ is, the stronger the relationship!

A moderate correlation falls between weak and strong.
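The descriptions above can be summarised as a small Python function. This is a sketch under stated assumptions: it treats $|r| < 0.5$ as weak and $|r| > 0.7$ as strong (some sources use $0.8$ instead, which can be passed in as `strong_cutoff`), and the function name `describe_correlation` is hypothetical.

```python
def describe_correlation(r, weak_cutoff=0.5, strong_cutoff=0.7):
    """Describe r in words, e.g. 'weak negative'.

    The cut-offs are assumptions: weak if |r| < weak_cutoff,
    strong if |r| > strong_cutoff, moderate otherwise.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no correlation"
    size = abs(r)
    if size > strong_cutoff:
        strength = "strong"
    elif size < weak_cutoff:
        strength = "weak"
    else:
        strength = "moderate"
    # The sign of r always gives the direction
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_correlation(0.6))                        # moderate positive
print(describe_correlation(-0.53))                      # moderate negative
print(describe_correlation(-0.74))                      # strong negative
print(describe_correlation(-0.74, strong_cutoff=0.8))   # moderate negative
```

Because the cut-offs are parameters, you can match whichever convention your course uses; the last line shows how the same value of $r$ can be labelled differently under a stricter definition of "strong".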

Play with this applet to see how the correlation coefficient changes. Move each point. Try having one outlier and see how much that can change the correlation coefficient. Try moving the points so they are in a perfect straight line. What happens to the correlation coefficient value?
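If the applet is not available, the Python sketch below shows the same effects with made-up data. The helper `pearson_r` is hypothetical; it computes $r$ from deviations about the means, which is algebraically equivalent to the calculation formula. Points on a perfect straight line give $r = 1$ or $r = -1$, and dragging just one point far from the trend changes $r$ dramatically.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient, computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
rising = [x + 2 for x in xs]        # every point on a line with positive slope
falling = [20 - 2 * x for x in xs]  # every point on a line with negative slope
outlier = rising[:-1] + [-10]       # same as rising, but the last point is dragged far down

print(round(pearson_r(xs, rising), 3))   # 1.0
print(round(pearson_r(xs, falling), 3))  # -1.0
print(round(pearson_r(xs, outlier), 3))  # -0.272
```

A single point moved far from the trend turns a perfect $r = 1$ into a weak negative value, which is why it is worth checking a scatter plot for outliers before interpreting $r$.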

Remember!

Always state whether the correlation is positive or negative.

For example, use descriptions such as weak negative, moderate positive or strong positive.

If I investigated the correlation between the number of schools in an area and the number of cancer diagnoses there, I would find a strong positive correlation. Does this mean that schools cause cancer?

No! Highly populated areas will have high numbers of cancer diagnoses simply because more people live there, and they will also need more schools. Population size drives both variables. This is why you need to be careful, and use common sense, when deciding what you can interpret from a correlation coefficient.

If, however, I looked at the price of a barrel of crude oil and the price of petrol, I would again find a strong positive correlation. With some common sense, you can work out that changes in the price of crude oil would be causing changes in the price of petrol, **not** the other way around.

Consider a council that steadily raises the fine for speeding over a few months and records the number of speeding tickets issued. The council calculates the correlation coefficient between the size of the fine and the number of speeding tickets issued to be $-0.74$. What does this mean? We know the council is changing the fine, so the fine is the cause. The higher the fine, the fewer speeding tickets are issued, which suggests the higher fine is a good deterrent for speeding.

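The correlation coefficient is calculated using the formula

$r=\frac{n\sum xy-\sum x\sum y}{\sqrt{\left(n\sum x^2-\left(\sum x\right)^2\right)\left(n\sum y^2-\left(\sum y\right)^2\right)}}$

This looks more complex than it is. Be careful when substituting in your values.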

Remember!

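As we are comparing two data sets, we treat one data set as the $x$ values and the other as the $y$ values.

$n$ is the number of data points in each set, and $\sum$ means "the sum of". For example, $\sum xy$ is found by multiplying each $x$ value by its paired $y$ value and adding the results, while $\left(\sum x\right)^2$ is the square of the sum of the $x$ values, which is not the same as $\sum x^2$.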
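The same steps can be sketched in Python. This is a minimal illustration rather than a prescribed method: the function name `correlation_coefficient` is hypothetical, and it assumes both lists have the same length and that neither list is constant (a constant list makes the denominator zero). The demonstration data is the table from the worked example that follows.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson's r using the sums formula: each sum is built separately."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # multiply each pair, then add
    sum_x2 = sum(x * x for x in xs)              # square each x, then add
    sum_y2 = sum(y * y for y in ys)              # square each y, then add
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


x = [1, 2, 3, 4, 5, 6, 7]
y = [-2, -2.3, -2.66, -2.48, -2.54, -2.9, -2.24]
print(round(correlation_coefficient(x, y), 4))  # -0.4697
```

Adding print statements for the intermediate sums lets you compare them with your own working before you substitute into the formula.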

Given the following data:

| $x$ | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ | $7$ |
|---|---|---|---|---|---|---|---|
| $y$ | $-2$ | $-2.3$ | $-2.66$ | $-2.48$ | $-2.54$ | $-2.9$ | $-2.24$ |

a) Calculate the correlation coefficient and give your answer to two decimal places.

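Think: Work out $n$ and each sum in the formula, then substitute them in. Here $n=7$, $\sum x=28$, $\sum y=-17.12$, $\sum xy=-70.28$, $\sum x^2=140$ and $\sum y^2=42.3952$.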

Do:

$r=\frac{7\times\left(-70.28\right)-28\times\left(-17.12\right)}{\sqrt{\left(7\times140-28^2\right)\left(7\times42.3952-\left(-17.12\right)^2\right)}}$

$=\frac{-12.6}{26.82744863}$

$=-0.4697$ (to 4 d.p.)

$\approx-0.47$ (to 2 d.p.)

b) Choose the best description of this correlation

Think: What does the negative sign of the correlation coefficient tell us? What does the size of $-0.4697$ tell us?

Do: Weak negative correlation, since $r$ is negative and $|r|$ is less than $0.5$.

For the graph depicted, choose the correlation coefficient that best represents it.


A. $1$

B. $0$

C. $-1$

D. $-0.64$

In a study, the correlation coefficient between the heights of women and the probability of being turned down for a promotion was found to be $-0.90$.

Which is the most appropriate statement?

A. There is no evidence of a linear relationship between the heights of women and the probability of being turned down for a promotion.

B. As the heights of women increase, the probability of being turned down for a promotion increases.

C. As the heights of women increase, the probability of being turned down for a promotion decreases.

Given the following data:

| $x$ | $1$ | $4$ | $7$ | $10$ | $13$ | $16$ | $19$ |
|---|---|---|---|---|---|---|---|
| $y$ | $4$ | $4.25$ | $4.55$ | $4.4$ | $4.45$ | $4.75$ | $4.2$ |

Calculate the correlation coefficient and give your answer to two decimal places.

Choose the best description of this correlation.

A. Moderate negative

B. Strong positive

C. Weak negative

D. Moderate positive

E. Strong negative

F. Weak positive

Carry out investigations of phenomena, using the statistical enquiry cycle:

A. conducting experiments using experimental design principles, conducting surveys, and using existing data sets

B. finding, using, and assessing appropriate models (including linear regression for bivariate data and additive models for time-series data), seeking explanations, and making predictions

C. using informed contextual knowledge, exploratory data analysis, and statistical inference

D. communicating findings and evaluating all stages of the cycle.

Investigate bivariate measurement data

Use statistical methods to make a formal inference