
3.075 Pearson's correlation coefficient

Lesson

Pearson's correlation coefficient

We can now describe the direction and strength of trends in bivariate data, but there is a more precise way of measuring how well a linear model fits the data. It is called the correlation coefficient (also called Pearson's correlation coefficient), and it is represented by the letter $r$. It measures how close the points of a bivariate data set are to a straight line, and its sign tells us whether that line has a positive or negative gradient.

Calculating the correlation coefficient by hand is very time-consuming. People usually use technology (either a calculator or a computer) to calculate the coefficient, and then do the more difficult task of interpreting their result.

Direction of a linear relationship

The direction of a linear relationship is given by the sign of the correlation coefficient.

  • If $r$ is positive then there is evidence of a positive linear relationship.
  • If $r$ is negative then there is evidence of a negative linear relationship.

Positive linear relationship
$r>0$

Negative linear relationship
$r<0$

Strength of a linear relationship

The strength of a linear relationship is given by the value of $r$. The closer $r$ is to $1$ or $-1$, the stronger the relationship is likely to be.
 

[Figure: four panels labelled "No linear relationship", "Weak linear relationship", "Moderate linear relationship" and "Strong linear relationship", each with a range of values of $r$.]

The values would be similar for negative correlations.

Note: these ranges are only a guide, not a rule.

There are some special cases to consider.

  • If $r=1$ then all of the data points lie on a line with a positive gradient (a perfect positive correlation).
  • If $r=-1$ then all of the data points lie on a line with a negative gradient (a perfect negative correlation).
  • If $r$ is close to $0$ then there is probably no linear relationship. This could mean that the variables are unrelated, but it could also mean that there is a non-linear relationship between them.
  • Occasionally we might find that the correlation coefficient is impossible to calculate. This usually indicates that one or both of the variables is actually constant.

Perfect positive linear relationship
$r=1$

Perfect negative linear relationship
$r=-1$

Non-linear relationship
$r=0.1$

Constant relationship
$r$ is undefined
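The special cases above can be reproduced with a few lines of Python. This is only a sketch: the helper `pearson_r` and the data sets are made up for illustration, and the helper uses the standard sum-based formula for $r$, returning `None` when the coefficient is undefined.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return Pearson's r for paired lists, or None when r is undefined."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    spread_x = n * sum_x2 - sum_x ** 2
    spread_y = n * sum_y2 - sum_y ** 2
    if spread_x == 0 or spread_y == 0:
        return None  # a constant variable makes the denominator zero
    return (n * sum_xy - sum_x * sum_y) / sqrt(spread_x * spread_y)

xs = [-2, -1, 0, 1, 2]  # hypothetical x-values

perfect_positive = pearson_r(xs, [2 * x + 1 for x in xs])  # points on y = 2x + 1
perfect_negative = pearson_r(xs, [5 - 3 * x for x in xs])  # points on y = 5 - 3x
non_linear = pearson_r(xs, [x * x for x in xs])            # points on y = x^2
constant = pearson_r(xs, [4, 4, 4, 4, 4])                  # y never changes

print(perfect_positive)  # 1.0
print(perfect_negative)  # -1.0
print(non_linear)        # 0.0
print(constant)          # None
```

The parabola is a reminder that $r$ close to $0$ does not rule out a relationship: here $y$ is completely determined by $x$, just not linearly.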

Correlation applet

Play with this applet to see how the correlation coefficient changes. Move each point. Try having one outlier and see how much that can change the correlation coefficient. Try moving the points so they are in a perfectly straight line. What happens to the correlation coefficient value?
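If the applet is not available, the effect of a single outlier can also be explored numerically. The sketch below uses made-up points and a hypothetical helper `pearson_r` based on the standard sum-based formula for $r$.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for paired lists (assumes neither list is constant)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Six hypothetical points lying exactly on the line y = 2x
line_x = [1, 2, 3, 4, 5, 6]
line_y = [2, 4, 6, 8, 10, 12]
r_line = pearson_r(line_x, line_y)

# The same points with one outlier added at (7, 0)
r_with_outlier = pearson_r(line_x + [7], line_y + [0])

print(r_line)          # 1.0
print(r_with_outlier)  # 0.25
```

In this made-up data set, one point far from the line is enough to take $r$ from a perfect $1$ down to $0.25$.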

Using technology to calculate the correlation coefficient

Many websites have a correlation coefficient calculator, where you enter the ordered pairs in your data set and the calculator returns the coefficient. The correlation coefficient is often labelled $r$ or $R$. Image 1 shows pairs of values of $x$ and $y$ entered separately but in corresponding order to calculate $R$. Of course, you can also use a scientific calculator to enter the ordered pairs and have it return the correlation coefficient value. Image 2 shows that in the Statistics mode, you can again enter the $x$-values and corresponding $y$-values in lists 1 and 2 and retrieve the correlation coefficient $R$.

The important work lies in interpreting these calculations, and what they mean for the strength of the relationship between the variables.

Practice questions

Question 1

Calculate the correlation coefficient for the bivariate data in the table below, correct to two decimal places.

$x$   $7$   $12$   $6$   $19$   $8$   $16$   $20$   $9$   $11$   $19$
$y$   $4$   $8$   $2$   $12$   $5$   $8$   $9$   $5$   $4$   $10$

Question 2

Describe the relationship between variables with a correlation coefficient of $-0.34$.

    A. No linear relationship
    B. Strong negative linear relationship
    C. Weak positive linear relationship
    D. Strong positive linear relationship
    E. Weak negative linear relationship

Question 3

The linear relationship between a set of data for variables $x$ and $y$ has a correlation coefficient of $0.3$.

The linear relationship between a set of data for variables $x$ and $t$ has a correlation coefficient of $-0.9$.

  1. Do the two linear relationships have the same direction?

    A. Yes
    B. No

  2. Which relationship has a stronger correlation?

    A. Between $x$ and $y$.
    B. Between $x$ and $t$.

Question 4

For the graph depicted, choose the correlation coefficient that best represents it.


    A. $1$
    B. $0$
    C. $-1$
    D. $-0.64$

Calculating the correlation coefficient using the formula

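The correlation coefficient can be calculated from the data using the formula

$r=\frac{n\sum xy-\sum x\sum y}{\sqrt{\left(n\sum x^2-\left(\sum x\right)^2\right)\left(n\sum y^2-\left(\sum y\right)^2\right)}}$

This looks more complex than it is. Be careful when substituting your values.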

Remember!

As we are comparing two data sets, we consider one data set as the $x$ values and the other as the $y$ values.

$n$ is the number of data points in each set.

Worked example

Example 1

Given the following data:

$x$   $1$   $2$   $3$   $4$   $5$   $6$   $7$
$y$   $-2$   $-2.3$   $-2.66$   $-2.48$   $-2.54$   $-2.9$   $-2.24$

a) Calculate the correlation coefficient and give your answer to two decimal places.

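Think: Substitute the relevant values into the equation. For this data $n=7$, $\sum x=28$, $\sum y=-17.12$, $\sum xy=-70.28$, $\sum x^2=140$ and $\sum y^2=42.3952$.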

Do:

$r=\frac{7\times\left(-70.28\right)-28\times\left(-17.12\right)}{\sqrt{\left(7\times140-28^2\right)\left(7\times42.3952-\left(-17.12\right)^2\right)}}$

$=\frac{-12.6}{26.82744863}$

$=-0.4697$ (to 4 d.p.)

$=-0.47$ (to 2 d.p.)

b) Choose the best description of this correlation

Think: What does a negative correlation coefficient mean? What does the $-0.4697$ mean?

Do: Weak negative correlation.
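The arithmetic in this example can be checked with a short Python sketch. It builds each sum that appears in the substitution and then combines them in the same way; the variable names are chosen here for illustration only.

```python
from math import sqrt

# Data from Example 1
x = [1, 2, 3, 4, 5, 6, 7]
y = [-2, -2.3, -2.66, -2.48, -2.54, -2.9, -2.24]

n = len(x)                                    # number of data points
sum_x = sum(x)                                # 28
sum_y = sum(y)                                # approximately -17.12
sum_xy = sum(a * b for a, b in zip(x, y))     # approximately -70.28
sum_x2 = sum(a * a for a in x)                # 140
sum_y2 = sum(b * b for b in y)                # approximately 42.3952

numerator = n * sum_xy - sum_x * sum_y        # approximately -12.6
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 4))  # -0.4697
print(round(r, 2))  # -0.47
```

The sums involving decimals are marked as approximate because floating-point arithmetic can differ from the exact values in the last few digits; the rounded results are unaffected.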

Practice question

Question 5

Given the following data:

$x$   $1$   $4$   $7$   $10$   $13$   $16$   $19$
$y$   $4$   $4.25$   $4.55$   $4.4$   $4.45$   $4.75$   $4.2$

  1. Calculate the correlation coefficient and give your answer to two decimal places.

  2. Choose the best description of this correlation.

    A. Moderate negative
    B. Strong positive
    C. Weak negative
    D. Moderate positive
    E. Strong negative
    F. Weak positive

Outcomes

MA12-8

solves problems using appropriate statistical processes
