We can now describe the direction and strength of trends in bivariate data, but there is a more precise way of measuring how well a linear model fits the data. It is called the correlation coefficient (also called Pearson's correlation coefficient), and it is represented by the letter $r$. It measures how close bivariate data is to a straight line, and also tells us whether the line has a positive or negative gradient.
Calculating the correlation coefficient by hand is very time-consuming. People usually use technology (either a calculator or a computer) to calculate the coefficient, and then do the more difficult task of interpreting their result.
The direction of a linear relationship is given by the sign of the correlation coefficient.
The strength of a linear relationship is given by the size of $r$, ignoring its sign. The closer $r$ is to $1$ or $-1$, the stronger the linear relationship.
The ranges are the same for negative correlations, but with negative values of $r$.
Note: these ranges are only a guide, not a rule.
There are some special cases to consider.
Think about how the correlation coefficient changes as individual points move. A single outlier can change the correlation coefficient a great deal. If all the points lie in a perfectly straight line (one that is not horizontal or vertical), the correlation coefficient is exactly $1$ for a positive gradient or $-1$ for a negative gradient.
Many websites have a correlation coefficient calculator, where you enter the ordered pairs in your data set and the calculator returns the coefficient. The correlation coefficient is often labelled $r$ or $R$. Image 1 shows pairs of values of $x$ and $y$ entered separately, but in corresponding order, to calculate $R$.
Of course, you can also use a scientific calculator to enter the ordered pairs and have it return the correlation coefficient. Image 2 shows that in Statistics mode you can again enter the $x$-values and the corresponding $y$-values in lists 1 and 2 and retrieve the correlation coefficient $R$.
The important work lies in interpreting these calculations, and what they mean for the strength of the relationship between the variables.
Calculate the correlation coefficient for the bivariate data in the table below, correct to two decimal places.
$x$ | $7$ | $12$ | $6$ | $19$ | $8$ | $16$ | $20$ | $9$ | $11$ | $19$ |
---|---|---|---|---|---|---|---|---|---|---|
$y$ | $4$ | $8$ | $2$ | $12$ | $5$ | $8$ | $9$ | $5$ | $4$ | $10$ |
Describe the relationship between variables with a correlation coefficient of $-0.34$.
No linear relationship
Strong negative linear relationship
Weak positive linear relationship
Strong positive linear relationship
Weak negative linear relationship
The linear relationship between a set of data for variables $x$ and $y$ has a correlation coefficient of $0.3$.
The linear relationship between a set of data for variables $x$ and $t$ has a correlation coefficient of $-0.9$.
Do the two linear relationships have the same direction?
Yes
No
Which relationship has a stronger correlation?
Between $x$ and $y$.
Between $x$ and $t$.
For the graph depicted, choose the correlation coefficient that best represents it.
$1$
$0$
$-1$
$-0.64$
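The formula for Pearson's correlation coefficient is:
$r=\frac{n\sum xy-\sum x\sum y}{\sqrt{\left(n\sum x^2-\left(\sum x\right)^2\right)\left(n\sum y^2-\left(\sum y\right)^2\right)}}$
This looks more complex than it is. Be careful when substituting your values.
As we are comparing two data sets, we treat one data set as the $x$-values and the other as the $y$-values.
Here $n$ is the number of data points in each set.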
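To make the substitution concrete, here is a minimal Python sketch that evaluates the formula term by term from the five sums $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$ and $\sum y^2$. The function name `pearson_r` is a hypothetical helper of our own, not part of any library. This "raw sums" form is convenient by hand, but it can lose precision for very large values, so treat it as an illustration rather than a production routine.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient from raw sums (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same number of values (at least 2)")
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)

    # Numerator and denominator exactly as in the formula
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when all x or all y values are equal")
    return numerator / denominator


# Usage with illustrative data
print(round(pearson_r([1, 2, 3, 4], [2, 3, 5, 4]), 2))  # 0.8
```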
Given the following data:
$x$ | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ | $7$ |
---|---|---|---|---|---|---|---|
$y$ | $-2$ | $-2.3$ | $-2.66$ | $-2.48$ | $-2.54$ | $-2.9$ | $-2.24$ |
a) Calculate the correlation coefficient and give your answer to two decimal places.
Think: Substitute the relevant values into the equation.
Do: Here $n=7$, $\sum x=28$, $\sum y=-17.12$, $\sum xy=-70.28$, $\sum x^2=140$ and $\sum y^2=42.3952$.
$r=\frac{7\times\left(-70.28\right)-28\times\left(-17.12\right)}{\sqrt{\left(7\times140-28^2\right)\left(7\times42.3952-\left(-17.12\right)^2\right)}}$
$=\frac{-12.6}{26.82744863}$
$=-0.4697$ (to 4 d.p.)
$\approx-0.47$ (to 2 d.p.)
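To check the working above, the short Python sketch below recomputes each sum for this data and substitutes them into the same formula. Floating-point arithmetic means the computed sums may differ from the hand values in the last few decimal places, so the script rounds before printing.

```python
from math import sqrt

x = [1, 2, 3, 4, 5, 6, 7]
y = [-2, -2.3, -2.66, -2.48, -2.54, -2.9, -2.24]
n = len(x)

# The five sums used in the hand calculation (up to floating-point rounding)
sum_x = sum(x)                             # 28
sum_y = sum(y)                             # -17.12
sum_xy = sum(a * b for a, b in zip(x, y))  # -70.28
sum_x2 = sum(a ** 2 for a in x)            # 140
sum_y2 = sum(b ** 2 for b in y)            # 42.3952

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"r = {r:.4f}")  # r = -0.4697
print(f"r = {r:.2f}")  # r = -0.47
```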
b) Choose the best description of this correlation.
Think: What does a negative correlation coefficient mean? What does the size of $-0.4697$ tell us about the strength of the relationship?
Do: Weak negative correlation.
Given the following data:
$x$ | $1$ | $4$ | $7$ | $10$ | $13$ | $16$ | $19$ |
---|---|---|---|---|---|---|---|
$y$ | $4$ | $4.25$ | $4.55$ | $4.4$ | $4.45$ | $4.75$ | $4.2$ |
Calculate the correlation coefficient and give your answer to two decimal places.
Choose the best description of this correlation.
Moderate negative
Strong positive
Weak negative
Moderate positive
Strong negative
Weak positive