Pearson's correlation coefficient is a value that tells you the strength and direction of the linear relationship between two variables. Sometimes the upward or downward trend in the raw data is not obvious, or the trend is non-linear. In these cases we can clarify the trend by ranking the data and calculating Spearman's rank correlation coefficient.
A class of students sat a maths quiz. The teacher wanted to see if there was a correlation between a student's number of absences and their mark on the quiz. The data from the class is shown below:
Absences, $x$ | $1$ | $2$ | $3$ | $4$ | $7$ | $8$ | $9$ | $10$ |
---|---|---|---|---|---|---|---|---|
Quiz mark, $y$ | $95$ | $94$ | $89$ | $90$ | $87$ | $88$ | $79$ | $76$ |
(a) Graph the data on a Cartesian plane.
(b) Calculate the correlation coefficient, to three significant figures.
(c) Find the ranks of the values.
(d) Graph the ranks on a Cartesian plane.
(e) Calculate Spearman's rank coefficient.
(a) Graph the data on a Cartesian plane.
Since the number of absences is the independent variable, it should be plotted on the horizontal axis. Since quiz mark is the dependent variable it should be plotted on the vertical axis. Below is the graph of the data:
(b) Calculate the correlation coefficient, to three significant figures.
Using your CAS calculator or other technology, you can calculate Pearson's correlation coefficient to be: $r_p=-0.907$. So there is a strong negative correlation.
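As one example of "other technology", the calculation can be sketched in Python. This is a minimal sketch that assumes the usual sum-of-deviations form of Pearson's coefficient; the function name `pearson` is illustrative and does not come from this lesson.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products of deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

absences = [1, 2, 3, 4, 7, 8, 9, 10]
marks = [95, 94, 89, 90, 87, 88, 79, 76]

r_p = pearson(absences, marks)
print(round(r_p, 3))  # -0.907
```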
(c) Find the ranks of the values.
We can now add the ranks to the original table of values. We rank the $x$-values and the $y$-values separately. For each variable, we assign the lowest value a rank of $1$, the second lowest value a rank of $2$, and so on until every value for that variable has a rank.
Absences, $x$ | $1$ | $2$ | $3$ | $4$ | $7$ | $8$ | $9$ | $10$ |
---|---|---|---|---|---|---|---|---|
Rank of $x$ | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ | $7$ | $8$ |
Quiz mark, $y$ | $95$ | $94$ | $89$ | $90$ | $87$ | $88$ | $79$ | $76$ |
Rank of $y$ | $8$ | $7$ | $5$ | $6$ | $3$ | $4$ | $2$ | $1$ |
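The ranking step can also be sketched in Python. The helper name `simple_ranks` is hypothetical, and this version assumes all values of a variable are different, which is true for this data set.

```python
def simple_ranks(values):
    """Rank distinct values from 1 (lowest) up to n (highest)."""
    ordered = sorted(values)
    # A value's position in the sorted list, plus 1, is its rank
    return [ordered.index(v) + 1 for v in values]

absences = [1, 2, 3, 4, 7, 8, 9, 10]
marks = [95, 94, 89, 90, 87, 88, 79, 76]

rank_x = simple_ranks(absences)
rank_y = simple_ranks(marks)
print(rank_x)  # [1, 2, 3, 4, 5, 6, 7, 8]
print(rank_y)  # [8, 7, 5, 6, 3, 4, 2, 1]
```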
(d) Graph the ranks on a Cartesian plane.
Rank of $x$ is the independent variable, so it should be plotted on the horizontal axis. Rank of $y$ is the dependent variable, so it should be plotted on the vertical axis. Below is the graph of the ranks:
We can see the negative trend more clearly in this graph.
(e) Calculate Spearman's rank coefficient.
To calculate Spearman's rank correlation coefficient, $r_s$, we just calculate Pearson's correlation coefficient of the ranks for the data to get: $r_s=-0.95$
This value confirms the negative/downward trend of the original data.
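As a check on part (e), here is a minimal Python sketch that applies Pearson's coefficient to the two rows of ranks from the table in part (c). The function name `pearson` is illustrative. The result, about $-0.952$, rounds to the value quoted above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Ranks copied from the table in part (c)
rank_x = [1, 2, 3, 4, 5, 6, 7, 8]
rank_y = [8, 7, 5, 6, 3, 4, 2, 1]

# Spearman's coefficient is Pearson's coefficient of the ranks
r_s = pearson(rank_x, rank_y)
print(round(r_s, 3))  # -0.952
```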
Spearman's rank coefficient can be found by ranking the values for each variable in a bivariate data set, and calculating the Pearson's correlation coefficient for the variables' ranks.
Spearman's rank coefficient is more appropriate than Pearson's correlation coefficient for data that follows a consistently upward or downward trend that is not clearly linear.
Occasionally you may get equal data values for a particular variable in a bivariate data set. In this case you should first rank all the values, giving the equal values consecutive rankings, e.g. $5$th and $6$th, or $9$th, $10$th and $11$th. Then you should take the average of the ranks for the equal data values and allocate this average as the rank for each of the equal data values.
Consider the following table:
$x$ | $1$ | $4$ | $3$ | $3$ | $7$ |
---|---|---|---|---|---|
Rank of $x$ | | | | | |
We can see that there are two values of $3$ for $x$. To begin, we give the two data values of $3$ the consecutive rankings of $2$nd and $3$rd, since they come straight after the lowest data value:
$x$ | $1$ | $4$ | $3$ | $3$ | $7$ |
---|---|---|---|---|---|
Rank of $x$ | $1$ | $4$ | $2$ | $3$ | $5$ |
So now we take the average of the two ranks for the equal data values:
$\text{rank}=\frac{2+3}{2}=2.5$
So now we must give the equal data values this rank:
$x$ | $1$ | $4$ | $3$ | $3$ | $7$ |
---|---|---|---|---|---|
Rank of $x$ | $1$ | $4$ | $2.5$ | $2.5$ | $5$ |
Only once this is done can Spearman's rank be calculated for a bivariate data set with equal data values.
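The two-stage process above (consecutive ranks first, then averaging within each group of equal values) can be sketched in Python. The function name `average_ranks` is hypothetical, and this is one possible way to write it rather than a prescribed method.

```python
def average_ranks(values):
    """Rank values from lowest to highest; equal values share the average of their ranks."""
    # Positions of the data, ordered by value (ties keep their original order)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the group while the next value equals the current one
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # The group holds consecutive ranks start+1 ... end+1; share their average
        shared = (start + 1 + end + 1) / 2
        for i in order[start:end + 1]:
            ranks[i] = shared
        start = end + 1
    return ranks

ranks = average_ranks([1, 4, 3, 3, 7])
print(ranks)  # [1.0, 4.0, 2.5, 2.5, 5.0]
```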
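If a data set has equal data values, first give the equal values consecutive ranks, then replace each of those ranks with their average.
For seven days, the number of ice cream sales at a particular store and the outside temperature were recorded. The data is shown in the table below: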
Temperature, $x^\circ\text{C}$ | $15$ | $21$ | $32$ | $38$ | $38$ | $40$ | $42$ |
---|---|---|---|---|---|---|---|
Ice cream sales, $y$ | $1000$ | $1600$ | $2800$ | $4200$ | $5800$ | $9000$ | $11000$ |
This data is shown on the graph below:
(a) Which correlation coefficient is more suitable for this data? Explain your answer.
Spearman's rank correlation coefficient is more suitable for this data because the trend is not linear. From the graph, sales appear to increase roughly exponentially with temperature.
(b) Calculate Spearman's rank correlation coefficient.
Think: First we must add the ranks to the above table:
Temperature, $x^\circ\text{C}$ | $15$ | $21$ | $32$ | $38$ | $38$ | $40$ | $42$ |
---|---|---|---|---|---|---|---|
Rank of $x$ | $1$ | $2$ | $3$ | $4.5$ | $4.5$ | $6$ | $7$ |
Ice cream sales, $y$ | $1000$ | $1600$ | $2800$ | $4200$ | $5800$ | $9000$ | $11000$ |
Rank of $y$ | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ | $7$ |
Do: Now we can use our calculator to calculate Spearman's rank coefficient using these ranks:
$r_s=0.991$
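The whole calculation for this example, including the tied temperatures, can be sketched in Python as a check. The names `average_ranks` and `pearson` are illustrative. The ranking helper assumes a counting shortcut: a value's averaged rank equals the number of smaller values plus half of (the number of equal values plus one), which gives the same result as averaging consecutive ranks.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from lowest to highest; equal values share an averaged rank."""
    # Averaged rank = (count of smaller values) + (count of equal values + 1) / 2
    return [sum(v < x for v in values) + (values.count(x) + 1) / 2 for x in values]

def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

temps = [15, 21, 32, 38, 38, 40, 42]
sales = [1000, 1600, 2800, 4200, 5800, 9000, 11000]

rank_x = average_ranks(temps)
rank_y = average_ranks(sales)
print(rank_x)  # [1.0, 2.0, 3.0, 4.5, 4.5, 6.0, 7.0]

# Spearman's coefficient is Pearson's coefficient of the ranks
r_s = pearson(rank_x, rank_y)
print(round(r_s, 3))  # 0.991
```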