When we discuss the coefficient of determination, $r^2$, the notation itself tells us that it is related to the correlation coefficient, $r$, and so has something to do with measuring the relationship between two variables.
Firstly, if we already have the value of $r$, we can square it to get $r^2$ (that is, $r^2=r\times r$).
So if $r=0.8$, then $r^2=0.64$.
If $r=-0.9$, then $r^2=0.81$.
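The squaring step above can be sketched in a few lines of Python. This is only an illustration: the function name `coefficient_of_determination` is made up for this example and does not come from the lesson or from any calculator.

```python
def coefficient_of_determination(r: float) -> float:
    """Square a correlation coefficient r to get r^2.

    Hypothetical helper for illustration; r must lie between -1 and 1.
    """
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    return r * r


# The two values of r used in the text above.
for r in (0.8, -0.9):
    print(f"r = {r}, r^2 = {coefficient_of_determination(r):.2f}")
```

Notice that a negative $r$ still gives a positive $r^2$, so $r^2$ on its own does not tell us the direction of the relationship.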
If we don't already have the value of $r$, our calculators will calculate it for us. Notice below that the $r^2$ value is given on the same screen as the $r$ value.
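If a calculator is not at hand, the same two values can be found with a short program. The sketch below assumes the usual formula for Pearson's correlation coefficient; the data values and the function name `correlation` are invented purely for illustration and are not taken from the lesson.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson's correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up example data (an assumption, not data from the lesson).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
print(f"r   = {r:.4f}")      # r   = 0.7746
print(f"r^2 = {r ** 2:.4f}")  # r^2 = 0.6000
```

As on the calculator screen, $r$ and $r^2$ are reported together, and $r^2$ is simply the square of $r$.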
$r^2$ tells us the proportion of the variation in the dependent variable ($y$) that can be explained by the variation in the independent variable ($x$).
For example, if $r^2=0.92$ then we can say that $92\%$ of the variation in the dependent variable is explained by the variation in the independent variable. Alternatively, we can say that $8\%$ of the variation in the dependent variable is not explained by the variation in the independent variable.
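The conversion from $r^2$ to the two percentages can be sketched as follows. The function name `explained_and_unexplained` is hypothetical, and rounding to the nearest whole percent is an assumption made for this example.

```python
def explained_and_unexplained(r_squared: float) -> tuple[int, int]:
    """Return (explained %, unexplained %) for a given r^2.

    Hypothetical helper; percentages are rounded to the nearest whole percent.
    """
    if not 0 <= r_squared <= 1:
        raise ValueError("r^2 must lie between 0 and 1")
    explained = round(r_squared * 100)
    return explained, 100 - explained


explained, unexplained = explained_and_unexplained(0.92)
print(f"{explained}% explained, {unexplained}% not explained")
```

The two percentages always add to $100\%$, because every part of the variation in the dependent variable is either explained by the independent variable or it is not.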
The closer $r^2$ is to $1$, the more of the variation in the dependent variable is explained by the variation in the independent variable.
We're not saying that the closer $r^2$ is to $1$, the more the $x$ variable is causing the $y$ variable to happen - be very careful with the language you use! There is a great entry written here about the difference between causation and correlation.
Consider the graph on the right.
Would calculating the correlation coefficient be appropriate for this data set?
Yes
No
Why would it not be suitable to calculate the correlation coefficient?
Select all statements that apply.
The relationship graphed is not linear.
The outlier will bias the result.
There are not enough values given.
A scientist investigated the link between the number of cancer cells killed by a certain drug and the strength of the drug used. The results were recorded and the coefficient of determination $r^2$ was found to be $0.92$.
Which of the following is true?
Select all that apply.
There is a strong relationship between the strength of the drug used and the cancer cells killed.
The number of cancer cells killed causes the strength of the drug used.
We cannot infer a causal relationship between strength of the drug used and the cancer cells killed.
The strength of the drug used causes the cancer cells to be killed.
There is a weak relationship between the strength of the drug used and the cancer cells killed.
A linear association between two data sets is such that the correlation coefficient is $-0.72$.
What proportion of the variation can be explained by the linear relationship?
Give your answer to the nearest percent.
The heights (in cm) and the weights (in kg) of $8$ primary school children are shown on the scattergraph below.
Calculate the value of the coefficient of determination.
Give your answer to two decimal places.
Hence or otherwise calculate the value of the correlation coefficient.
Give your answer to two decimal places.
What percentage of the variation in weight is accounted for by the height of the child?
Give your answer to the nearest whole percent.
Consider these two comments on the claim “The weight of a child is primarily influenced by their height.”
Which do you think is most correct?
This claim is valid and is supported by the strong relationship between the two variables.
While this claim is supported by a strong relationship between the two variables, we cannot state causality as there may be other factors influencing the outcome.