VCE 12 General 2023

2.06 Coefficient of determination

Lesson

Coefficient of determination

The coefficient of determination, r^2, is related to Pearson's correlation coefficient r, and is useful when interpreting a linear relationship between two variables. The coefficient of determination is between 0 and 1 and is most useful when converted to a percentage. It is used to describe the level of accuracy with which one variable can be used to predict another.

If we have already identified the value of r, this can simply be squared to get the value of r^2. For instance, if r=0.8 then r^2=0.64, and if r=-0.9 then r^2=0.81.
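As a quick check of the arithmetic above, here is a minimal Python sketch. The function name `coefficient_of_determination` is made up for illustration; the point it shows is that squaring discards the sign of r, so r^2 is never negative.

```python
def coefficient_of_determination(r: float) -> float:
    """Square Pearson's correlation coefficient r to get r^2."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return r * r


# The two values of r used in the text above.
for r in (0.8, -0.9):
    r_squared = coefficient_of_determination(r)
    print(f"r = {r:+.1f}  ->  r^2 = {r_squared:.2f}")
```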

If we don't already have the value of r, technology can be used to calculate both.

For example, the CAS screen below gives the r^2 value on the same screen as the r value.

A CAS calculator showing a stat calculation of linear regression. Ask your teacher for more information.
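Away from a CAS, both values can also be computed directly from the paired data. The following is a minimal sketch in Python, not a description of what the calculator does internally. The function name `pearson_r` and the small data set are made up for illustration and do not come from this lesson.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / sqrt(sxx * syy)


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(f"r   = {r:.4f}")
print(f"r^2 = {r * r:.4f}")
```

For this made-up data the script reports r^2 = 0.6000, so 60% of the variation in y would be described as explained by the variation in x.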

r^2 tells us the proportion of the variation in the response variable (y) that can be explained by the variation in the explanatory variable (x).

For example, if r^2=0.92 then we can say that 92% of the variation in the response variable is explained by the variation in the explanatory variable.

The closer the value of r^2 is to 1, the more that the variation in the response variable is explained by the variation in the explanatory variable. Note that this does not mean that the closer r^2 is to 1, the more the x variable is causing the y variable to happen. Be very careful with the language you use - correlation does not imply causation.
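One way to see what "variation explained" means is to fit a straight line and compare the variation left over around the line with the total variation in y. The sketch below assumes a least-squares line, which this lesson does not define, and uses made-up data; the function name `explained_proportion` is hypothetical. It is an illustration of the interpretation, not a required method.

```python
def explained_proportion(xs, ys):
    """Proportion of the variation in ys explained by a least-squares line on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Least-squares line y = intercept + slope * x (an assumption of this sketch).
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Total variation in y, and the variation left unexplained by the line.
    total = sum((y - mean_y) ** 2 for y in ys)
    unexplained = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - unexplained / total


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(f"Proportion explained = {explained_proportion(xs, ys):.2f}")
```

For this data the total variation in y is 6 and the unexplained variation is 2.4, so the proportion explained is 1 - 2.4/6 = 0.6, which agrees with squaring r for the same data.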

Examples

Example 1

A scientist investigated the link between the number of cancer cells killed by a certain drug and the strength of the drug used. The results were recorded and the coefficient of determination r^2 was found to be 0.92.

Which of the following is true?

A
There is a strong relationship between the strength of the drug used and the cancer cells killed.
B
The number of cancer cells killed causes the strength of the drug used.
C
We cannot infer a causal relationship between the strength of the drug used and the cancer cells killed.
D
The strength of the drug used causes the cancer cells to be killed.
E
There is a weak relationship between the strength of the drug used and the cancer cells killed.
Worked Solution
Create a strategy

A coefficient of determination close to 1 indicates a strong relationship, and a value close to 0 indicates a weak relationship.

Apply the idea

Option A is correct because the recorded coefficient of determination, 0.92, is close to 1, which indicates a strong relationship.

Option B is incorrect because it reverses the roles of the variables: the number of cancer cells killed is the response to the strength of the drug used, not the other way around. A high coefficient of determination also cannot establish causation in either direction.

Option C is correct because a high coefficient of determination does not necessarily imply causation (nor does a low one).

Option D is incorrect because, again, correlation does not imply causation.

Option E is incorrect because the relationship between the two variables is described as strong.

The correct answers are options A and C.

Example 2

A linear association between two data sets is such that the correlation coefficient is -0.72.

What proportion of the variation can be explained by the linear relationship? Give your answer to the nearest percent.

Worked Solution
Create a strategy

The proportion of variation explained by the linear relationship is the coefficient of determination, so we square the correlation coefficient.

Apply the idea
\begin{aligned}
r^2 &= (-0.72)^2 && \text{Square } -0.72 \\
&= 0.5184 \approx 0.52 && \text{Evaluate and round to two decimal places}
\end{aligned}

So 52% of the variation can be explained by the linear relationship.
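The same calculation can be checked with a short Python sketch. The helper name `percent_explained` is made up for illustration; it squares r and converts the result to the nearest whole percent.

```python
def percent_explained(r: float) -> int:
    """Return r^2 as a percentage, rounded to the nearest whole percent."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return round(r * r * 100)


# Correlation coefficient from this example.
print(f"{percent_explained(-0.72)}% of the variation is explained")
```

Note that the negative sign of r plays no part in the answer: a correlation of 0.72 would give the same percentage.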

Example 3

The heights (in cm) and the weights (in kg) of 8 primary school children are shown on the scattergraph below.

Scattergraph: the Height axis is marked from 110 to 145 in steps of 5, and the Weight axis is marked from 45 to 60 in steps of 5.

a

Calculate the value of the coefficient of determination. Give your answer to two decimal places.

Worked Solution
Create a strategy

Use the linear regression function on your calculator.

Apply the idea

Using the Statistics mode, enter each x-coordinate along with its y-coordinate into a data table on your calculator then find the linear regression.

Look for the coefficient of determination (r^2): r^2=0.93

b

Calculate the value of the correlation coefficient. Give your answer to two decimal places.

Worked Solution
Create a strategy

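Take the square root of the coefficient of determination. The square root gives only the size of r, so choose the sign that matches the direction of the association, which is positive here.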

Apply the idea
\begin{aligned}
r &= \sqrt{0.93} && \text{Take the square root of } 0.93 \\
&= 0.96 && \text{Evaluate}
\end{aligned}
Reflect and check

Alternatively, we could use the same procedure as in part (a) and look for the correlation coefficient (r): r=0.96
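Going from r^2 back to r needs one extra piece of information, because squaring discards the sign. The minimal Python sketch below makes that explicit; the function name `r_from_r_squared` and its `positive_association` flag are made up for illustration, and the direction has to be read from the scattergraph or the calculator output.

```python
from math import sqrt


def r_from_r_squared(r_squared: float, positive_association: bool) -> float:
    """Recover r from r^2, given the direction of the association."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r^2 must lie between 0 and 1")
    magnitude = sqrt(r_squared)
    # The sign cannot be recovered from r^2 alone; it comes from the data.
    return magnitude if positive_association else -magnitude


# Value of r^2 from part (a); the association in this example is positive.
print(round(r_from_r_squared(0.93, positive_association=True), 2))
```

With a negative association the same r^2 would give r=-0.96.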

c

What percentage of the variation in weight is accounted for by the height of the child? Give your answer to the nearest whole percent.

Worked Solution
Create a strategy

Convert the coefficient of determination into a percentage.

Apply the idea

We have already found a value for the coefficient of determination (as a decimal).

\begin{aligned}
0.93 &= 0.93 \times 100\% && \text{Multiply by } 100\% \\
&= 93\% && \text{Evaluate}
\end{aligned}
d

Consider these two comments on the claim “The weight of a child is primarily influenced by their height.”

Which do you think is most correct?

A
This claim is valid and is supported by the strong relationship between the two variables.
B
While this claim is supported by a strong relationship between the two variables, we cannot state the causality as there may be other factors influencing the outcome.
Worked Solution
Create a strategy

Consider your answers to previous parts of this question and choose the statement that you think is most correct.

Apply the idea

The claim describes a strong relationship as the coefficient of determination, 0.93, is close to 1. But a high coefficient of determination (or correlation) value does not necessarily imply causation.

So the correct answer is B.

Idea summary

r^2 tells us the proportion of the variation in the response variable (y) that can be explained by the variation in the explanatory variable (x).

Outcomes

U3.AoS1.10

coefficient of determination, its interpretation

U3.AoS1.26

calculate the coefficient of determination, r^2, and interpret in the context of the association being modelled and use the model to make predictions, being aware of the problem of extrapolation
