r^2 (coefficient of determination)

Lesson

The name of the coefficient of determination, $r^2$, already tells us that it is related to the correlation coefficient ($r$), and so has something to do with measuring the relationship between two variables.

How do we calculate $r^2$?

Firstly, if we already have the value of $r$, we can square it to get $r^2$ ($r^2=r\times r$).

So if $r=0.8$, then $r^2=0.64$.

If $r=-0.9$, then $r^2=0.81$.
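
As a quick check of the arithmetic above, here is a minimal Python sketch. The helper name `r_squared` is our own, chosen for illustration; it is not part of any calculator or library.

```python
def r_squared(r):
    """Return the coefficient of determination for a correlation coefficient r."""
    return r * r


# The two values of r used in the examples above.
for r in (0.8, -0.9):
    print(f"r = {r:+.1f} gives r^2 = {r_squared(r):.2f}")
```

Because squaring removes the sign, $r=-0.9$ and $r=0.9$ give the same value of $r^2$. The value of $r^2$ on its own does not tell us whether the relationship is positive or negative.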

If we don't already have the value of $r$, a calculator will find it for us from the data. The $r^2$ value is usually given on the same screen as the $r$ value.
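
If you are working with software rather than a calculator, the same two values can be found with a few lines of Python. This is only a sketch: the function name `correlation` is our own, and the data are made up for illustration rather than taken from this lesson.

```python
from math import sqrt


def correlation(xs, ys):
    """Return the (Pearson) correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, invented for this example only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
print(f"r   = {r:.2f}")      # r   = 0.77
print(f"r^2 = {r * r:.2f}")  # r^2 = 0.60
```

As on a calculator screen, $r$ and $r^2$ come from the same calculation, so once we have one we can report the other.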

What information does $r^2$ give us?

$r^2$ tells us the proportion of the variation in the dependent variable ($y$) that can be explained by the variation in the independent variable ($x$).

For example, if $r^2=0.92$ then we can say that $92\%$ of the variation in the dependent variable is explained by the variation in the independent variable. Alternatively, we can say that $8\%$ of the variation in the dependent variable is not explained by the variation in the independent variable.

The closer $r^2$ is to $1$, the more of the variation in the dependent variable is explained by the variation in the independent variable.
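
One way to see what "proportion of the variation explained" means is to compare the scatter of the $y$ values around a line of best fit with their scatter around the mean. The sketch below assumes a least-squares line of best fit, which this lesson does not cover in detail; the function name `explained_proportion` and the data are made up for illustration.

```python
def explained_proportion(xs, ys):
    """Return the proportion of variation in ys explained by a least-squares line in xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # Least-squares line of best fit: y = intercept + slope * x.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left over around the line, and total variation around the mean.
    unexplained = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - unexplained / total


# Hypothetical data, invented for this example only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r2 = explained_proportion(xs, ys)
print(f"explained:     {r2:.0%}")      # explained:     60%
print(f"not explained: {1 - r2:.0%}")  # not explained: 40%
```

For this made-up data set, squaring the correlation coefficient gives the same value, $r^2=0.6$, so $60\%$ of the variation in $y$ is explained by the variation in $x$ and $40\%$ is not.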

We're not saying that the closer $r^2$ is to $1$, the more the $x$ variable is causing the $y$ variable to happen. Be very careful with the language you use: correlation is not the same as causation.

Worked Examples

Question 1

Consider the graph on the right.

[Graph not shown]

  1. Would calculating the correlation coefficient be appropriate for this data set?

    A. Yes

    B. No

  2. Why would it not be suitable to calculate the correlation coefficient?

    Select all statements that apply.

    A. The relationship graphed is not linear.

    B. The outlier will bias the result.

    C. There are not enough values given.

Question 2

A scientist investigated the link between the number of cancer cells killed by a certain drug and the strength of the drug used. The results were recorded and the coefficient of determination $r^2$ was found to be $0.92$.

  1. Which of the following is true?

    Select all that apply.

    A. There is a strong relationship between the strength of the drug used and the cancer cells killed.

    B. The number of cancer cells killed causes the strength of the drug used.

    C. We cannot infer a causal relationship between strength of the drug used and the cancer cells killed.

    D. The strength of the drug used causes the cancer cells to be killed.

    E. There is a weak relationship between the strength of the drug used and the cancer cells killed.

Question 3

A linear association between two data sets is such that the correlation coefficient is $-0.72$.

What proportion of the variation can be explained by the linear relationship?

Give your answer to the nearest percent.

Question 4

The heights (in cm) and the weights (in kg) of $8$ primary school children are shown on the scattergraph below.

[Scattergraph of height (cm) and weight (kg) not shown]

  1. Calculate the value of the coefficient of determination.

    Give your answer to two decimal places.

  2. Hence or otherwise calculate the value of the correlation coefficient.

    Give your answer to two decimal places.

  3. What percentage of the variation in weight is accounted for by the height of the child?

    Give your answer to the nearest whole percent.

  4. Consider these two comments on the claim “The weight of a child is primarily influenced by their height.”

    Which do you think is most correct?

    A. This claim is valid and is supported by the strong relationship between the two variables.

    B. While this claim is supported by a strong relationship between the two variables, we cannot state causality as there may be other factors influencing the outcome.

Outcomes

12D.D.2.1

Recognize that the analysis of two-variable data involves the relationship between two attributes, recognize the correlation coefficient as a measure of the fit of the data to a linear model, and determine, using technology, the relevant numerical summaries
