
2.01 Bivariate data and line of best fit

Worksheet
Line of best fit
1

Draw a line of best fit for the following scatter plots:

a
[Scatter plot: x-axis from 1 to 9, y-axis from 1 to 9]
b
[Scatter plot: x-axis from 10 to 90, y-axis from 20 to 36]
c
[Scatter plot: Price (20 to 36) on the horizontal axis, Sales (100 to 170) on the vertical axis]
2

The life expectancy, E, in years, of individuals with different annual incomes, I, in thousands of dollars, is plotted.

The line of best fit is given by: E = 0.09 I + 73.24

a

By how much does average life expectancy change for each $1000 of annual income?

b

What is the average life expectancy of someone who earns no income?

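[Scatter plot: annual income I ($1000) on the horizontal axis, ticks 50 to 200; life expectancy E on the vertical axis, ticks 75 to 100]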
3

Scientists recorded the number of aphids, A, in areas with different numbers of ladybeetles, L. The results are shown in the scatter plot.

The line of best fit is given by: A = - 3.82 L + 3865.21

a

How much does the average aphid population change with each extra ladybeetle? Round your answer to the nearest whole number.

b

What is the average aphid population of a region with no ladybeetles? Give your answer to the nearest whole number.

[Scatter plot: number of ladybeetles L (200 to 1000) on the horizontal axis, number of aphids A (800 to 4000) on the vertical axis]
Least squares regression line
4

Use a graphics calculator (or other technology) to find the line of best fit for the data below. Write the equation with the coefficient and constant term rounded to two decimal places.

p    24   37   19   31   32   22   14   30   23   40
t    -7   -8   -3   -6   -9   -8   -2   -8   -8   -12
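As a sketch of what a calculator's linear regression computes, the least-squares gradient and intercept can be found directly from their defining formulas, m = Sxy / Sxx and c = ȳ − m·x̄. The helper name `least_squares` is hypothetical, and treating t as the response and p as the explanatory variable is an assumption, since the question does not say which way round to fit.

```python
# Least-squares line y = m*x + c from the standard formulas:
#   m = Sxy / Sxx,  c = mean(y) - m * mean(x)
# Assumption: t is the response (y) and p the explanatory variable (x);
# the worksheet does not specify the direction of the fit.

def least_squares(xs, ys):
    """Return (gradient, intercept) of the least-squares line of ys on xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx
    c = y_bar - m * x_bar
    return m, c


p = [24, 37, 19, 31, 32, 22, 14, 30, 23, 40]
t = [-7, -8, -3, -6, -9, -8, -2, -8, -8, -12]

m, c = least_squares(p, t)
sign = "+" if c >= 0 else "-"
# Coefficient and constant rounded to two decimal places, as the question asks.
print(f"t = {m:.2f}p {sign} {abs(c):.2f}")
```

Writing the formulas out, rather than calling a library, makes it clear that the fitted line always passes through the point (x̄, ȳ).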
5

For each of the least squares regression lines:

i

State the gradient of the line.

ii

What does the gradient of the line indicate?

iii

Describe what happens to y if x increases by 1 unit.

iv

State the value of the y-intercept.

a

y = 3.59 x + 6.72.

b

y = - 3.67 x + 8.42.

c

y = - 7.26 x + 4.35.
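Parts ii to iv all rest on one identity for a straight line y = mx + c, where m and c are generic symbols for the gradient and y-intercept (not values taken from the worksheet). A short derivation:

```latex
\begin{aligned}
y &= mx + c \\
\Delta y &= \bigl(m(x + \Delta x) + c\bigr) - (mx + c) = m\,\Delta x \\
\Delta x = 1 &\implies \Delta y = m \\
x = 0 &\implies y = c
\end{aligned}
```

With Δx = 1 this is part iii; setting x = 0 gives the y-intercept in part iv. The same multiplication by the gradient applies to larger changes in x, such as the income and TV-hour changes asked about in later questions.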

6

The age and price of various second-hand Mitsubishi Lancers are shown in the adjacent table:

a

Find the equation of the least squares regression line for the price, y, in terms of the age, x. Round all values to the nearest integer.

b

State the value of the vertical intercept.

c

What is the price of a Mitsubishi Lancer when it is brand new?

d

State the gradient of the line.

e

What does the gradient of the line indicate about the correlation between age and price?

f

What happens to the price of the car if its age increases by 1 year?

Age (years)    Price ($)
1              16 000
2              13 000
0              21 990
5              10 000
7              8600
4              12 500
3              11 000
4              11 000
8              4500
2              14 500
7

The amount of money families spend on dining out each week, D, is measured against their weekly income, I. The following linear regression model is fitted to the data:

D = 0.3 I + 27
a

What is the average spending on dining out if the family has no income?

b

Is this a reasonable prediction? Explain your answer.

c

If the weekly income of a family increases by $200, by how much can we expect their spending on dining out to increase?

8

The number of hours spent watching TV each evening, h, is measured against the percentage mark, p, achieved in the Economics exam.

The following linear regression model is fitted to the data:

p = - 10 h + 97

a

What does the vertical intercept indicate in context?

b

Is this a reasonable prediction? Explain your answer.

c

State the gradient of the line.

d

If a student increases the amount of TV they watch by 3.5 hours, by how much can we expect their Economics exam mark to drop?

9

The average number of pages read to a child each day and the child’s growing vocabulary are measured and the results are recorded in the table below:

Pages read per day252729313311829295
Total vocabulary40244046776220487295457460106
a

Find the equation of the least squares regression line for the total vocabulary, V, in terms of pages read per day, P. Round all values to one decimal place.

b

What is the predicted vocabulary of a child who does not read any pages each day?

c

Is this a reasonable prediction? Explain your answer.

d

By how many words will the child's vocabulary increase for each additional page read per day?

10

Concern over student use of the social media app SnappyChatty leads to a study of student marks in Mathematics versus minutes spent using the app. The results are shown in the table below:

\text{Minutes, } M29215335425311421957162254
\text{Mark, }P\%26631337978951985936
a

Find the equation of the least squares regression line for the mark as a percentage, P, in terms of minutes spent using SnappyChatty, M. Round all values to one decimal place.

b

Predict the mark of a student who spends no time using the app.

c

Is this a reasonable prediction? Explain your answer.

d

Describe what will happen to the student's test mark for each additional minute that the student uses SnappyChatty.

Correlation coefficient
11

Would calculating the correlation coefficient be appropriate for the data in the following scatter plots? Explain your answer.

a
[Scatter plot: x-axis from 1 to 9, y-axis from -100 to 600]
b
[Scatter plot: x-axis from 1 to 10, y-axis from -2 to 20]
12

A researcher plotted the life expectancy of a group of men against the number of cigarettes they smoke per day. The correlation coefficient, r, was found to be -0.88.

Describe the correlation between the number of cigarettes smoked and the life expectancy of the group of men in terms of strength, direction and shape.

13

A researcher was evaluating the correlation between the number of years of education a person completes and the number of pets they have. The correlation coefficient, r, was found to be -0.3.

Describe the correlation between the number of years of education a person completes and the number of pets they have.

14

In recent years, beekeepers and scientists have become concerned over a phenomenon known as colony collapse disorder (CCD), where the majority of worker bees in a hive disappear, leaving behind the queen and immature bees.

The percentage of beehive losses attributable to CCD each year since 2005 is shown in the table.

a

Create a scatter plot for H against Y.

Year    Y    Hives lost to CCD, H (%)
2005    0    21
2006    1    16
2007    2    26
2008    3    29
2009    4    31
2010    5    33
b

A calculator has been used to fit a regression line to the data. Use the calculator output shown to state the equation of the regression line.

c

Use your regression equation to predict the percentage of hive losses, H, that will be due to CCD in 2016. Give your answer correct to two significant figures.

LinReg
y = ax + b
a = 3.085714286
b = 18.28571429
r^{2} = 0.8010989011
r = 0.8950412846

d

Describe the correlation between the number of years passed and the percentage of hives lost to CCD.

15

Consider the following set of data:

x    15.7   13.1   16.1   11     18.6   15.8   12.7   12.8   14.3   16.8
y    28.3   28.8   28.4   29     27.9   28.4   28.5   29     28.5   28.6
a

Using a graphics calculator (or other technology), calculate the correlation coefficient between these scores. Round your answer to two decimal places.

b

Describe the correlation between these two variables.

c

Form an equation for the least squares regression line. Round all values to one decimal place.

16

Consider the following set of data:

a

Using a graphics calculator (or other technology), calculate the correlation coefficient between these scores. Round your answer to two decimal places.

b

Describe the correlation between these two variables.

c

Form an equation for the least squares regression line. Round all values to one decimal place.

x        y
48.50    166.10
24.60    156.30
38.50    63.40
39.70    143.90
42.20    142.20
29.60    148.70
40.50    90.90
42.60    174.00
45.80    149.50
23.80    -52.30
47.60    195.50
20.80    65.50
Coefficient of determination
17

A scientist investigated the link between the number of cancer cells killed by a certain drug and the strength of the drug used. The results were recorded and the coefficient of determination, r^{2}, was found to be 0.92.

a

Describe the relationship between the strength of drug used and the cancer cells killed.

b

Can we say that there is a causal relationship between the strength of drug used and the cancer cells killed?

18

Dave sells bicycles. He thinks that he sells more bicycles when there is a full moon. He recorded his number of sales during the different moon phases, and the resultant coefficient of determination r^{2} was found to be 0.19.

a

Describe the relationship between the phase of the moon and number of bicycles sold.

b

Can we say that there is a causal relationship between the phase of the moon and number of bicycles sold?

19

Mae has a small herb garden with a thyme plant. She suspects that the growth of the thyme plant is exponentially dependent on the amount of tea she drinks.

For a month she keeps a daily log of the amount of tea she drinks and the number of millimetres the thyme plant has grown. The resultant coefficient of determination, r^{2}, was found to be 0.13.

a

What proportion of the variation in thyme plant growth can be explained by the amount of tea Mae drinks? Give your answer to the nearest percent.

b

Can we say that there is a causal relationship between the growth of the thyme plant and the amount of tea that Mae drinks?

20

Shannon is convinced that the number of watch factories in a town is directly related to the number of department stores in that town. Shannon collects data on the number of watch factories and department stores in different towns across the country. The resultant coefficient of determination, r^{2}, was found to be 0.44.

a

What proportion of the variation in the number of watch factories can be explained by the number of department stores? Give your answer to the nearest percent.

b

Does the number of department stores determine the number of watch factories?

21

Calculate the value of the coefficient of determination for each of the following data sets. Round your answers to two decimal places.

a

A linear association between two data sets is such that the correlation coefficient is -0.61.

b

A linear association between two data sets is such that the correlation coefficient is -0.44.

22

A linear association between two data sets is such that the correlation coefficient is -0.72.

What proportion of the variation can be explained by the linear relationship? Give your answer to the nearest percent.

23

A linear association between two data sets is such that the coefficient of determination, r^{2}, is 0.80.

a

Calculate the correlation coefficient, to four decimal places, if the relationship is negative.

b

Describe the strength of the relationship.

24

A linear association between two data sets is such that the coefficient of determination, r^{2}, is 0.66.

a

Calculate the correlation coefficient, to four decimal places, if the relationship is positive.

b

Describe the strength of the relationship.
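Questions 21 to 24 move between r and r^{2}. A minimal sketch with hypothetical helper names `r_squared` and `r_from_r_squared`: squaring discards the sign, so recovering r needs the direction of the association as a separate input.

```python
import math


def r_squared(r):
    """Coefficient of determination for a linear fit: the square of Pearson's r."""
    return r ** 2


def r_from_r_squared(r2, negative):
    """Recover r from r^2; the sign must come from the direction of the association."""
    if not 0 <= r2 <= 1:
        raise ValueError("r^2 must lie between 0 and 1")
    r = math.sqrt(r2)
    return -r if negative else r


print(round(r_squared(-0.61), 2))                        # Question 21a
print(round(100 * r_squared(-0.72)))                     # Question 22, as a percentage
print(round(r_from_r_squared(0.80, negative=True), 4))   # Question 23a
```

Multiplying r^{2} by 100 gives the percentage of variation in the response explained by the linear relationship.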

25

The table alongside shows the years of teaching experience for various teachers across Australia along with their current salary:

a

Calculate the value of the correlation coefficient, rounded to two decimal places.

b

Hence or otherwise, calculate the value of the coefficient of determination, rounded to four decimal places.

c

What percentage of the variation in teacher salaries can be explained by their years of teaching experience? Round your answer to two decimal places.

d

Is it reasonable to suggest that teaching experience is the sole influence on teacher salaries, given that r and r^{2} both indicate a very strong relationship? Explain your answer.

Years of experience    Salary ($)
5                      72 400
6                      67 200
6                      65 700
6                      67 400
5                      73 600
3                      64 500
10                     87 400
11                     82 000
1                      57 300
11                     86 900
11                     84 700
6                      68 800
26

WhichBank is researching how the number of home loan applications it receives per day relates to its advertised interest rate. The adjacent table shows the results:

a

Calculate the value of the coefficient of determination, rounded to two decimal places.

b

Hence or otherwise, calculate the value of the correlation coefficient, rounded to two decimal places.

c

What percentage of the variation in the number of applicants is accounted for by the interest rate? Round your answer to the nearest whole number.

d

Consider the following claim:

“If you want people to take out a home loan, then you need the interest rates to be low”.

Is this claim valid? Explain your answer.

Interest rate (%)    Number of applicants
3.2                  170
4.2                  158
4.4                  155
0.3                  196
6.5                  134
7.3                  125
4.8                  154
0.7                  191
10.3                 95
2.4                  176
8.9                  110
10.7                 95
27

Advertisers are researching the optimal length of a UTube advertisement so that viewers don’t click 'Skip'.

The table alongside shows the length of each advertisement in minutes and the proportion of viewers who click 'Skip':

a

What percentage of the variation in the proportion of viewers clicking 'Skip' is accounted for by the length of the advertisement? Round your answer to the nearest whole number.

b

Comment on the following claim:

“A UTube ad over 2 minutes in length will mean everyone presses 'Skip'.”

Minutes    Proportion
5.1        1
5.9        1
0.4        0.6
1.4        0.4
0.8        0.7
4          0.8
4.1        0.8
0.6        0.3
2.7        0.9
5.8        0.8
1.4        0.7
0.7        0
1.1        0.4
0.8        0.8
4.8        0.8

Outcomes

3.1.2.3

calculate and interpret the correlation coefficient (𝑟) to quantify the strength of a linear association using Pearson’s correlation coefficient, where covariance and standard deviation are determined, using appropriate technology

3.1.3.3

model a linear relationship by fitting a least-squares line to the data

3.1.3.5

interpret the intercept and slope of the fitted line

3.1.3.6

use, not calculate, the coefficient of determination (R²) to assess the strength of a linear association in terms of the explained variation

3.1.4.3

solve practical problems by identifying, analysing and describing associations between two categorical variables or between two numerical variables.
