2.04 Bivariate data - calculator free

Worksheet
Describing relationships
1

A science student is testing solar panels and constructs a scatter plot to compare the solar radiation intensity to electricity output. The scatter plot shows the electrical output as the explanatory variable and the solar radiation as the response variable.

Describe the correlation between the solar radiation and electrical output.

[Scatter plot: horizontal axis x (ticks 1 to 8), vertical axis y (ticks 0.2 to 1.2)]

2

A software company has collected data for the number of workers employed on a project and the number of days to complete the project.

a

Describe the relationship between the number of workers and the number of days to complete a project.

b

Is there a causal relationship between the variables?

c

The scatter plot shows a single point with the coordinates \left(3, 17\right). What does this point represent?

[Scatter plot: horizontal axis Days (ticks 5 to 35), vertical axis Workers (ticks 5 to 20)]

3

Retail assistants in an electronics store were encouraged to take part in a sales improvement training course. After the training, the manager constructed a scatter plot to compare total sales in the last month against the number of hours of training undertaken.

a

What does each point on the scatter plot represent?

b

Describe the correlation between the number of hours spent training and the monthly sales.

[Scatter plot: horizontal axis x (ticks 2 to 14), vertical axis y (ticks 1 to 8)]

4

The water authority wants to determine how the area of a block of land, A (in units of 100 \text{ m}^2), affects the amount of water used, W (kL/year). The scatter plot comparing the variables is shown below:

a

Describe the correlation between the size of the land and the annual water consumption.

b

It is determined that approximately 81\% of the variation of water use can be explained by the variation in the land area. What is the correlation coefficient?

c

The scatter plot shows a single point with the coordinates (6.7, 256). What does this point represent in context?

[Scatter plot: horizontal axis A (ticks 2 to 18), vertical axis W (ticks 50 to 450)]

d

The equation of the least-squares line for the graph is W = 100A + 20.

i

State the significance of the slope of this line.

ii

State the significance of the vertical intercept of this line.

iii

Is your answer to part (ii) reasonable in the given context?

iv

Predict the annual water use for a block with an area of 500 \text{ m}^2.

v

Comment on the reliability of the prediction in part (iv).

vi

Is it possible to say if there is a causal relationship between land size and water consumption?
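
The arithmetic behind parts (b) and (d)(iv) can be sketched in Python. This is a minimal sketch and not part of the worksheet: the function names are hypothetical, it assumes the correlation takes the same sign as the slope of W = 100A + 20, and it assumes A is measured in units of 100 m^2 as the question states.

```python
import math

def correlation_from_r_squared(r_squared, slope):
    """Recover r from the coefficient of determination.

    Assumption: r has the same sign as the slope of the least-squares line.
    """
    r = math.sqrt(r_squared)
    return r if slope >= 0 else -r

def predict_water_use(area_m2, slope=100, intercept=20):
    """Evaluate W = 100A + 20, where A is the area in units of 100 m^2."""
    a_units = area_m2 / 100  # convert square metres to the units used on the A axis
    return slope * a_units + intercept

r = correlation_from_r_squared(0.81, slope=100)
w_500 = predict_water_use(500)
print(round(r, 2), w_500)
```

Whether the prediction is reliable still depends on whether A = 5 lies inside the range of the plotted data, which the code cannot judge.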

5

The electricity corporation is investigating how electricity consumption P (units per year) is related to the floor area of the house, A \text{ m}^2. The data they found is presented in the following scatter plot:

a

Describe the correlation between the size of the house and the annual electricity consumption.

b

It is determined that the correlation coefficient is 0.6. What percentage of the variation in electricity consumption can be explained by the variation in the floor area of the house?

[Scatter plot: horizontal axis A (ticks 50 to 350), vertical axis P (ticks 1000 to 8000)]

c

The equation of the least-squares line for the graph is P = 20A - 15.

i

State the significance of the slope of this line.

ii

State the significance of the vertical intercept of this line.

iii

Is your answer to part (ii) reasonable in the given context?

iv

Predict the annual electricity consumption for a house with a floor area of 200 \text{ m}^2.

v

Comment on the reliability of the prediction in part (iv).

vi

Is there a causal relationship between floor area and electricity consumption?
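
Part (b) goes from the correlation coefficient to the percentage of variation explained, which is r^2 written as a percentage. A short sketch, with hypothetical function names, that also evaluates the given line P = 20A - 15 at A = 0 and A = 200:

```python
def percent_explained(r):
    """Coefficient of determination r^2, expressed as a percentage."""
    return r ** 2 * 100

def predict_consumption(area_m2):
    """Evaluate the worksheet's line P = 20A - 15 (A in m^2, P in units per year)."""
    return 20 * area_m2 - 15

print(round(percent_explained(0.6), 1))
print(predict_consumption(0))    # value of the vertical intercept
print(predict_consumption(200))  # prediction asked for in part (c)(iv)
```

A negative value at A = 0 is a prompt to question how literally the intercept can be interpreted in context.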

6

Bushfires are a constant threat in Australia, especially during the dry summer months. The scatter plot below shows how the number of volunteer firefighters, F, who attend bushfires is related to the area of land, A \text{ m}^2, that is burnt out by a fire.

a

Describe the correlation between the variables.

b

Write down an estimate of the correlation coefficient for number of firefighters and area of burnt out land.

c

Is it reasonable to conclude that if fewer firefighters are sent to a fire then the area of burnt out land will be reduced? Explain your answer.

d

State a possible non-causal explanation for the observed association between the number of firefighters and the area of burnt out land.

[Scatter plot: horizontal axis A, vertical axis F]

7

The scatter plot below shows the recorded number of births and deaths in an Australian town over a 5-year period:

a

Draw a line of best fit.

b

Estimate the coordinates of the outlier.

c

Redraw the line of best fit for the remaining data without this outlier.

d

Describe the effect of the outlier on the vertical intercept, in relation to the given context.

e

Estimate the increase/decrease in deaths for every 100 births, using your line of best fit.

f

Give a non-causal reason why an increase in the number of deaths is associated with an increase in the number of births.

[Scatter plot: horizontal axis B (ticks 100 to 900), vertical axis D (ticks 100 to 700)]

g

Peter notices that there is a strong correlation between the number of births and deaths, so he decides to move to another town with a lower birth rate so that he has a lower chance of dying. Use mathematical reasoning to explain why Peter's decision is not valid.

8

A medical study measured the blood glucose, G, and hormone, H, levels of a group of patients. The results are displayed in the scatter plot below, together with the least-squares regression line. The correlation coefficient for this data set is -0.48.

a

How many patients with a hormone level of less than 8 units had a glucose level less than 150 units?

b

Determine the upper and lower glucose levels for the patients involved in this study.

c

Having no knowledge of the effects of insulin and glucose, one researcher involved in the study claims that a high insulin (hormone) level will cause a patient to have a low glucose level.

Is this claim correct?

d

Is there a causal relationship between the blood glucose and hormone levels of a group of patients?

[Scatter plot with least-squares regression line: horizontal axis H (ticks 1 to 10), vertical axis G (ticks 60 to 240)]

e

State the number of patients involved in the study.

f

How could the size of the study influence an explanation for an association between the variables?

9

Scatter plots for two sets of data are shown below:

[Scatter plots: Set A and Set B, each with horizontal axis x and vertical axis y]

a

Which data set has the strongest linear correlation between the variables?

b

Which data set appears to have a non-linear relationship? Explain your answer.

c

Why does data set B have the weakest correlation between its variables?

d

State whether the following pairs of variables could be represented by data set B:

i

Marks in an English examination and distance travelled from home to school.

ii

Cost of cars and cost of petrol.

iii

Distance travelled in a car and the cost of a driver’s license.

iv

Height and weight of students at school.

Least-squares regression line
10

Soil salinity is a problem that affects large areas of farmland in Australia. A farmer has measured wheat production W (in tonnes per hectare) for a number of paddocks with various salt levels S (in kg per hectare). The results are shown in the following scatter plot:

a

Describe the correlation between wheat production and salt levels.

b

It is determined that approximately 49\% of the variation in wheat production can be explained by the variation in salt levels of the land. What is the correlation coefficient?

c

The equation of the least-squares line is

W = - \dfrac{3}{5} S + 300

How many tonnes per hectare of wheat could a paddock unaffected by salt produce?

d

Use the least-squares equation to predict the wheat production for a paddock with a salt level of 600 kg/hectare.

e

Is this predicted value reasonable given the context? Justify your answer.

[Scatter plot: horizontal axis S (ticks 50 to 300), vertical axis W (ticks 50 to 350)]

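
Parts (c) to (e) come down to evaluating the given line and asking whether the result makes sense. The sketch below is an illustration only: the function names are hypothetical, and the plausibility rule (production cannot be negative) is an assumption about the context rather than something stated in the question. `Fraction` keeps the slope -3/5 exact.

```python
from fractions import Fraction

SLOPE = Fraction(-3, 5)  # from W = -(3/5) S + 300
INTERCEPT = 300

def predict_wheat(salt_level):
    """Evaluate the least-squares line at a given salt level S."""
    return SLOPE * salt_level + INTERCEPT

def is_plausible(wheat):
    """Assumed sanity check: a paddock cannot produce a negative amount of wheat."""
    return wheat >= 0

for s in (0, 600):
    w = predict_wheat(s)
    print(s, w, is_plausible(w))

# Salt level at which the line predicts zero production
zero_crossing = -INTERCEPT / SLOPE
print(zero_crossing)
```

The salt axis on the plot only has ticks up to 300, so a salt level of 600 also lies well beyond the values shown.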
11

The fleet manager for the Australian Automotive Association wants to estimate how car maintenance costs C (in hundreds of dollars) are related to the distance K (in thousands of kilometres) driven each year. The data collected is shown in the scatter plot below:

a

The equation of the least-squares line for the graph is C = 9.9 K + 146.2.

According to the least-squares line, by how much do the maintenance costs increase for each 1000 km increase in distance driven?

b

According to the least-squares line, what is the maintenance cost of a car that is not driven at all?

c

Is this interpretation of the intercept reasonable given the context? Justify your answer.

d

Use the least-squares equation to predict the annual maintenance cost for a car that is driven 40\,000 \text{ km} per year.

e

Is this predicted value reliable? Justify your answer.

[Scatter plot: horizontal axis K (ticks 5 to 35), vertical axis C (ticks 100 to 500)]

f

Is there any reason to believe that there is a causal relationship between distance driven and maintenance costs? Justify your answer.

12

In a laboratory experiment, a scientist measures the time, T seconds, that it takes for a chemical reaction to finish, after adding acids of different strengths, P pH. The pH scale, which is used to measure the strength of an acid, allows values from -7 (a strong acid) to +7 (a very strong base). The results of the experiment are shown in the table below:

a

It is found that approximately 81\% of the variation in reaction time can be explained by variation in the acid strength. Find the correlation coefficient.

b

The equation of the least-squares line for the data is T = - 10 P + 140.

What is the reaction time of an acid with a pH value of 0?

c

Use the least-squares equation to find the acid strength that would be required to give a prediction of 50 seconds.

d

Is the predicted value reasonable? Justify your answer.

P (pH)    T (seconds)
-4        187
-3        176
-2        137
-1        155
0         153
1         125
2         103
3         124
4         101

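
The tabulated data can be used to check the quoted line and the 81% figure. The following is a minimal sketch using the standard least-squares formulas. The helper name `least_squares` is hypothetical, and the pairs (P, T) are read from the table as (-4, 187), (-3, 176), and so on.

```python
import math

ph = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
time_s = [187, 176, 137, 155, 153, 125, 103, 124, 101]

def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

slope, intercept, r = least_squares(ph, time_s)
print(round(slope, 2), round(intercept, 2), round(r, 3), round(r ** 2, 2))

# Part (c): solve 50 = -10P + 140 for P using the worksheet's rounded line
p_for_50 = (50 - 140) / -10
print(p_for_50)
```

The fitted values round to the line T = -10P + 140 given in part (b). The pH found for part (c) can then be compared with both the range of pH values in the table and the scale described in the question.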
Residuals
13

For the data sets below:

i

Complete the table of residuals.

ii

Plot the residuals on a scatter plot.

iii

Determine if the model is a good fit for the data.

a

The table shows a company's costs y (in millions) in week x. The equation y = 5x + 12 is being used to model the data.

x     y     Model value    Residual
1     22
2     25
4     33
6     39
9     53
12    69
14    81
17    99

b

The table shows a company's revenue y (in millions) in week x. The equation y = 2x + 14 is being used to model the data.

x     y     Model value    Residual
2     19
3     21
5     22
6     27
7     30
10    32
13    39
14    40

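
A residual is the observed value minus the model value, y - (mx + c). As a way of checking a completed table, here is a small sketch. The function name `residual_table` is hypothetical, and the (x, y) pairs are read from the two tables in parts (a) and (b).

```python
def residual_table(xs, ys, slope, intercept):
    """Return rows of (x, y, model value, residual) for the line y = slope*x + intercept."""
    rows = []
    for x, y in zip(xs, ys):
        model = slope * x + intercept
        rows.append((x, y, model, y - model))  # residual = observed - predicted
    return rows

# Part (a): costs, modelled by y = 5x + 12
costs = residual_table([1, 2, 4, 6, 9, 12, 14, 17],
                       [22, 25, 33, 39, 53, 69, 81, 99], 5, 12)

# Part (b): revenue, modelled by y = 2x + 14
revenue = residual_table([2, 3, 5, 6, 7, 10, 13, 14],
                         [19, 21, 22, 27, 30, 32, 39, 40], 2, 14)

for name, table in (("costs", costs), ("revenue", revenue)):
    print(name, [row[3] for row in table])
```

When reading the output, a systematic run of signs in the residuals (for example positive, then negative, then positive) suggests a curved relationship that a line does not capture, while residuals scattered without pattern around zero are consistent with a reasonable linear fit.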