
2.02 Making predictions

Worksheet
Predictions from an equation
1

Predict the value of y, using the given value of x and the line of best fit:

a

y = - 8.71 x + 6.79; x = 3.49

b

y = 8.84 x; x = 7.68

c

y = - 0.84 x - 0.19; x = - 43.15

d

y = 22.42 x + 2.93; x = 0.26
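Questions 1 to 3 only need substitution. Here is a minimal Python sketch for checking that arithmetic. The helper name `predict` is hypothetical. It evaluates y = mx + c and takes c = 0 when the equation has no constant term, as in part (b).

```python
def predict(m: float, c: float, x: float) -> float:
    """Evaluate the line of best fit y = m*x + c at the given x."""
    return m * x + c


# Question 1 (a)-(d) as (gradient m, y-intercept c, x); part (b) has no constant term
cases = {
    "a": (-8.71, 6.79, 3.49),
    "b": (8.84, 0.0, 7.68),
    "c": (-0.84, -0.19, -43.15),
    "d": (22.42, 2.93, 0.26),
}

for part, (m, c, x) in cases.items():
    print(part, round(predict(m, c, x), 4))
```

Questions 2 and 3 follow the same pattern, with t, s and B, A in place of y, x.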

2

A bivariate data set has a line of best fit with equation t = 4.24 s.

Predict the value of t when s = 3.76.

3

A bivariate data set has a line of best fit with equation B = - 3.37 A + 9.87.

Predict the value of B when A = 8.26.

4

Find the value of x, using the given value of y and the line of best fit:

a

y = 9.23 x - 4.18; y = 24.8945

b

y = - 7.76 x - 5.89; y = 713.7724

c

y = - 6.83 x; y = - 59.6259

d

y = 0.45 x + 7.62; y = 7.566
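Questions 4 and 5 run the line backwards. Rearranging y = mx + c gives x = (y - c) / m. The sketch below uses a hypothetical helper name, `solve_for_x`, and assumes the gradient is non-zero:

```python
def solve_for_x(m: float, c: float, y: float) -> float:
    """Rearrange y = m*x + c to x = (y - c) / m (needs m != 0)."""
    if m == 0:
        raise ValueError("a horizontal line cannot be solved for x")
    return (y - c) / m


# Question 4 (a)-(d) as (gradient m, y-intercept c, given y)
cases = {
    "a": (9.23, -4.18, 24.8945),
    "b": (-7.76, -5.89, 713.7724),
    "c": (-6.83, 0.0, -59.6259),
    "d": (0.45, 7.62, 7.566),
}

for part, (m, c, y) in cases.items():
    print(part, round(solve_for_x(m, c, y), 2))
```

Substituting each answer back into its equation is a quick way to confirm it. The same call with m = -9.12, c = -6.93 and y = 575.6556 covers question 5.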

5

A bivariate data set has a line of best fit with equation u = - 9.12 v - 6.93.

Find the value of v that gives a prediction of u = 575.6556.

Interpolation and extrapolation
6

Decide whether the prediction is an extrapolation or an interpolation for each of the data sets below:

a

A prediction for the y-value when x = 5 is made from the data set below:

x47811121317181920
y02476488118
b

A prediction for the y-value when x = 33 is made from the data set below:

x37545859435560386435
y7253262173471210210112
c

A prediction of y = 95.69 is made from the following data set using the line of best fit with equation y = - 0.07 x + 96.18:

x191017141125178
y9494.49796.494.497.894.995.99694.4
d

A prediction of y = 72.77 is made from the following data set using the line of best fit with equation y = 1.26 x - 57.01:

x93578697789668695492
y51.225.438.958.638.260.826.328.55.492
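The choice between interpolation and extrapolation depends on one thing: whether the x-value used lies inside the range from the smallest to the largest x-value in the data. Parts (c) and (d) give only the predicted y, so the x behind it has to be recovered from the line first. Here is a sketch with hypothetical helper names that counts the endpoints of the range as interpolation:

```python
def classify(x: float, xs: list[float]) -> str:
    """Interpolation if x lies within the observed x-range (inclusive), otherwise extrapolation."""
    return "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"


def x_from_prediction(m: float, c: float, y: float) -> float:
    """Recover the x-value behind a predicted y on the line y = m*x + c."""
    return (y - c) / m


# Parts (c) and (d) state the predicted y, so recover x before comparing with the table
x_c = x_from_prediction(-0.07, 96.18, 95.69)
x_d = x_from_prediction(1.26, -57.01, 72.77)
print(round(x_c, 2), round(x_d, 2))
```

Pass each recovered x, together with the x-values from its table, to `classify`.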
Reliability of predictions
7

For each of the following data sets:

i

Predict the value of y for the given x value.

ii

Comment on whether the prediction is reliable, referring to both the strength of correlation and whether interpolation or extrapolation is used.

a

Correlation coefficient: r = 0.93, x = 15, line of best fit: y = 0.84 x + 2.66.

x107141613195209
y9101436101992113
b

Correlation coefficient: r = 0.46, x = 49, line of best fit: y = 0.64 x + 54.31.

x373352100658183185951
y47.636.4145.61377493.878.482.4101.2117.3
c

Correlation coefficient: r = - 1, x = 1, line of best fit: y = - 3.06 x + 93.51.

x: 48, 96, 99, 51, 43, 42, 82, 85, 69, 54
y: -45, -202, -213, -72, -37, -38, -163, -156, -113, -73
d

Correlation coefficient: r = 0.45, x = 143, line of best fit: y = 0.22 x + 0.7.

x23811144509151955382
y135.2112.8231.721.23512.9
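The two checks in question 7 can be combined in a short sketch. The strength cut-offs used here are an assumed, commonly taught convention, not something this worksheet states: |r| of at least 0.75 is strong, 0.5 moderate and 0.25 weak. The rule that only a strong, interpolated prediction counts as reliable is a deliberate simplification. The helper names are hypothetical. The example uses part (c), whose x-values are listed in the code.

```python
def strength(r: float) -> str:
    """Label |r| with assumed, commonly taught cut-offs (check your course's guidelines)."""
    size = abs(r)
    if size >= 0.75:
        return "strong"
    if size >= 0.5:
        return "moderate"
    if size >= 0.25:
        return "weak"
    return "no clear"


def assess(r: float, x: float, xs: list[float], m: float, c: float) -> tuple[float, str, str, bool]:
    """Return the prediction, the strength label, the method and a simplified reliability flag."""
    y = m * x + c
    method = "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"
    label = strength(r)
    # Simplified rule: reliable only when the correlation is strong AND x is within the data range
    reliable = label == "strong" and method == "interpolation"
    return y, label, method, reliable


# Part (c): r = -1, x = 1, line y = -3.06x + 93.51
xs_c = [48, 96, 99, 51, 43, 42, 82, 85, 69, 54]
y, label, method, reliable = assess(-1, 1, xs_c, -3.06, 93.51)
print(f"y = {y:.2f}; {label} correlation; {method}; reliable: {reliable}")
```

A written answer should still mention both factors explicitly, as part (ii) asks.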
8

Researchers recorded the average number of cigarettes smoked per day during pregnancy, x, and the birth weight of the newborn baby, y. The results are shown in the table below:

a

Calculate the correlation coefficient between the two variables. Round your answer to two decimal places.

b

Describe the statistical relationship between these two variables in terms of strength, direction and shape.

c

Use your graphing calculator to form an equation for the least squares regression line of y on x. Round all values to two decimal places.

d

Use your regression line to predict the birth weight of a newborn whose mother smoked on average 5 cigarettes per day. Round your answer to two decimal places.

e

Explain the reliability of the prediction.

x | y
46.30 | 3.90
13.00 | 5.80
21.40 | 5.00
25.00 | 4.80
8.60 | 5.50
36.50 | 4.50
1.00 | 7.00
17.90 | 5.10
10.60 | 5.50
13.40 | 5.10
37.30 | 3.80
18.50 | 5.70
9

A sample of families were interviewed about their annual family income and their average monthly expenditure. The results are given in the table below:

a

Calculate the correlation coefficient between the two variables. Round your answer to two decimal places.

b

Describe the statistical relationship between these two variables.

c

Use your graphing calculator to form an equation for the least squares regression line of y on x. Round all values to three decimal places.

d

Use your regression line to predict the monthly expenditure for a family whose annual income is $80 000.

e

For which of the following annual incomes would the regression line produce the most reliable prediction?

A
$99 000
B
$51 000
C
$80 000
Income (x) | Expenditure (y)
66 000 | 1100
75 000 | 1700
65 000 | 1400
73 000 | 1300
54 000 | 600
90 000 | 1800
87 000 | 1100
87 000 | 1500
94 000 | 1800
96 000 | 2200
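Part (e) can be reasoned about with a heuristic. A value inside the observed x-range (interpolation) is preferred. Among values that fall on the same side of that test, one closer to the mean of x is treated as more reliable. The sketch below orders the answer options that way. It uses a hypothetical helper name and is not a formal statistical test:

```python
from statistics import mean

# Question 9 incomes (x) from the table, and the answer options
incomes = [66_000, 75_000, 65_000, 73_000, 54_000, 90_000, 87_000, 87_000, 94_000, 96_000]
options = {"A": 99_000, "B": 51_000, "C": 80_000}


def rank_by_reliability(xs: list[float], candidates: dict[str, float]) -> list[str]:
    """Order candidate x-values from most to least reliable (a heuristic, not a formal test)."""
    lo, hi = min(xs), max(xs)
    centre = mean(xs)

    def key(label: str) -> tuple[bool, float]:
        value = candidates[label]
        outside = not (lo <= value <= hi)  # False (interpolation) sorts before True
        return outside, abs(value - centre)

    return sorted(candidates, key=key)


print(rank_by_reliability(incomes, options))
```

Reading the list from the end gives the least reliable option, which is the form of part (e) in question 10.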
10

The estimated iron ore grades (percentage of iron content) and the actual recovered iron ore grades are recorded to determine the accuracy of the models used by a mining company. The percentages are recorded in the table below:

Estimated (x): 38%, 32%, 31%, 33%, 38%, 40%
Actual (y): 48%, 29%, 42%, 32%, 40%, 50%
a

Calculate the correlation coefficient between the two variables, rounded to two decimal places.

b

Describe the statistical relationship between these two variables.

c

Use your graphing calculator to form an equation for the least squares regression line of y on x. Round all values to two decimal places.

d

Use your regression line to predict the actual iron ore grade from a pit where the estimated grade is 35%. Round your answer to two decimal places.

e

For which of the following estimated grades would the regression line produce the least reliable prediction of the actual grade?

A
Estimate of 32%
B
Estimate of 29%
C
Estimate of 45%
11

A team of salespeople submits their expenses and their sales for the month of March to their manager. The figures are recorded in the table below:

a

Calculate the correlation coefficient between the two variables. Round your answer to two decimal places.

b

Describe the statistical relationship between these two variables.

c

Use your graphing calculator to form an equation for the least squares regression line of y on x. Round all values to two decimal places.

d

Use your regression line to predict the expenses of a person in this department who made sales of $6000 for the month.

e

Explain the validity of this prediction.

Sales (hundreds) (x) | Expenses (y)
55.7 | 43.8
48 | 28.9
62.7 | 73.9
23.4 | 11.9
23.8 | 9
60.6 | 50
20.2 | 20.2
52.1 | 28.2
23.4 | 19.2
33.3 | 31.1
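Sales are recorded in hundreds, so a dollar figure has to be converted before it is substituted. $6000 corresponds to x = 60, whereas substituting 6000 would be a severe extrapolation. Here is a small sketch of the conversion and the range check, with a hypothetical helper name:

```python
# Question 11 sales column (x), recorded in hundreds of dollars
sales_hundreds = [55.7, 48, 62.7, 23.4, 23.8, 60.6, 20.2, 52.1, 23.4, 33.3]


def to_table_units(dollars: float) -> float:
    """Convert a dollar amount to the table's units (hundreds of dollars)."""
    return dollars / 100


x = to_table_units(6000)
inside = min(sales_hundreds) <= x <= max(sales_hundreds)
print(f"substitute x = {x:g}; {'interpolation' if inside else 'extrapolation'}")
```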
12

The forecast maximum temperature, in degrees Celsius, and the observed maximum temperature are recorded to determine the accuracy of the temperature prediction models used by the weather bureau. Results are displayed in the table below:

a

Calculate the correlation coefficient between the two variables. Round your answer to two decimal places.

b

Describe the statistical relationship between these two variables.

c

Use your graphing calculator to form an equation for the least squares regression line of y on x. Round all values to two decimal places.

d

Use your regression line to predict the observed maximum temperature on a day in the same month when the forecast was 25 °C. Round your answer to two decimal places.

e

For which of the following forecast temperatures would the regression line produce the most reliable prediction?

A
x = 39
B
x = 25
C
x = 32
Forecast (x) | Observed (y)
30.10 | 31.20
27.20 | 27.80
26.90 | 24.50
29.00 | 27.60
30.20 | 31.60
34.20 | 31.50
33.20 | 33.70
36.90 | 34.30
27.10 | 25.00
30.10 | 31.90
13

During an alcohol education programme, 10 adults were offered up to 6 drinks and were then given a simulated driving test, scored out of a possible 100. The results are displayed in the table below:

Number of drinks (x): 3, 2, 6, 4, 4, 1, 6, 3, 4, 2
Driving score (y): 66, 61, 43, 58, 56, 73, 31, 64, 55, 62
a

Calculate the correlation coefficient between the two variables. Round your answer to two decimal places.

b

Describe the statistical relationship between these two variables.

c

Use your graphing calculator to form an equation for the least squares regression line of y on x. Round all values to two decimal places.

d

Use your regression line to predict the driving score of a young adult who consumed 5 drinks. Round your answer to one decimal place.

e

Describe the reliability of the prediction in part (d).

14

Research has been conducted to determine whether there is any relationship between the number of pirated downloads as a percentage of total downloads, x, and the incidence of malware infection in computers, y.

x: 40.5, 61.1, 77.4, 34.6, 65.3, 63.8, 56, 66.5, 34.3, 20.1, 68.7, 50.8
y: 3.7, 7.1, 16.7, 10.6, 5.2, 15.8, 12.7, 12.8, 1.5, 0.9, 8.5, 12.9
a

Calculate the correlation coefficient between the two variables. Round your answer to two decimal places.

b

Describe the statistical relationship between these two variables.

c

Use your graphing calculator to form an equation for the least squares regression line of y on x. Round all values to two decimal places.

d

Use your regression line to predict the number of computers out of 1000 that you would expect to be affected by malware in a region where 80% of downloads were pirated.

e

Describe the reliability of this prediction.

15

The results, as percentages, of a practice spelling test and the real spelling test were collected for 8 students. The results are displayed in the table below:

Practice (x): 56.30, 79.00, 59.40, 77.00, 71.60, 64.40, 61.60, 68.20
Real (y): 48.90, 69.30, 51.90, 66.00, 62.50, 56.10, 54.50, 59.20
a

Calculate the correlation coefficient between the two variables. Round your answer to two decimal places.

b

Describe the statistical relationship between these two variables.

c

Use your graphing calculator to form an equation for the least squares regression line of y on x. Round all values to two decimal places.

d

Use your regression line to predict the real spelling test result of a student who scored 60% in the practice spelling test. Round the answer to two decimal places.

e

Describe the reliability of this prediction.

16

The data in the table and scatter plot below show the frequencies per month of online marketing emails sent out to subscribers compared with the proportion of subscribers who click and open the email:

Frequency (x) | Proportion (y)
3 | 0.41
4 | 0.59
1 | 1.28
5 | 0.47
4 | 0.26
7 | 0.62
10 | 0.75
[Scatter plot: proportion (y) against frequency (x)]
a

Using a graphics calculator (or other technology), calculate the correlation coefficient between these two variables. Round your answer to two decimal places.

b

Which piece of data appears to be an outlier?

c

Remove the outlier and recalculate the correlation coefficient. Round your answer to two decimal places.

d

Describe the statistical relationship between these two variables (with the outlier removed).

e

Use your graphing calculator to form an equation for the least squares regression line of y on x (with the outlier removed). Round all values in the equation to two decimal places.

f

Use your regression line to predict the proportion of emails opened if they are sent 20 times a month. Round your answer to two decimal places.

g

Describe the reliability of the prediction in part (f).

17

In a laboratory experiment, a scientist measures the time, T seconds, that it takes for a chemical reaction to finish after adding acids of different strengths, P, measured on the pH scale. The pH scale, which is used to measure the strength of an acid, allows values from -7 (a strong acid) to +7 (a very strong base). The results of the experiment are shown in the table below:

a

It is found that approximately 81% of the variation in reaction time can be explained by variation in the acid strength. Find the correlation coefficient.

b

The equation of the least-squares line for the data is T = - 10 P + 140.

What is the predicted reaction time for an acid with a pH value of 0?

c

Use the least-squares equation to find the acid strength that would be required to give a prediction of 50 seconds.

d

Is the predicted value reasonable? Explain your answer.

pH (P) | Time (T)
-4 | 187
-3 | 176
-2 | 137
-1 | 155
0 | 153
1 | 125
2 | 103
3 | 124
4 | 101
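Two ideas in question 17 suit a sketch. First, the size of r is the square root of the proportion of variation explained, and its sign matches the gradient of the least-squares line. Second, the inverse prediction comes from solving 50 = -10P + 140 for P. Here is a minimal version using only the `math` module:

```python
import math

# Question 17: r^2 = 0.81 (81% of variation explained) and T = -10P + 140
r_squared = 0.81
slope, intercept = -10, 140

# |r| = sqrt(r^2); r takes the sign of the gradient of the least-squares line
r = math.copysign(math.sqrt(r_squared), slope)

# Inverse prediction: solve 50 = slope * P + intercept for P
p_needed = (50 - intercept) / slope

observed_p = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
inside = min(observed_p) <= p_needed <= max(observed_p)
print(f"r = {r:.2f}; P = {p_needed:g}; within observed pH range: {inside}")
```

The same comparison can be made against the -7 to +7 range described in the question.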
18

Soil salinity is a problem that affects large areas of farmland in Australia. A farmer has measured wheat production W (in tonnes per hectare) for a number of paddocks with various salt levels S (in kg per hectare). The results are shown in the following scatter plot:

a

Describe the relationship between wheat production and salt levels.

b

It is determined that approximately 49% of the variation in wheat production can be explained by the variation in salt levels of the land. What is the correlation coefficient?

c

The equation of the least-squares line is

W = - (3/5) S + 300

How many tonnes per hectare of wheat could a paddock unaffected by salt produce?

d

Use the least-squares equation to predict the wheat production for a paddock with a salt level of 600 kg/hectare.

e

Is this predicted value reasonable given the context? Explain your answer.

[Scatter plot: wheat production W (tonnes per hectare) against salt level S (kg per hectare)]
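Parts (c) to (e) lend themselves to a quick plausibility check. The sketch evaluates the least-squares line at S = 0 and S = 600 and flags any negative production figure, since production cannot be negative. The function name is hypothetical, and the line is written as -3S/5 to mirror the fraction in the equation:

```python
def predict_wheat(salt: float) -> float:
    """Least-squares line from question 18: W = -(3/5)S + 300 (tonnes per hectare)."""
    return -3 * salt / 5 + 300


for salt in (0, 600):
    w = predict_wheat(salt)
    verdict = "not physically possible" if w < 0 else "physically possible"
    print(f"S = {salt}: W = {w:g} ({verdict})")
```

A negative result is one sign that a prediction has been extrapolated beyond the range where the linear model makes sense.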

Outcomes

3.1.3.7

use the equation of a fitted line to make predictions

3.1.3.8

distinguish between interpolation and extrapolation when using the fitted line to make predictions, recognising the potential dangers of extrapolation

3.1.4.2

identify and communicate possible non-causal explanations for an association, including coincidence and confounding due to a common response to another variable
