Australia, VIC
VCE 12 General 2023

3.02 Interpret and make predictions

Worksheet
Least squares regression line
1

Find the equation of the least squares regression line if:

a

An x-value of 5 gives a predicted value of y = 9, and an x-value of 8 gives a predicted value of y = 3.

b

An x-value of 4 gives a predicted value of y = 3, and an x-value of 6 gives a predicted value of y = 7.
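
One possible approach, given as a sketch and not as the required method: every predicted value lies on the regression line, so two predicted points \left( x_1, \hat{y}_1 \right) and \left( x_2, \hat{y}_2 \right) determine it. The symbols a (vertical intercept) and b (gradient) are notation assumed for this sketch.

```latex
b = \frac{\hat{y}_2 - \hat{y}_1}{x_2 - x_1}, \qquad a = \hat{y}_1 - b \, x_1, \qquad \hat{y} = a + b x
```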

2

Consider the following set of data:

x0.81.772.73.624.95.778.610.1
y3.42.92.52.41.91.91.81.71.54

The equation of the least squares line fitted to this data is y = 3.15 - 0.18 x.

a
i

Predict the value of y when x = 3.

ii

Is this an interpolation or extrapolation?

b
i

Predict the value of y when x = 30.

ii

Is this an interpolation or extrapolation?
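
The prediction and the interpolation check in this question can be sketched in a few lines of Python. The function names `predict` and `classify` are hypothetical helpers, and the sketch assumes the observed x-values run from 0.8 to 10.1 (the first and last entries of the table).

```python
def predict(x, intercept=3.15, slope=-0.18):
    """Predicted y from the fitted line y = 3.15 - 0.18 x given in Question 2."""
    return intercept + slope * x


def classify(x, x_min, x_max):
    """Label a prediction by comparing x with the range of the observed x-values."""
    return "interpolation" if x_min <= x <= x_max else "extrapolation"


# Assumption: the observed x-values run from 0.8 to 10.1.
X_MIN, X_MAX = 0.8, 10.1

for x in (3, 30):
    print(x, round(predict(x), 2), classify(x, X_MIN, X_MAX))
```

A prediction made outside the observed range is an extrapolation and should be treated with caution, however well the line fits the data.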

3

Consider the following set of data:

Number of tests | 2 | 5 | 8 | 11 | 14 | 17 | 20 | 23 | 26
Average test score | 72.9 | 60.8 | 56.6 | 41.8 | 38.3 | 35.5 | 32.9 | 27.4 | 25

The equation of the least squares line fitted to this data is

\text{Average test score} = 70.34 - 1.92 \times \text{Number of tests}
a

Predict the average test score when the number of tests is 4.

b

Is this an interpolation or extrapolation?

Interpreting the gradient and y-intercept
4

A least squares regression line is given by y = 3.59 x + 6.72.

a

State the gradient of the regression line.

b

Interpret the meaning of the gradient.

c

State the value of the y-intercept.

5

A least squares regression line is given by y = - 3.67 x + 8.42.

a

State the gradient of the regression line.

b

Interpret the meaning of the gradient.

c

State the value of the y-intercept.

6

The prices of various new and second-hand Mitsubishi Lancers are shown in the table:

a

Find the equation of the Least Squares Regression Line for the price \left( y \right) in terms of age \left( x \right). Round all values to the nearest integer.

b

State the value of the y-intercept.

c

Interpret the meaning of the y-intercept.

d

State the gradient of the line.

e

Interpret the meaning of the gradient in this context.

\text{Age} | \text{Price (\$)}
1 | 16\,000
2 | 13\,000
0 | 21\,990
5 | 10\,000
7 | 8600
4 | 12\,500
3 | 11\,000
4 | 11\,000
8 | 4500
2 | 14\,500
7

Concern over student use of the social media app SnappyChatty leads to a study of student marks in Mathematics versus minutes spent using the app. Data collected from ten students is displayed in the table below:

\text{Minutes}29215335425311421957162254
\text{Mark } (\%)26631337978951985936
a

Find the equation of the Least Squares Regression Line for the mark as a percentage \left( y \right) in terms of minutes spent using SnappyChatty \left( x \right). Round all values to one decimal place.

b

State the value of the y-intercept.

c

Interpret the meaning of the y-intercept in this context.

d

State the gradient of the line.

e

Interpret the meaning of the gradient in this context.

8

The average number of pages read to a child each day and the child’s total vocabulary are measured. Data collected from ten children is displayed in the table below:

Pages read per day252729313311829295
Total vocabulary40244046776220487295457460106
a

Find the equation of the Least Squares Regression Line for the Total vocabulary \left ( y \right) in terms of Pages read per day \left( x \right). Round all values to one decimal place.

b

State the value of the y-intercept.

c

Interpret the meaning of the y-intercept in this context.

d

State the gradient of the line.

e

Interpret the meaning of the gradient in this context.

9

The amount of money households spend on dining out each week, D, is measured against their weekly income, I. The following linear regression model is fitted to the data:

D = 0.1 I + 25

a

Interpret the meaning of the y-intercept in this model.

b

State the gradient of the line.

c

If the weekly income of a family increases by \$100, what effect would we expect this to have on the amount of money spent on dining out?

10

The number of hours spent watching TV each evening, h, is measured against the percentage results, m, achieved in the Economics exam.

The following linear regression model is fitted to the data:

m = - 15 h + 97
a

Interpret the meaning of the y-intercept in this model.

b

Does the interpretation in the previous part make sense in this context?

c

State the gradient of the line.

d

If a student increases the amount of TV they watch by 3.5 hours, what effect do we expect this to have on their Economics exam mark?

Coefficient of determination
11

Consider the following scatter plots of bivariate data. Given the coefficient of determination, calculate the correlation coefficient, r, to two decimal places.

a

The coefficient of determination is 0.77.

[Scatter plot: x-axis from 2 to 22, y-axis from 5 to 55]
b

The coefficient of determination is 0.94.

[Scatter plot: x-axis from 2 to 20, y-axis from 5 to 55]
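
As a hedged reminder of the relationship used in this question: the coefficient of determination is the square of the correlation coefficient, so r is recovered with a square root. The sign is not determined by r^{2}; it has to be read from the direction of the scatter plot (positive for an upward trend, negative for a downward trend).

```latex
r = \pm \sqrt{r^{2}}, \qquad \text{for example } r^{2} = 0.77 \implies r = \pm \sqrt{0.77} \approx \pm 0.88
```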
12

Consider the following set of data:

x- 1.20.51.90.11.120910.51.1
y33.335.634.641.521.242.336.540.232.1
a

Calculate r^{2}. Round your answer to two decimal places.

b

Interpret your result.

13

For each of the following sets of data, calculate the percentage of variation in y that can be explained by the variation in x. Round your answers to the nearest percent.

a
x | -1.3 | 0.1 | 8.1 | 3.2 | 3.1 | 6.7 | 15.9 | 11.2 | 10.9
y | 66.9 | 49.7 | 29.6 | 37.7 | 26 | 19.9 | 16 | 13.2 | 7.7
b
x | -1.5 | 0.3 | 7.2 | 4.2 | 4.1 | 7.5 | 17.7 | 12.3 | 10.2
y | 54.3 | 48.3 | 28.9 | 35.8 | 27.2 | 19.8 | 16.1 | 13 | 7.8
14

For each set of summary statistics, along with the equation of the least squares line:

i

Find the coefficient of determination. Round your answer to two decimal places.

ii

Interpret your result.

a
\overline{x} = 180, \quad s_{x} = 5.3, \quad \overline{y} = 169, \quad s_{y} = 4.8, \quad y = 30.44 + 0.77 x
b
\overline{x} = 180,\quad s_{x} = 4.7,\quad \overline{y} = 169,\quad s_{y} = 3.8,\quad y = 21.54 + 0.79 x
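
A sketch of one route to the coefficient of determination from these summary statistics, assuming the least squares line is written as y = a + b x: the gradient satisfies b = r s_y / s_x, which can be rearranged for r and then squared.

```latex
b = r \, \frac{s_{y}}{s_{x}} \quad \Longrightarrow \quad r = b \, \frac{s_{x}}{s_{y}}, \qquad r^{2} = \left( b \, \frac{s_{x}}{s_{y}} \right)^{2}
```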
Residuals
15

The following tables show the sets of data \left( x, y \right) and the predicted \hat{y} values based on a least-squares regression line. Complete the tables by finding the residuals. Round all values to one decimal place.

a
x\text{-values} | 1 | 3 | 5 | 7 | 9
y\text{-values} | 22.7 | 22.3 | 24.2 | 21.8 | 21.5
\hat{y} | 25.2 | 23.4 | 21.6 | 19.8 | 18
\text{Residuals} | | | | |
b
x\text{-values} | 5 | 6 | 7 | 8 | 9
y\text{-values} | 37.7 | 37.2 | 21.1 | 27.1 | 44
\hat{y} | 28.9 | 30.4 | 31.9 | 33.4 | 34.9
\text{Residuals} | | | | |
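
A residual is the observed value minus the predicted value, y - \hat{y}. The following is a minimal Python sketch of that calculation using the values from table (a); the helper name `residuals` is hypothetical.

```python
def residuals(observed, predicted, places=1):
    """Residual = observed value minus predicted value, rounded for display."""
    return [round(y - y_hat, places) for y, y_hat in zip(observed, predicted)]


# Table (a) from Question 15.
y_values = [22.7, 22.3, 24.2, 21.8, 21.5]
y_hat_values = [25.2, 23.4, 21.6, 19.8, 18.0]

print(residuals(y_values, y_hat_values))
```

A positive residual means the data point lies above the regression line, and a negative residual means it lies below.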
16

For each residual plot, comment on whether the association between the variables can be described as linear:

a
[Residual plot: x-axis from 2 to 20, residual axis from -10 to 4]
b
[Residual plot: x-axis from 2 to 20, residual axis from -10 to 10]
17

The table shows a company's revenue, y in millions, in week x. The equation y = 2 x + 14 can be used to model the data.

a

Complete the table.

b

Plot the residuals as a scatter plot.

c

Comment on the suitability of this model for the data.

x | y | \text{Value generated by model} | \text{Residual}
2 | 19 | |
3 | 21 | |
5 | 22 | |
6 | 27 | |
7 | 30 | |
10 | 32 | |
13 | 39 | |
14 | 40 | |
18

The table shows a company's costs, y in millions, in week x. The equation y = 5 x + 12 can be used to model the data.

a

Complete the table.

b

Plot the residuals as a scatter plot.

c

Comment on the suitability of this model for the data.

x | y | \text{Value generated by model} | \text{Residual}
1 | 22 | |
2 | 25 | |
4 | 33 | |
6 | 39 | |
9 | 53 | |
12 | 69 | |
14 | 81 | |
17 | 99 | |
Regression analysis
19

The results (as percentages) for a practice spelling test and the real spelling test were collected for 8 students:

\text{Practice } (x) | 56.30 | 79.00 | 59.40 | 77.00 | 71.60 | 64.40 | 61.60 | 68.20
\text{Real } (y) | 48.90 | 69.30 | 51.90 | 66.00 | 62.50 | 56.10 | 54.50 | 59.20
a

Calculate the correlation coefficient for the scores. Round your answer to three decimal places.

b

Describe the statistical relationship between these two variables.

c

Using technology, find the equation for the least squares regression line of y on x. Round all values to two decimal places.

d

Use your regression line to predict the real spelling test result of a student who scored 60\% in their practice spelling test. Round the answer to two decimal places.

e

Comment on the validity of this prediction.
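
Where the worksheet says "using technology", a CAS or graphing calculator is presumably intended. As an alternative sketch, the same quantities can be computed in plain Python from the standard sums of squares and products. The function name `least_squares` is a hypothetical helper, not part of any library.

```python
from math import sqrt


def least_squares(xs, ys):
    """Return (gradient, intercept, r) for the least squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    gradient = s_xy / s_xx
    intercept = y_bar - gradient * x_bar
    r = s_xy / sqrt(s_xx * s_yy)
    return gradient, intercept, r


# Question 19 data: practice (x) and real (y) spelling test results.
practice = [56.3, 79.0, 59.4, 77.0, 71.6, 64.4, 61.6, 68.2]
real = [48.9, 69.3, 51.9, 66.0, 62.5, 56.1, 54.5, 59.2]

gradient, intercept, r = least_squares(practice, real)
print(round(r, 3), round(gradient, 2), round(intercept, 2))
# Prediction for a practice score of 60%, using the unrounded coefficients.
print(round(intercept + gradient * 60, 2))
```

The sketch predicts with unrounded coefficients, so its result may differ slightly from an answer obtained by substituting into the rounded equation.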

20

The forecast maximum temperature, in degrees Celsius, and the observed maximum temperature are recorded to determine the accuracy in the temperature prediction models used by the weather bureau.

a

Calculate the correlation coefficient for these temperatures. Round your answer to two decimal places.

b

Describe the statistical relationship between these two variables.

c

Use your graphing calculator to find the equation for the least squares regression line of y on x.

d

Use your regression line to predict the observed maximum temperature on a day in the same month when the forecast was 25 \degree \text{C}.

\text{Forecast } (x) | \text{Observed } (y)
30.10 | 31.20
27.20 | 27.80
26.90 | 24.30
29.00 | 27.60
30.20 | 31.60
34.20 | 31.50
33.20 | 33.70
36.90 | 34.30
27.10 | 25.00
30.10 | 31.90
e

Which of the following forecast temperatures would give the most reliable predicted temperature? Explain your answer.

39 \degree \text{C}, 25 \degree \text{C}, \text{ or } 32 \degree \text{C}
21

During an alcohol education programme, 10 adults were offered up to 6 drinks and were then given a simulated driving test where they scored a result out of a possible 100. The results are displayed in the following table:

\text{Number of drinks } (x) | 3 | 2 | 6 | 4 | 4 | 1 | 6 | 3 | 4 | 2
\text{Driving score } (y) | 66 | 61 | 43 | 58 | 56 | 73 | 31 | 64 | 55 | 62
a

Calculate the correlation coefficient for the data. Round your answer to two decimal places.

b

Describe the statistical relationship between the two variables.

c

Use your graphing calculator to find the equation of the least squares regression line of y on x. Round all values to one decimal place.

d

Use your regression line to predict the driving score of a young adult who consumed 5 drinks. Round your answer to one decimal place.

e

Comment on the validity of your prediction.

22

Research on the number of cigarettes smoked during pregnancy and the birth weights of the newborn babies was conducted and results displayed in the table below:

a

Calculate the correlation coefficient for the data. Round your answer to three decimal places.

b

Describe the statistical relationship between these two variables.

c

Use your graphing calculator to find the equation of the least squares regression line of y on x. Round all values to two decimal places where necessary.

d

Use your regression line to predict the birth weight of a newborn whose mother smoked on average 5 cigarettes per day.

e

Comment on the reliability of your prediction.

\text{Average number of cigarettes per day } (x) | \text{Birth weight in kilograms } (y)
46.30 | 3.90
13.00 | 5.80
21.40 | 5.00
25.00 | 4.80
8.60 | 5.50
36.50 | 4.50
1.00 | 7.00
17.90 | 5.10
10.60 | 5.50
13.40 | 5.10
37.30 | 3.80
18.50 | 5.70
23

A sample of families were interviewed about their annual family income, x, and their average monthly expenditure, y. Results are displayed in the table below:

a

Calculate the correlation coefficient between the two variables. Round your answer to two decimal places.

b

Describe the statistical relationship between the two variables.

c

Use your graphing calculator to find the equation for the least squares regression line of y on x. Round all values to three decimal places.

d

Use your regression line to predict the monthly expenditure for a family whose annual income is \$80\,000.

e

Which of the following annual incomes would give the most reliable prediction of average monthly expenditure? \$99\,000, \$51\,000 \text{ or } \$80\,000. Explain your answer.

\text{Annual income } (x) | \text{Average monthly expenditure } (y)
66\,000 | 1100
75\,000 | 1700
65\,000 | 1400
73\,000 | 1300
54\,000 | 600
90\,000 | 1800
87\,000 | 1100
87\,000 | 1500
94\,000 | 1800
96\,000 | 2200
24

The data in the table and scatter plot below show the frequencies per month of online marketing emails sent out to subscribers, compared with the proportion of subscribers who click and open the email:

[Scatter plot: x-axis from 2 to 10, y-axis from 0.2 to 1.8]
\text{Frequency } (x) | \text{Proportion click and open } (y)
3 | 0.41
4 | 0.59
1 | 1.28
5 | 0.47
4 | 0.26
7 | 0.62
10 | 0.75
a

Calculate the correlation coefficient between these two variables. Round answer to two decimal places.

b

Which piece of data appears to be an outlier?

c

Remove the outlier and recalculate the correlation coefficient.

d

Describe the statistical relationship between the two variables.

e

Use your graphing calculator to find the equation for the least squares regression line of y on x (with the outlier removed). Round values to two decimal places.

f

Use your regression line to predict the proportion of emails opened if they're sent 20 times a month.

g

Comment on the validity of your prediction.

25

Hospital patients aged between 18 and 65 years of age had their ages in years \left( x \right) and their blood pressures in millimeters of mercury \left( y \right) recorded. The following summary statistics are available:

\overline{x} = 51, \quad s_{x} = 11.53, \quad \overline{y} = 138, \quad s_{y} = 14.07, \quad r = 0.87
a

Calculate the gradient of the least squares regression line. Round your answer to two decimal places.

b

Calculate the vertical intercept of the least squares regression line.

c

Hence, state the equation that can be used to predict a patient’s blood pressure from their age.

d

Predict the blood pressure for a patient who is 20 years old.

e

Comment on the validity of your prediction.
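
When only summary statistics are available, the least squares line follows from the standard relationships b = r s_y / s_x for the gradient and a = \overline{y} - b \overline{x} for the vertical intercept. The sketch below applies them to the statistics in this question; `line_from_summary` is a hypothetical helper name.

```python
def line_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Gradient b = r * s_y / s_x and intercept a = y_bar - b * x_bar."""
    gradient = r * s_y / s_x
    intercept = y_bar - gradient * x_bar
    return gradient, intercept


# Summary statistics from Question 25 (age x, blood pressure y).
b, a = line_from_summary(x_bar=51, s_x=11.53, y_bar=138, s_y=14.07, r=0.87)
print(round(b, 2), round(a, 2))
```

The intercept here is computed from the unrounded gradient. If the rounded gradient from part (a) is carried into part (b) instead, the intercept will differ slightly.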

26

12 states in the USA with populations ranging from 700\,000 to 10\,000\,000 residents were asked for their budget expenditure. Summary statistics on the expenditure (y in billions of dollars) and the population (x in millions) are presented below:

\overline{x} = 5.649, \quad s_{x} = 4.734, \quad \overline{y} = 7.227, \quad s_{y} = 6.633, \quad r = 0.963
a

Which variable is the explanatory variable?

b

Calculate the gradient of the least squares regression line. Round your answer to two decimal places.

c

Calculate the vertical intercept of the least squares regression line.

d

Predict the expenditure for a state with 600\,000 residents.

e

Comment on the reliability of your prediction.

27

The climates of various cities were studied for their latitude (x in degrees), altitude (w in metres) and mean daily temperature (y in degrees Celsius).

a

The coefficient of determination for latitude against temperature was 0.565. Calculate the correlation coefficient. Round your answer to three decimal places.

b

Consider the following summary statistics:

\overline{w} = 201.9, \quad s_{w} = 103.519, \quad \overline{y} = 9.49, \quad s_{y} = 10.91, \quad r = - 0.827

Which is the better predictor of temperature, latitude or altitude?

c

Hence, calculate the gradient of the best least squares regression line for predicting the temperature of a city. Round your answer to two decimal places.

d

Calculate the vertical intercept of this least squares regression line. Round your answer to two decimal places.

e

Predict the temperature for a city with an altitude of 100 \text{ m}. Round your answer to two decimal places.

f

Given the lowest recorded altitude in this data set was 90 \text{ m}, comment on the validity of the prediction.


Outcomes

U3.AoS1.26

calculate the coefficient of determination, r^{2}, and interpret in the context of the association being modelled and use the model to make predictions, being aware of the problem of extrapolation

U3.AoS1.24

determine the equation of the least squares line giving the coefficients correct to a required number of decimal places or significant figures as specified, and distinguish between correlation and causation

U3.AoS1.25

use the least squares line of best fit to model and analyse the linear association between two numerical variables and interpret the model in the context of the association being modelled
