
3.085 Linear regression

Worksheet
Least squares regression line
1

The centroid of the data shown in the scatter plot below is (18.04, 13.02), and the y-intercept of the least squares regression line is an integer.

Find the equation of the least squares regression line.

[Scatter plot of y against x: x-axis from 2 to 30, y-axis from 2 to 20]
2

The centroid of the data shown in the scatter plot below is (4.7, 5.9), and the y-intercept of the least squares regression line is an integer.

Find the equation of the least squares regression line.

[Scatter plot of y against x: x-axis from 2 to 10, y-axis from -28 to 28]
3

Find the equation of the least squares regression line if:

a

An x-value of 5 gives a predicted value of y = 9, and an x-value of 8 gives a predicted value of y = 3.

b

An x-value of 4 gives a predicted value of y = 3, and an x-value of 6 gives a predicted value of y = 7.
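A least squares regression line is still a straight line, so any two points on it fix both its gradient and its y-intercept: m = (y2 - y1) / (x2 - x1), then c = y1 - m × x1. A minimal Python sketch of that calculation follows; the function name `line_through_points` is hypothetical, and it assumes integer inputs so that `Fraction` can keep the arithmetic exact.

```python
from fractions import Fraction


def line_through_points(x1: int, y1: int, x2: int, y2: int):
    """Return (gradient, y_intercept) of the straight line through two points.

    Assumes integer inputs; Fraction keeps the result exact (no rounding).
    """
    if x1 == x2:
        raise ValueError("the two x-values must differ")
    gradient = Fraction(y2 - y1, x2 - x1)
    y_intercept = y1 - gradient * x1  # rearranged from y1 = m * x1 + c
    return gradient, y_intercept


# Usage with hypothetical points (not taken from the questions above)
m, c = line_through_points(1, 4, 3, 10)
print(f"y = {m}x + {c}")  # prints: y = 3x + 1
```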

4

For each of the least squares regression lines given below:

i

State the gradient of the regression line.

ii

Interpret the meaning of the gradient.

iii

State the value of the y-intercept.

a

y = 3.59 x + 6.72

b

y = -3.67 x + 8.42

5

The amount of money households spend on dining out each week, D, is measured against their weekly income, I. The following linear regression model is fitted to the data:

D = 0.1 I + 25

a

Interpret the meaning of the y-intercept in this model.

b

State the gradient of the line.

c

If the weekly income of a family increases by $100, what effect would we expect this to have on the amount of money spent on dining out?

6

For each of the data sets below:

i

Use technology to calculate the correlation coefficient. Round your answer to two decimal places.

ii

Describe the statistical relationship between the two variables.

iii

Using technology, form an equation for the least squares regression line of y on x. Give all values to one decimal place.

a
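x | 15.7 | 13.1 | 16.3 | 10.9 | 18.6 | 15.9 | 12.7 | 12.8 | 14.2 | 16.8
y | 28.3 | 28.8 | 28.2 | 28.8 | 28 | 28.4 | 28.6 | 29 | 28.4 | 28.5

b

x | 49.5 | 24.2 | 39.1 | 40.2 | 41.9 | 29.8 | 41.2 | 42.6 | 48.1 | 21.1
y | 167.3 | 97.4 | 63.0 | 144.1 | 143.3 | 148.3 | 90.9 | 173.3 | 194.8 | 65.4
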
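The questions say "use technology" without naming a tool. One option, sketched here as an assumption rather than the worksheet's intended method, is to apply the standard formulas directly: with S_xx = Σ(x - x̄)², S_yy = Σ(y - ȳ)² and S_xy = Σ(x - x̄)(y - ȳ), the correlation coefficient is r = S_xy / √(S_xx · S_yy), the gradient is S_xy / S_xx, and the y-intercept is ȳ - gradient × x̄. The function name `fit_least_squares` is hypothetical.

```python
import math


def fit_least_squares(xs, ys):
    """Return (r, gradient, y_intercept) for the least squares line of y on x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and of cross-products about the centroid
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = s_xy / math.sqrt(s_xx * s_yy)  # Pearson's correlation coefficient
    gradient = s_xy / s_xx
    y_intercept = y_bar - gradient * x_bar  # the line passes through (x_bar, y_bar)
    return r, gradient, y_intercept


# Data set (a) from Question 6
xs = [15.7, 13.1, 16.3, 10.9, 18.6, 15.9, 12.7, 12.8, 14.2, 16.8]
ys = [28.3, 28.8, 28.2, 28.8, 28.0, 28.4, 28.6, 29.0, 28.4, 28.5]
r, m, c = fit_least_squares(xs, ys)
print(f"r = {r:.2f}, gradient = {m:.1f}, y-intercept = {c:.1f}")
```

On Python 3.10 or later, `statistics.correlation` and `statistics.linear_regression` from the standard library should give the same values; writing the formulas out simply makes each step visible.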
7

The prices of various new and second-hand Mitsubishi Lancers are shown in the table below:

a

Find the equation of the least squares regression line for the price, y, in terms of age, x. Round all values to the nearest integer.

b

State the value of the y-intercept.

c

Interpret the meaning of the y-intercept.

d

State the gradient of the line.

e

Interpret the meaning of the gradient in this context.

Age | Price ($)
1 | 16 000
2 | 13 000
0 | 21 990
5 | 10 000
7 | 8600
4 | 12 500
3 | 11 000
4 | 11 000
8 | 4500
2 | 14 500
8

The average number of pages read to a child each day and the child’s total vocabulary are measured. Data collected from ten children is displayed in the table below:

Pages read per day252729313311829295
Total vocabulary40244046776220487295457460106
a

Find the equation of the least squares regression line for total vocabulary (y) in terms of pages read per day (x). Round all values to one decimal place.

b

State the value of the y-intercept.

c

Interpret the meaning of the y-intercept in this context.

d

State the gradient of the line.

e

Interpret the meaning of the gradient in this context.

9

The following scatter plot shows the value (in dollars) of various 4-bedroom, 2-bathroom homes in a new suburb against their age (in years):

[Scatter plot of Value ($) against Age (years): Age axis from 1 to 7, Value axis from 500 000 to 3 000 000]
a

What does the y-intercept of $14 811 indicate?

b

Does the interpretation in part (a) make sense in this context? Explain your answer.

c

Use the y-intercept and the centroid (whose coordinates are given on the graph) to calculate the gradient of the least squares regression line. Round your answer to one decimal place.

d

Interpret the meaning of the gradient in this context.
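The least squares regression line always passes through the centroid (x̄, ȳ). Together with the y-intercept c, this gives two points on the line, (0, c) and (x̄, ȳ), so the gradient is m = (ȳ - c) / x̄. The same idea applies in Questions 1, 2 and 10. A minimal Python sketch follows; the function name `gradient_from_centroid` is hypothetical, and the centroid in the usage line is a made-up placeholder because the real coordinates appear only on the graph.

```python
def gradient_from_centroid(centroid, y_intercept):
    """Gradient of the line through (0, y_intercept) and the centroid (x_bar, y_bar)."""
    x_bar, y_bar = centroid
    if x_bar == 0:
        raise ValueError("the centroid must not lie on the y-axis")
    # Rise over run between the two known points on the line
    return (y_bar - y_intercept) / x_bar


# Hypothetical centroid and y-intercept, for illustration only
m = gradient_from_centroid((4.0, 10.0), 2.0)
print(f"gradient = {m:.1f}")  # prints: gradient = 2.0
```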

10

The following scatter plot shows the heights (in centimetres) and the weights (in kilograms) of 8 primary school children:

[Scatter plot of y against x: x-axis from 20 to 140, y-axis from 30 to 50]
a

State the value of the y-intercept.

b

What does the y-intercept indicate?

c

Does the interpretation in the previous part make sense in this context? Explain your answer.

d

Use the y-intercept and the centroid (whose coordinates are shown on the graph) to calculate the gradient of the least squares regression line. Round your answer to one decimal place.

e

Interpret the meaning of the gradient in this context.

11

Concern over student use of the social media app SnappyChatty leads to a study of student marks in Mathematics versus minutes spent using the app. Data collected from ten students is displayed in the table below:

\text{Minutes}29215335425311421957162254
\text{Mark } (\%)26631337978951985936
a

Find the equation of the least squares regression line for the mark as a percentage, y, in terms of minutes spent using SnappyChatty, x. Round all values to one decimal place.

b

State the value of the y-intercept.

c

Interpret the meaning of the y-intercept in this context.

d

State the gradient of the line.

e

Interpret the meaning of the gradient in this context.

Predictions
12

Research was conducted on the number of cigarettes smoked per day during pregnancy and the birth weights of newborn babies. The results are shown in the table below:

a

Using technology, calculate the correlation coefficient for the data. Round your answer to three decimal places.

b

Describe the statistical relationship between these two variables.

c

Use technology to form an equation for the least squares regression line of y on x. Round all values to two decimal places.

d

Use your regression line to predict the birth weight of a newborn whose mother smoked on average 5 cigarettes per day. Round your answer to two decimal places.

e

Comment on the reliability of your prediction.

Average number of cigarettes per day (x) | Birth weight in kilograms (y)
45.90 | 4.00
13.10 | 5.70
21.9 | 4.90
24.40 | 5.00
9.30 | 5.70
36.70 | 4.30
0.50 | 7.00
18.00 | 5.10
10.00 | 5.60
13.00 | 5.20
37.30 | 4.00
18.90 | 5.80
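A prediction is made by substituting x into the fitted line. A common rule of thumb for judging reliability is that predictions inside the range of the observed x-values (interpolation) are generally more trustworthy than those outside it (extrapolation), provided the correlation is reasonably strong. A minimal Python sketch follows; `predict` and `is_interpolation` are hypothetical helper names, and the line y = 2x + 1 in the usage is a placeholder, not the answer to part (c).

```python
def predict(gradient, y_intercept, x):
    """Predicted y-value from the regression line y = gradient * x + y_intercept."""
    return gradient * x + y_intercept


def is_interpolation(x, observed_xs):
    """True if x lies within the range of the x-values used to fit the line."""
    return min(observed_xs) <= x <= max(observed_xs)


# Average number of cigarettes per day (x) from Question 12
cigarettes = [45.9, 13.1, 21.9, 24.4, 9.3, 36.7, 0.5, 18.0, 10.0, 13.0, 37.3, 18.9]

# Placeholder line y = 2x + 1; substitute the gradient and intercept from part (c)
y_hat = predict(2.0, 1.0, 5)
print(f"predicted y at x = 5: {y_hat:.2f}")  # prints: predicted y at x = 5: 11.00
print("interpolation" if is_interpolation(5, cigarettes) else "extrapolation")
```

The same range check applies to the multiple-choice reliability questions later in this worksheet: compare each candidate x-value against the smallest and largest x in the table.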
13

The forecast maximum temperature, in degrees Celsius, and the observed maximum temperature are recorded to determine the accuracy in the temperature prediction models used by the weather bureau. Results are displayed in the table below:

a

Using technology, calculate the correlation coefficient between these temperatures. Round your answer to two decimal places.

b

Describe the linear relationship between these two variables.

c

Use technology to form an equation for the least squares regression line of y on x. Round all values to two decimal places.

d

Use your regression line to predict the observed maximum temperature on a day in the same month when the forecast was 25 °C. Round your answer to two decimal places.

e

For which of the following forecast temperatures would the regression line produce the most reliable prediction?

A

x = 32

B

x = 38

C

x = 25

Forecast (x) | Observed (y)
30.00 | 30.80
27.00 | 27.90
26.90 | 24.60
28.90 | 25.00
29.80 | 31.40
34.10 | 31.50
33.00 | 33.80
37.10 | 34.30
27.00 | 25.10
30.20 | 31.50
14

A team of salespeople submit their expenses and their sales for the month of March to their manager. The data collected is displayed in the following table:

\text{Sales } (\times 100 \, \$)55.44862.623.623.860.620.151.823.433.3
\text{Expenses (\$)} 43.929.27412949.820.22819.131.2
a

Using technology, calculate the correlation coefficient between the two variables. Round your answer to two decimal places.

b

Describe the statistical relationship between the two variables.

c

Use technology to form an equation for the least squares regression line of y (expenses in dollars) on x (sales in hundreds of dollars). Round all values to two decimal places.

d

Use your regression line to predict the expenses of a person in this team who made sales of $6000 for the month. Round your answer to two decimal places.

15

During an alcohol education programme, 10 adults were offered up to 6 drinks and then took a simulated driving test, which was scored out of 100. The results are displayed in the following table:

Number of drinks (x) | 3 | 2 | 6 | 4 | 4 | 1 | 6 | 3 | 4 | 2
Driving score (y) | 65 | 60 | 42 | 57 | 56 | 74 | 33 | 63 | 55 | 62
a

Using technology, calculate the correlation coefficient between these variables. Round your answer to two decimal places.

b

Describe the correlation between the two variables.

c

Use technology to find the equation for the least squares regression line of y on x. Round all values to one decimal place.

d

Use your regression line to predict the driving score of a young adult who consumed 5 drinks. Round your answer to one decimal place.

16

Research has been conducted to determine whether there is any relationship between the number of pirated downloads (as a percentage of total downloads) and the incidence of malware infection per 1000 computers. The results are displayed in the table:

a

Using technology, calculate the correlation coefficient between these two variables. Round your answer to two decimal places.

b

Describe the statistical relationship between the two variables.

c

Use technology to form an equation for the least squares regression line of y on x. Round all values to two decimal places.

d

Use your regression line to predict the number of computers per 1000 that would be expected to be affected by malware in a region where 80% of downloads are pirated. Round your answer to two decimal places.

Pirated downloads as a percentage of total downloads (x) | Incidences of malware per 1000 computers (y)
40.5 | 3.7
60.1 | 6.1
77.4 | 15.7
34.6 | 10.6
66.3 | 5.2
63.8 | 14.8
57 | 12.7
67.5 | 12.8
35.3 | 3.5
19.1 | 1.9
69.7 | 7.5
51.8 | 10.9
17

A sample of families were interviewed about their annual family income and their average monthly expenditure. The results are given in the table below:

a

Using technology, calculate the correlation coefficient between the two variables. Round your answer to two decimal places.

b

Describe the statistical relationship between the two variables.

c

Use technology to form an equation for the least squares regression line of y on x. Round all values to three decimal places.

d

Use your regression line to predict the monthly expenditure for a family whose annual income is $80 000. Round your answer to two decimal places.

e

For which of the following annual incomes would the regression line produce the most reliable prediction?

A

$101 000

B

$51 000

C

$80 000

Income (x) | Expenditure (y)
67 000 | 700
72 000 | 1600
62 000 | 1700
70 000 | 1000
54 000 | 600
88 000 | 1900
83 000 | 1200
91 000 | 1000
92 000 | 2000
98 000 | 2100
18

The data in the table and scatter plot below show the frequencies per month of online marketing emails sent out to subscribers compared with the proportion of subscribers who click and open the email:

Frequency (x) | Proportion (y)
3 | 0.41
4 | 0.58
1 | 1.31
5 | 0.51
4 | 0.33
7 | 0.61
10 | 0.75
[Scatter plot of proportion (y) against frequency (x): x-axis from 1 to 11, y-axis from 0.2 to 1.6]
a

Using technology, calculate the correlation coefficient between these two variables. Round your answer to two decimal places.

b

Which piece of data appears to be an outlier?

c

Remove the outlier and recalculate the correlation coefficient. Round your answer to two decimal places.

d

Describe the correlation between the two variables (with the outlier removed).

e

Use technology to form an equation for the least squares regression line of y on x (with the outlier removed). Round all values to two decimal places.

f

Use your regression line to predict the proportion of emails opened if they're sent 20 times a month. Round your answer to two decimal places.
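One way to see how much a single point drives the correlation is to recompute r with each point left out in turn; a point whose removal changes r sharply is a strong candidate for the outlier in part (b). A minimal, self-contained Python sketch follows (it restates Pearson's formula so it runs on its own); `pearson_r` and `r_without` are hypothetical helper names.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def r_without(xs, ys, index):
    """Correlation coefficient after deleting the point at position index."""
    kept = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i != index]
    kept_xs, kept_ys = zip(*kept)
    return pearson_r(kept_xs, kept_ys)


# Question 18 data: emails sent per month (x) and proportion opened (y)
freq = [3, 4, 1, 5, 4, 7, 10]
prop = [0.41, 0.58, 1.31, 0.51, 0.33, 0.61, 0.75]

print(f"all points: r = {pearson_r(freq, prop):.2f}")
for i, point in enumerate(zip(freq, prop)):
    # A large jump in r flags an influential point
    print(f"without {point}: r = {r_without(freq, prop, i):.2f}")
```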

19

The results (as percentages) for a practice spelling test and the real spelling test were collected for 8 students. Data collected are presented in the following table:

Practice (x) | 56.30 | 78.90 | 59.30 | 76.90 | 71.50 | 64.50 | 61.60 | 68.30
Real (y) | 48.80 | 69.20 | 52.00 | 66.10 | 62.40 | 56.00 | 54.3 | 59.30
a

Using technology, calculate the correlation coefficient between these scores. Round your answer to three decimal places.

b

Describe the correlation between the two variables.

c

Using technology, form an equation for the least squares regression line of y on x. Round all values to two decimal places.

d

Use your regression line to predict the real spelling test result of a student who scored 60% in their practice spelling test. Round your answer to two decimal places.


Outcomes

MA12-8

solves problems using appropriate statistical processes
