iGCSE (2021 Edition)

18.13 Lines of best fit

Worksheet
Line of best fit
1

Draw an approximate line of best fit by hand for each of the scatter plots below:

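a
[Scatter plot: x-axis and y-axis each marked 5 to 20 in steps of 5]
b
[Scatter plot: x-axis and y-axis each marked 5 to 20 in steps of 5]
c
[Scatter plot: x-axis and y-axis each marked 5 to 20 in steps of 5]
d
[Scatter plot: x-axis and y-axis each marked 5 to 20 in steps of 5]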
2

The following scatter plot shows the data for two variables, x and y:

a

Sketch the line of best fit for this data.

b

Use your line of best fit to estimate the value of y when:

i

x = 4.5

ii

x = 9

[Scatter plot: x-axis and y-axis each marked 1 to 9]
3

The following scatter plot graphs data for the number of balls hit and the number of runs scored by a batsman:

a

Sketch the line of best fit for this data.

b

Use the line of best fit to estimate the number of runs scored by the batsman after hitting:

i

27 balls

ii

66 balls

c

Is the relationship between the two variables positive or negative?

[Scatter plot: \text{Balls Hit} on the horizontal axis and \text{Runs} on the vertical axis, each marked 5 to 65 in steps of 5]
4

The average monthly temperature and the average wind speed, in knots, in a particular location were plotted over several months. The graph shows the points for each month’s data and their line of best fit:

Use the line of best fit to approximate the wind speed on a day when the temperature is 5\degree \text{C}.

[Scatter plot with line of best fit: \text{Temperature}(\degree \text{C}) on the horizontal axis, marked 1 to 9; \text{Speed} on the vertical axis, marked 1 to 8]

Equation of a line of best fit
5

Consider the following scatter plot:

a

Is the relationship between the x and y variables positive or negative?

b

Sketch the line of best fit for this data.

c

Which of the following could be the equation for the line of best fit:

A
y = 2 - 3 x
B
y = 3 x + 2
C
y = - 3 x - 2
D
y = 3 x - 2
[Scatter plot: x-axis marked 1 to 10; y-axis marked 3 to 27 in steps of 3]
6

Consider the following scatter plot:

a

Is the relationship between the x and y variables positive or negative?

b

Which of the following could be the equation for the line of best fit:

A
y = - 4 x - 4
B
y = 44 + 4 x
C
y = - 4 x + 44
D
y = 4 x - 4
[Scatter plot: x-axis marked 1 to 9; y-axis marked 5 to 45 in steps of 5]
7

Use technology to find the line of best fit for the sets of data below. Write the equation with the coefficient and constant term to the nearest two decimal places.

a
x: 24, 37, 19, 31, 32, 22, 14, 30, 23, 40
y: -7, -8, -3, -6, -9, -8, -2, -8, -8, -12
b
x: 7, 18, 16, 9, 16, 19, 8, 8, 12, 7
y: 16.29, 3.56, 7.25, 12.11, 8.25, 3.26, 12.5, 12.5, 10.33, 14.29
c
x: 44, 39, 41, 50, 45, 55, 48, 54, 44, 43
y: 2.29, 1.57, 3.86, 3.14, 4.43, 4.86, 3.86, 4.71, 4.29, 4.14
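
One way the "use technology" step could be carried out is sketched below in Python. The function name `least_squares_line` and the sample data are illustrative assumptions, not part of the worksheet. The sample points lie exactly on y = 2x + 1, so the result is easy to check by hand.

```python
def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least squares line of y on x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x-values are equal, so the gradient is undefined")
    gradient = sxy / sxx
    # The least squares line always passes through the centroid (mean_x, mean_y).
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


# Hypothetical data, chosen to lie exactly on y = 2x + 1.
sample_x = [1, 2, 3, 4, 5]
sample_y = [3, 5, 7, 9, 11]
m, c = least_squares_line(sample_x, sample_y)
print(f"y = {m:.2f}x + {c:.2f}")
```

Replacing `sample_x` and `sample_y` with one of the data sets above, and rounding to two decimal places as the question asks, gives an equation that can be compared with a calculator or spreadsheet result.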
8

Several cars underwent a brake test and their age, x (in years), was measured against their stopping distance, y (in metres). The scatter plot shows the results and a line of best fit that approximates the positive correlation:

a

According to the line, what is the stopping distance of a car that is 6 years old?

b

According to the line, what is the stopping distance of a car that is 10 years old?

c

Using the information found above, determine the gradient of the line of best fit.

d

State the value of the vertical intercept of the line.

e

Use the line of best fit found to estimate the stopping distance of a car that is 4.5 years old.

[Scatter plot with line of best fit: \text{Age} on the horizontal axis, marked 1 to 12; \text{Distance} on the vertical axis, marked 10 to 50 in steps of 10]
9

The distance of several locations from the equator and their temperature on a particular day is measured. The values are presented on the following scatter plot:

a

Determine which of the following could be the equation of the line relating distance \left(x\right) and temperature \left(y\right):

A
y = - 0.005 x + 49
B
y = - 0.005 x - 49
C
y = 0.005 x + 49
D
y = 0.005 x - 49
b

Estimate the distance from the equator, x, if the temperature is 30.59 \degree \text{C}.

10

Find the equation of the line of best fit on the scatter plot shown:

[Scatter plot with line of best fit: x-axis marked 1 to 10; y-axis marked 2 to 18 in steps of 2]
11

Consider the scatter plot shown:

a

Find the equation of the line of best fit.

b

Use the line of best fit to approximate the value of y for x = 6.9.

[Scatter plot: x-axis marked 1 to 9; y-axis marked 1 to 11]

Least squares regression line
12

The centroid for the data represented on the right is \left(18.04, 13.02\right) and the y-intercept is an integer:

Find the equation of the least squares regression line.

[Scatter plot: x-axis marked 2 to 30 in steps of 2; y-axis marked 2 to 20 in steps of 2]
13

The centroid for the data represented on the right is \left(4.7, 5.9\right) and the y-intercept is an integer:

Find the equation of the least squares regression line.

[Scatter plot: x-axis marked 2 to 10 in steps of 2; y-axis marked -28 to 28 in steps of 4]
14

Find the equation of the least squares regression line if:

a

An x-value of 5 gives a predicted value of y = 9, and an x-value of 8 gives a predicted value of y = 3.

b

An x-value of 4 gives a predicted value of y = 3, and an x-value of 6 gives a predicted value of y = 7.
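
Because both predicted points lie on the regression line, the equation can be found from the gradient between them. The sketch below shows the arithmetic with hypothetical values (x = 2 predicting y = 7, and x = 6 predicting y = 15) rather than the values in the question; the function name `line_through` is an assumption made for illustration.

```python
def line_through(p, q):
    """Return (gradient, intercept) of the straight line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("a vertical line has no defined gradient")
    gradient = (y2 - y1) / (x2 - x1)   # rise over run
    intercept = y1 - gradient * x1     # rearranged from y1 = gradient * x1 + intercept
    return gradient, intercept


# Hypothetical predictions: x = 2 gives y = 7, and x = 6 gives y = 15.
m, c = line_through((2, 7), (6, 15))
print(f"y = {m}x + {c}")
```

For these hypothetical points the gradient is (15 - 7) / (6 - 2) = 2 and the intercept is 7 - 2 \times 2 = 3.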

15

For each of the least squares regression lines given below:

i

State the gradient of the regression line.

ii

Interpret the meaning of the gradient.

iii

State the value of the y-intercept.

a

y = 3.59 x + 6.72

b

y = - 3.67 x + 8.42

16

The amount of money households spend on dining out each week, D, is measured against their weekly income, I. The following linear regression model is fitted to the data:

D = 0.1 I + 25

a

Interpret the meaning of the y-intercept in this model.

b

State the gradient of the line.

c

If the weekly income of a family increases by \$100, what effect would we expect this to have on the amount of money spent on dining out?

17

For each of the data sets below:

i

Describe the statistical relationship between the two variables.

ii

Using technology, form an equation for the least squares regression line of y on x. Give all values to one decimal place.

a
x: 15.7, 13.1, 16.3, 10.9, 18.6, 15.9, 12.7, 12.8, 14.2, 16.8
y: 28.3, 28.8, 28.2, 28.8, 28, 28.4, 28.6, 29, 28.4, 28.5
b
x: 49.5, 24.2, 39.1, 40.2, 41.9, 29.8, 41.2, 42.6, 48.1, 21.1
y: 167.3, 97.4, 63.0, 144.1, 143.3, 148.3, 90.9, 173.3, 194.8, 65.4
18

The prices of various new and second-hand Mitsubishi Lancers are shown in the table:

a

Find the equation of the least squares regression line for the price, y, in terms of age, x. Round all values to the nearest integer.

b

State the value of the y-intercept.

c

Interpret the meaning of the y-intercept.

d

State the gradient of the line.

e

Interpret the meaning of the gradient in this context.

\text{Age}: 1, 2, 0, 5, 7, 4, 3, 4, 8, 2
\text{Price (\$)}: 16\,000, 13\,000, 21\,990, 10\,000, 8600, 12\,500, 11\,000, 11\,000, 4500, 14\,500
19

The average number of pages read to a child each day and the child’s total vocabulary are measured. Data collected from ten children is displayed in the table below:

Pages read per day: 25, 27, 29, 3, 13, 31, 18, 29, 29, 5
Total vocabulary: 402, 440, 467, 76, 220, 487, 295, 457, 460, 106
a

Find the equation of the least squares regression line for the total vocabulary \left(y\right) in terms of pages read per day \left(x\right). Round all values to one decimal place.

b

State the value of the y-intercept.

c

Interpret the meaning of the y-intercept in this context.

d

State the gradient of the line.

e

Interpret the meaning of the gradient in this context.

20

The following scatter plot shows the value (in dollars) of various 4-bedroom, 2-bathroom homes in a new suburb against their age (in years):

[Scatter plot: \text{Age} on the horizontal axis, marked 1 to 7; \text{Value} on the vertical axis, marked 500000 to 3000000 in steps of 500000]
a

What does the y-intercept of \$ 14\,811 indicate?

b

Does the interpretation in part (a) make sense in this context? Explain your answer.

c

Use the y-intercept and the centroid (whose coordinates are given on the graph) to calculate the gradient of the least squares regression line. Round your answer to one decimal place.

d

Interpret the meaning of the gradient in this context.
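
A least squares regression line always passes through the centroid of the data, so the gradient can be found from the centroid and the y-intercept alone. The sketch below illustrates this with a hypothetical intercept of 10 and a hypothetical centroid of (4, 30); the function name `gradient_from_centroid` is an assumption made for illustration, and the values are not those on the graph.

```python
def gradient_from_centroid(intercept, centroid):
    """Gradient of the line through (0, intercept) and the centroid (x_bar, y_bar)."""
    x_bar, y_bar = centroid
    if x_bar == 0:
        raise ValueError("the centroid lies on the y-axis, so the gradient cannot be found this way")
    # Rise over run between the y-intercept and the centroid.
    return (y_bar - intercept) / x_bar


# Hypothetical values, not taken from the graph.
m = gradient_from_centroid(intercept=10, centroid=(4, 30))
print(round(m, 1))
```

Here the gradient is (30 - 10) / 4 = 5, meaning that y is predicted to increase by 5 for each increase of 1 in x.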

21

The following scatter plot shows the heights (in centimetres) and the weights (in kilograms) of 8 primary school children:

[Scatter plot: x-axis marked 20 to 140 in steps of 20; y-axis marked 30 to 50 in steps of 10]
a

State the y-value of the y-intercept.

b

What does the y-intercept indicate?

c

Does the interpretation in the previous part make sense in this context? Explain your answer.

d

Use the y-intercept and the centroid (whose coordinates are shown on the graph) to calculate the gradient of the least squares regression line. Round your answer to one decimal place.

e

Interpret the meaning of the gradient in this context.

22

Concern over student use of the social media app SnappyChatty leads to a study of student marks in Mathematics versus minutes spent using the app. Data collected from ten students is displayed in the table below:

\text{Minutes}: 292, 153, 354, 253, 11, 42, 195, 7, 162, 254
\text{Mark } (\%): 26, 63, 13, 37, 97, 89, 51, 98, 59, 36
a

Find the equation of the least squares regression line for the mark as a percentage, y, in terms of minutes spent using SnappyChatty, x. Round all values to one decimal place.

b

State the value of the y-intercept.

c

Interpret the meaning of the y-intercept in this context.

d

State the gradient of the line.

e

Interpret the meaning of the gradient in this context.

Predictions
23

A car company looked at the relationship between how much it had spent on advertising and the amount of sales each month over several months. The data has been plotted on the scatter graph and a line of best fit drawn:

a

Two points on the line are \left(3200, 300\right) and \left(5600, 450\right). Find the gradient of the line of best fit.

b

The line of best fit can be written in the form S=mA+ c, where m is the gradient, c is the vertical intercept, S is the money spent on sales in thousands of dollars, and A is the advertising costs.

Determine the value of c, the vertical intercept of the line.

c

Use the line of best fit to estimate the number of sales next month if \$4800 is to be spent on advertising.

[Scatter plot with line of best fit: A on the horizontal axis, marked 1000 to 8000 in steps of 1000; S on the vertical axis, marked 100 to 800 in steps of 100]
24

Research on the number of cigarettes smoked during pregnancy and the birth weights of the newborn babies was conducted:

a

Describe the statistical relationship between these two variables.

b

Use technology to form an equation for the least squares regression line of y on x. Round all values to two decimal places.

c

Use your regression line to predict the birth weight of a newborn whose mother smoked on average 5 cigarettes per day. Round your answer to two decimal places.

d

Comment on the reliability of your prediction.

\text{Average number of cigarettes per day } (x): 45.90, 13.10, 21.9, 24.40, 9.30, 36.70, 0.50, 18.00, 10.00, 13.00, 37.30, 18.90
\text{Birth weight in kilograms } (y): 4.00, 5.70, 4.90, 5.00, 5.70, 4.30, 7.00, 5.10, 5.60, 5.20, 4.00, 5.80
25

The forecast maximum temperature, in degrees Celsius, and the observed maximum temperature are recorded to determine the accuracy in the temperature prediction models used by the weather bureau. Results are displayed in the table below:

a

Describe the linear relationship between these two variables.

b

Use technology to form an equation for the least squares regression line of y on x. Round all values to two decimal places.

c

Use your regression line to predict the observed maximum temperature on a day in the same month when the forecast was 25 \degree\text{C}. Round your answer to two decimal places.

d

For which of the following forecast temperatures would the regression line produce the most reliable prediction?

A

x = 32

B

x = 38

C

x = 25

\text{Forecast }(x): 30.00, 27.00, 26.90, 28.90, 29.80, 34.10, 33.00, 37.10, 27.00, 30.20
\text{Observed }(y): 30.80, 27.90, 24.60, 25.00, 31.40, 31.50, 33.80, 34.30, 25.10, 31.50
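
Predictions made inside the range of the observed x-values (interpolation) are generally considered more reliable than predictions made outside it (extrapolation), although the strength of the correlation also matters. The sketch below shows one way to check which case applies; the function name `prediction_type` and the data are hypothetical and are not taken from the table.

```python
def prediction_type(x_value, observed_xs):
    """Classify a prediction as interpolation or extrapolation."""
    low, high = min(observed_xs), max(observed_xs)
    if low <= x_value <= high:
        return "interpolation"   # inside the range of the observed data
    return "extrapolation"       # outside the range of the observed data


# Hypothetical x-values, not the forecast temperatures from the table.
hypothetical_xs = [12, 15, 18, 21, 24]
for candidate in (10, 20, 30):
    print(candidate, prediction_type(candidate, hypothetical_xs))
```

With these hypothetical values, only x = 20 lies between the smallest and largest observed x-values, so it is the only interpolated prediction.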
26

A team of salespeople submit their expenses and their sales for the month of March to their manager. The data collected is displayed in the following table:

\text{Sales } (\times 100 \, \$): 55.4, 48, 62.6, 23.6, 23.8, 60.6, 20.1, 51.8, 23.4, 33.3
\text{Expenses (\$)} 43.929.27412949.820.22819.131.2
a

Describe the statistical relationship between the two variables.

b

Use technology to form an equation for the least squares regression line of y (expenses in dollars) on x (sales in hundreds of dollars). Round all values to two decimal places.

c

Use your regression line to predict the expenses of a person in this department who made sales of \$6000 for the month. Round your answer to two decimal places.

27

During an alcohol education programme, 10 adults were offered up to 6 drinks and were then given a simulated driving test where they scored a result out of a possible 100. The results are displayed in the following table:

\text{Number of drinks } (x): 3, 2, 6, 4, 4, 1, 6, 3, 4, 2
\text{Driving score } (y): 65, 60, 42, 57, 56, 74, 33, 63, 55, 62
a

Describe the correlation between the two variables.

b

Use technology to find the equation for the least squares regression line of y on x. Round all values to one decimal place.

c

Use your regression line to predict the driving score of a young adult who consumed 5 drinks. Round your answer to one decimal place.

28

Research has been conducted to deduce whether there is any relationship between the number of pirated downloads (as a percentage of total downloads) and the incidences of malware infection per 1000 computers. The results are displayed in the table:

a

Describe the statistical relationship between the two variables.

b

Use technology to form an equation for the least squares regression line of y on x. Round all values to two decimal places.

c

Use your regression line to predict the number of computers in 1000 that you would expect to be affected by malware if in that region 80\% of downloads were pirated. Round your answer to two decimal places.

\text{Pirated downloads as percentage of total downloads }(x): 40.5, 60.1, 77.4, 34.6, 66.3, 63.8, 57, 67.5, 35.3, 19.1, 69.7, 51.8
\text{Incidences of malware per 1000 computers }(y): 3.7, 6.1, 15.7, 10.6, 5.2, 14.8, 12.7, 12.8, 3.5, 1.9, 7.5, 10.9
29

A sample of families were interviewed about their annual family income and their average monthly expenditure. The results are given in the table below:

a

Describe the statistical relationship between the two variables.

b

Use technology to form an equation for the least squares regression line of y on x. Round all values to three decimal places.

c

Use your regression line to predict the monthly expenditure for a family whose annual income is \$80\,000. Round your answer to two decimal places.

d

For which of the following annual incomes would the regression line produce the most reliable prediction?

A

\$101\,000

B

\$51\,000

C

\$80\,000

\text{Income }(x): 67\,000, 72\,000, 62\,000, 70\,000, 54\,000, 88\,000, 83\,000, 91\,000, 92\,000, 98\,000
\text{Expenditure }(y): 700, 1600, 1700, 1000, 600, 1900, 1200, 1000, 2000, 2100
30

The data in the table and scatter plot below show the frequencies per month of online marketing emails sent out to subscribers compared with the proportion of subscribers who click and open the email:

\text{Frequency }(x): 3, 4, 1, 5, 4, 7, 10
\text{Proportion }(y): 0.41, 0.58, 1.31, 0.51, 0.33, 0.61, 0.75
[Scatter plot: x-axis marked 1 to 11; y-axis marked 0.2 to 1.6 in steps of 0.2]
a

Which piece of data appears to be an outlier?

b

Describe the correlation between the two variables (with the outlier removed).

c

Use technology to form an equation for the least squares regression line of y on x (with the outlier removed). Round all values to two decimal places.

d

Use your regression line to predict the proportion of emails opened if they're sent 20 times a month. Round your answer to two decimal places.
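
To see how much a single outlier can change a regression line, the sketch below fits a least squares line with and without one unusual point. The data and the function names `fit_line` and `drop_point` are hypothetical and chosen for illustration: four points lie exactly on y = 0.5x + 1, and the fifth sits well above that line.

```python
def fit_line(points):
    """Return (gradient, intercept) of the least squares line of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x


def drop_point(points, outlier):
    """Return a copy of the data with the chosen outlier removed."""
    return [p for p in points if p != outlier]


# Hypothetical data: (5, 9.0) does not follow the pattern of the other points.
points = [(1, 1.5), (2, 2.0), (3, 2.5), (4, 3.0), (5, 9.0)]
with_outlier = fit_line(points)
without_outlier = fit_line(drop_point(points, (5, 9.0)))
print("with outlier:   ", with_outlier)
print("without outlier:", without_outlier)
```

Removing the unusual point brings the fitted line back to y = 0.5x + 1, which is why the question asks for the outlier to be excluded before the line is formed.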

31

The results (as percentages) for a practice spelling test and the real spelling test were collected for 8 students. Data collected are presented in the following table:

\text{Practice }(x): 56.30, 78.90, 59.30, 76.90, 71.50, 64.50, 61.60, 68.30
\text{Real }(y): 48.80, 69.20, 52.00, 66.10, 62.40, 56.00, 54.3, 59.30
a

Describe the correlation between the two variables.

b

Using technology, form an equation for the least squares regression line of y on x. Round all values to two decimal places.

c

Use your regression line to predict the real spelling test result of a student who scored 60\% in their practice spelling test. Round the answer to two decimal places.

Outcomes

0607C11.8B

Straight line of best fit (by eye) through the mean on a scatter diagram.

0607E11.8B

Straight line of best fit (by eye) through the mean on a scatter diagram.
