11.03 Lines of best fit

Introduction

We have already learned to create a scatter plot and to analyze it, for example by determining the association between the variables. Another type of analysis we can perform on a scatter plot is to identify a line of best fit.

Line of best fit

A line of best fit (sometimes called a trend or regression line) is a straight line that best represents the data on a scatter plot. It always represents the general trend of the data.

Lines of best fit are really handy as we can use them to help us make predictions or conclusions about the data.

To draw a line of best fit by eye, balance the number of points above the line with the number of points below the line, and place the line as close as possible to the points. You should generally ignore outliers (points that fall very far from the rest of the data), as they can skew the line of best fit. Later we will look at how we can calculate the equation of a line of best fit.

[Figure: scatter plot with a line of best fit. Horizontal axis: independent variable. Vertical axis: dependent variable.]

This is an example of what a good line of best fit might look like.
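The by-eye rule of balancing points above and below the line can be checked numerically. The following is a minimal sketch in Python, assuming a candidate line written as y = slope * x + intercept. The function name `count_above_below` and the data points are illustrative and are not taken from the lesson's figures.

```python
def count_above_below(points, slope, intercept):
    """Count the points strictly above and strictly below y = slope*x + intercept."""
    above = below = 0
    for x, y in points:
        line_y = slope * x + intercept  # height of the candidate line at this x
        if y > line_y:
            above += 1
        elif y < line_y:
            below += 1
        # points exactly on the line are counted in neither group
    return above, below


# Illustrative data only (hypothetical, not from the lesson).
points = [(1, 2), (2, 2), (3, 4), (4, 4), (5, 6), (6, 6)]

# Candidate line drawn by eye: y = x + 0.5
above, below = count_above_below(points, slope=1.0, intercept=0.5)
print(above, below)  # 3 3
```

Equal counts alone do not guarantee a good line. The line should also pass as close as possible to the points, as described above.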

Recall that straight lines are widely used to model relationships between two quantities. For scatter plots that model linear association, we can describe the association as positive linear association, negative linear association or no association. We might even say that two variables have strong or weak association.

[Figure: scatter plot of runs (vertical axis, 5 to 65) against balls hit (horizontal axis, 5 to 65).]

Positive association - the data appears to gather in a positive relationship, similar to a straight line with a positive slope.

[Figure: scatter plot of copies (vertical axis, 100 to 190) against price (horizontal axis, 18 to 36).]

Negative association - the data appears to gather in a negative relationship, similar to a straight line with a negative slope.

[Figure: scatter plot of speed (vertical axis, 1 to 8) against temperature in °F (horizontal axis, 32 to 48).]

No association - there is no apparent relationship between the variables.

The more closely the plotted data resembles a straight line, the stronger the linear association is between the variables.
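The direction of a linear association matches the sign of the slope of a fitted line. As a rough sketch, the Python below computes a least-squares slope (one common way to calculate a line of best fit, used here as an assumption since the lesson has not yet introduced a calculation method) and reports its sign. The function names and data are hypothetical. The sign shows direction only. It does not measure how strong the association is.

```python
def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def direction(xs, ys):
    """Label the direction of the fitted line by the sign of its slope."""
    slope = least_squares_slope(xs, ys)
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "flat"


# Illustrative data only (hypothetical, not from the lesson).
xs = [1, 2, 3, 4, 5]
print(direction(xs, [2, 4, 5, 4, 6]))  # positive
print(direction(xs, [6, 4, 5, 4, 2]))  # negative
```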

Just because two variables have an association, even a strong one, does not mean that one causes the other. For example, there is a strong association between height and stride length. However, it doesn't mean that if you take big steps you'll grow taller.

Examples

Example 1

The following scatter plot shows the data for two variables, x and y.

[Figure: scatter plot of y against x, with both axes running from 1 to 10.]

Draw a line of best fit for the data.

Worked Solution
Create a strategy

Draw a line that follows the trend of the points and has the same number of points above and below it.

Apply the idea
[Figure: the same scatter plot of y against x, with a line of best fit drawn through the data.]

Here is an example of a line of best fit that follows the trend of the data and has the same number of points above and below the line.

Idea summary

Types of linear association:

Positive linear association - the data appears to gather in a positive relationship, similar to a straight line with a positive slope.

Negative linear association - the data appears to gather in a negative relationship, similar to a straight line with a negative slope.

No association - there is no apparent relationship between the variables.

In drawing a line of best fit by eye, balance the number of points above the line with the number of points below the line, and place the line as close as possible to the points.

Predictions

If the points appear to lie close to a line, we conclude that a relationship probably exists and it is safe to make predictions using a line of best fit. Making predictions inside the range of the data is called interpolation.

In a well-designed experiment, a researcher is careful not to use the fitted line to predict the response for values of the independent variable that lie outside the range of values used in the experiment. For example, if the smallest value of the independent variable in the experiment was 10 and the largest was 85, then it would be unwise to try to predict the response when the independent variable is smaller than 10 or larger than 85.

Making predictions beyond the range of the data is called extrapolation and is considered unsafe.
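The distinction between the two kinds of prediction comes down to a range check. Here is a minimal sketch in Python, assuming the observed values of the independent variable are available as a list. The function name `prediction_type` and the sample values are hypothetical, chosen only so that the smallest value is 10 and the largest is 85, as in the example above.

```python
def prediction_type(x, observed_xs):
    """Classify a prediction at x by whether x lies within the observed range."""
    lowest, highest = min(observed_xs), max(observed_xs)
    if lowest <= x <= highest:
        return "interpolation"
    return "extrapolation"


# Illustrative values of the independent variable (hypothetical).
observed = [10, 25, 40, 60, 85]

print(prediction_type(50, observed))  # interpolation
print(prediction_type(90, observed))  # extrapolation
print(prediction_type(5, observed))   # extrapolation
```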

Examples

Example 2

The number of fish in a river is measured over a five-year period.

The results are shown in the following table and plotted below with a line of best fit.

| Time in years (t) | 0 | 1 | 2 | 3 | 4 | 5 |
| Number of fish (F) | 1903 | 1998 | 1900 | 1517 | 1693 | 1408 |
[Figure: scatter plot of the number of fish F (vertical axis, ticks from 200 to 2000) against time in years t (horizontal axis, ticks from 2 to 22), with a line of best fit.]

a

Use the line of best fit to predict the number of years until there are no fish left in the river.

Worked Solution
Create a strategy

Since F is the number of fish, we are looking for the number of years (t) when F=0.

Apply the idea

F=0 at the horizontal intercept of the line, which is at (20,0).

t = 20 years

b

Predict the number of fish remaining in the river after 7 years.

Worked Solution
Create a strategy

We can move from the number of years on the horizontal axis up to the line, then across to the number of fish on the vertical axis.

Apply the idea
[Figure: the same graph of F against t, with the reading traced from t=7 up to the line and across to the vertical axis.]

We can go up from t=7 vertically until we meet the line, then go left to the vertical axis at F=1300.

So the number of fish remaining is 1300.

c

Predict how long it will be before there are 900 fish left in the river.

Worked Solution
Create a strategy

We can move from the number of fish on the vertical axis across to the line, then down to the number of years on the horizontal axis.

Apply the idea
[Figure: the same graph of F against t, with the reading traced from F=900 across to the line and down to the horizontal axis.]

We can go across from F=900 until we meet the line, then travel vertically down to the horizontal axis at t=11.

So there will be 900 fish left in 11 years.
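The three graph readings in this example (F=0 at t=20, F=1300 at t=7, and F=900 at t=11) are consistent with one straight line. As a minimal sketch, assuming the line passes exactly through the read-off points (7, 1300) and (20, 0), the Python below recovers its slope and intercept and repeats the predictions algebraically. The equation F = 2000 - 100t is inferred from those readings and is not stated in the lesson. The helper names are hypothetical.

```python
def line_through(p, q):
    """Return (slope, intercept) of the straight line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# Assumption: two points read off the line of best fit in the graph.
slope, intercept = line_through((7, 1300), (20, 0))


def fish_at(t):
    """Predicted number of fish after t years (read up, then across)."""
    return slope * t + intercept


def years_until(fish):
    """Predicted years until the given number of fish remain (read across, then down)."""
    return (fish - intercept) / slope


print(slope, intercept)  # -100.0 2000.0
print(fish_at(7))        # 1300.0
print(years_until(900))  # 11.0
print(years_until(0))    # 20.0
```

Solving the equation for t mirrors moving across from the vertical axis to the line and then down, while evaluating it at a given t mirrors moving up to the line and then across.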

Idea summary

If the points appear to lie close to a line, we conclude that a relationship probably exists, and it is safe to make predictions using a line of best fit. Making predictions inside the range of the data is called interpolation.

Making predictions beyond the range of the data is called extrapolation and is considered unsafe.

To make predictions using the line of best fit, move either horizontally or vertically from the known value on the axis to the line, then move either vertically or horizontally to the other axis to find the unknown value.

Outcomes

8.SP.A.2

Know that straight lines are widely used to model relationships between two quantitative variables. For scatter plots that suggest a linear association, informally fit a straight line, and informally assess the model fit by judging the closeness of the data points to the line.
