We have already learned how to create a scatter plot and how to analyze it, for example by describing the association between the variables. Another type of analysis we can perform on a scatter plot is to identify a line of best fit.
A line of best fit (sometimes called a trend or regression line) is a straight line that best represents the data on a scatter plot. It always represents the general trend of the data.
Lines of best fit are really handy as we can use them to help us make predictions or conclusions about the data.
To draw a line of best fit by eye, balance the number of points above the line with the number of points below the line, and place the line as close as possible to the points. You should generally ignore outliers (points that fall very far from the rest of the data) as they can skew the line of best fit. Later we will look at how we can calculate a line of best fit's equation.
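The balancing idea can be checked with a short sketch. The points and the two candidate lines below are hypothetical values chosen only for illustration, and `count_above_below` is an assumed helper name, not something defined in this lesson.

```python
def count_above_below(points, slope, intercept):
    """Count how many points lie above and below the line y = slope*x + intercept."""
    above = sum(1 for x, y in points if y > slope * x + intercept)
    below = sum(1 for x, y in points if y < slope * x + intercept)
    return above, below

# Hypothetical data points (x, y), invented for this illustration.
points = [(1, 2.5), (2, 2.8), (3, 4.6), (4, 4.4), (5, 6.3), (6, 6.9)]

# Candidate line 1: y = x + 1 passes through the middle of the points.
balanced = count_above_below(points, slope=1, intercept=1)

# Candidate line 2: y = x + 2 sits above every point.
unbalanced = count_above_below(points, slope=1, intercept=2)

print(balanced)    # (3, 3)
print(unbalanced)  # (0, 6)
```

A line with equal counts on each side is not automatically the best one, because it also needs to sit as close as possible to the points, but a badly unbalanced count is a quick sign that the line should be moved.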
Recall that straight lines are widely used to model relationships between two quantities. For scatter plots that model linear association, we can describe the association as positive linear association, negative linear association or no association. We might even say that two variables have strong or weak association.
The more closely the plotted data resembles a straight line, the stronger the linear association is between the variables.
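One common way to put a number on "how closely the data resembles a straight line" is Pearson's correlation coefficient. This lesson does not name or use it, so treat the sketch below as an optional illustration under that assumption. The data sets are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for this illustration.
xs = [1, 2, 3, 4, 5]
on_a_rising_line = [2, 4, 6, 8, 10]   # points lie exactly on a line with positive slope
on_a_falling_line = [10, 8, 6, 4, 2]  # points lie exactly on a line with negative slope
scattered = [3, 1, 4, 1, 5]           # points only loosely follow a line

r_strong_positive = pearson_r(xs, on_a_rising_line)
r_strong_negative = pearson_r(xs, on_a_falling_line)
r_weak = pearson_r(xs, scattered)
```

Values near 1 or -1 indicate strong positive or negative linear association, and values near 0 indicate weak or no linear association.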
Just because two variables have an association, even a strong one, does not mean that one causes the other. For example, there is a strong association between height and stride length. However, it doesn't mean that if you take big steps you'll grow taller.
The following scatter plot shows the data for two variables, x and y.
Draw a line of best fit for the data.
Types of linear association:
Positive linear association - the data appears to gather in a positive relationship, similar to a straight line with a positive slope.
Negative linear association - the data appears to gather in a negative relationship, similar to a straight line with a negative slope.
No association - there is no relationship between the variables.
In drawing a line of best fit by eye, balance the number of points above the line with the number of points below the line, and place the line as close as possible to the points.
If the points appear to lie close to a line, we conclude that a relationship probably exists and it is safe to make predictions using a line of best fit. Making predictions inside the range of the data is called interpolation.
In a well-designed experiment, a researcher is careful not to use the fitted line to predict the response for values of the independent variable that are outside the range of values used in the experiment. For example, if the smallest value of the independent variable in the experiment was 10 and the largest was 85, then it would be unwise to try to predict the response when the independent variable is smaller than 10 or larger than 85.
Making predictions beyond the range of the data is called extrapolation, and it is considered unsafe.
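The distinction can be expressed as a simple range check. This is a minimal sketch: `prediction_type` is an assumed helper name, and the range 10 to 85 is taken from the example above.

```python
def prediction_type(x, smallest, largest):
    """Classify a prediction at x relative to the observed range of the independent variable."""
    if smallest <= x <= largest:
        return "interpolation"
    return "extrapolation"

# Range of the independent variable from the experiment described above.
smallest, largest = 10, 85

inside = prediction_type(50, smallest, largest)
too_small = prediction_type(5, smallest, largest)
too_large = prediction_type(90, smallest, largest)

print(inside)     # interpolation
print(too_small)  # extrapolation
print(too_large)  # extrapolation
```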
The number of fish in a river is measured over a five-year period.
The results are shown in the following table and plotted below with a line of best fit.
| Time in years (t) | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Number of fish (F) | 1 903 | 1 998 | 1 900 | 1 517 | 1 693 | 1 408 |
Use the line of best fit to predict the number of years until there are no fish left in the river.
Predict the number of fish remaining in the river after 7 years.
Predict how long it will be before there are 900 fish left in the river.
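These questions are meant to be answered by reading from the plotted line. As a rough cross-check, the sketch below fits a least-squares line to the table values and uses it for the three predictions. The least-squares method is an assumption here: the plotted line of best fit may have been drawn differently, so answers read from the graph can differ from these. `fit_line` is an illustrative helper name.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Data from the table above.
years = [0, 1, 2, 3, 4, 5]
fish = [1903, 1998, 1900, 1517, 1693, 1408]

slope, intercept = fit_line(years, fish)

# Years until the line reaches F = 0.
years_until_zero = -intercept / slope

# Number of fish predicted by the line at t = 7.
fish_after_7 = slope * 7 + intercept

# Years until the line reaches F = 900.
years_until_900 = (900 - intercept) / slope

print(f"F = {slope:.1f} t + {intercept:.1f}")
print(f"No fish after about {years_until_zero:.1f} years")
print(f"About {fish_after_7:.0f} fish after 7 years")
print(f"900 fish after about {years_until_900:.1f} years")
```

With this fitted line the slope is about -107.8 fish per year and the intercept is about 2006, giving roughly 18.6 years until no fish remain, about 1251 fish after 7 years, and roughly 10.3 years until 900 fish remain. All three values lie outside the measured range of 0 to 5 years, so they are extrapolations and should be treated with caution.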
To make predictions using the line of best fit, start at the known value on its axis and move straight to the line (vertically from the horizontal axis, or horizontally from the vertical axis). Then move in the other direction from the line to the other axis and read off the unknown value.
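If the equation of the line is known, the same two readings can be done with algebra. The form below is an assumption for illustration: the symbols m (slope) and b (vertical intercept) are not defined in this lesson, which works from the graph.

```latex
\begin{align*}
  y &= mx + b            && \text{known } x \text{, find } y \\
  x &= \frac{y - b}{m}   && \text{known } y \text{, find } x \quad (m \neq 0)
\end{align*}
```

The first line corresponds to moving vertically from the horizontal axis to the line and then across to the vertical axis. The second corresponds to starting on the vertical axis instead.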