 # 8.03 Fitting a straight line - least squares regression

Lesson

A line of best fit is a straight line that best represents bivariate data on a scatter plot. This line may pass through some of the points, none of the points, or all of the points. Lines of best fit are useful to determine the equation of the line and use the equation to make predictions.

### The least squares line of best fit

The most common method of fitting a line of best fit to a scatter plot of bivariate data is using the least squares method. This is then called a least squares line of best fit, or sometimes a Least Squares Regression Line.

So what is the Least Squares method?

In a sentence, it is a technique that minimises the sum of the squares of the vertical distance from each data point to a straight line. Perhaps an easier way to understand it is through a demonstration.

Experiment with this Geogebra applet.

1. Refresh the applet and generate a new scatter plot to experiment with.
2. Drag the slider to Stage 2 and move the blue dots around the rectangle until you have a line which you think is the Line of Best Fit.
3. Drag the slider to Stage 3 to see what are called residuals. For now, think of these as the distance between the line and the actual data.
4. Drag the slider to Stage 4. Here you see the residuals turn to squares. Now move the blue dots on the line around again and try to make the total area of all the squares combined to be as small as possible. (Note, depending on the scale, these squares may appear rectangular)
5. Drag the slider to Stage 5. Your blue Line of Best Fit should be very close to the green Least Squares Regression Line. And the sum of your squares should be very close to the least sum of the squares.
 Created with GeogebraAdapted from an applet created by mzilinskas

### The equation of a least squares line of best fit

The equation of a least squares line of best fit is the same as the equation of a straight line, but is usually presented in a different form. You may recall the equation of a straight line as

Equations of a straight line

$y=mx+c$y=mx+c

or

$y=ax+b$y=ax+b

However, a least squares line is usually written as,

Equation of a least squares regression line

$y=a+bx$y=a+bx

where $y$y is the dependent variable and $x$x is the independent variable.

Plotting a least squares line on a graph is the same as plotting the graph of a straight line and finding the equation of a least squares line from a graph is the same as finding the equation of a straight line from a graph.

#### Practice question

##### Question 1

The table shows the number of people who went to watch a movie $x$x weeks after it was released.

 Weeks ($x$x) $1$1 $2$2 $3$3 $4$4 $5$5 $6$6 $7$7 Number of people ($y$y) $17$17 $17$17 $13$13 $13$13 $9$9 $9$9 $5$5
1. Plot the points from the table.

2. If a line of best fit were drawn to approximate the relationship, which of the following could be its equation?

$y=-2x+20$y=2x+20

A

$y=2x+20$y=2x+20

B

$y=-2x$y=2x

C

$y=2x$y=2x

D

$y=-2x+20$y=2x+20

A

$y=2x+20$y=2x+20

B

$y=-2x$y=2x

C

$y=2x$y=2x

D
3. Graph the line of best fit whose equation is given by $y=-2x+20$y=2x+20.

4. Use the equation of the line of best fit to find the number of people who went to watch the movie $10$10 weeks after it was released.

## Determining the least squares line using CAS

Most of the time, you will be required to calculate the equation of the Least Squares Regression Line using technology.

Here's a video on how to use the TI-Nspire to create a scatter graph and calculate the equation of the Least Squares Regression Line.

### Outcomes

#### 9.D1.3

Create a scatter plot to represent the relationship between two variables, determine the correlation between these variables by testing different regression models using technology, and use a model to make predictions when appropriate.

#### 9.D2.1

Describe the value of mathematical modelling and how it is used in real life to inform decisions.