From a scatter plot and a line of fit, we can further analyze an association between two variables by comparing the data points to the line of fit.
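The difference between the observed value (the y-value of the data point) and the predicted value (the value of the line of fit at the same x-value) is called a residual: residual = observed value - predicted value. A point above the line of fit has a positive residual, and a point below it has a negative residual.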
By calculating the residual of each point in the data set and plotting it against the corresponding x-value, we form a residual plot for the data.
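As a minimal sketch of this calculation in Python, assume a small made-up data set and a made-up line of fit y = 2x + 1 (neither comes from the examples in this section). Each residual is the observed y-value minus the value the line predicts at the same x-value, and pairing each residual with its x-value gives the points of the residual plot.

```python
# Hypothetical data and line of fit, chosen only for illustration.
xs = [1, 2, 3, 4]
ys = [3.5, 4.5, 7.5, 8.5]
slope, intercept = 2, 1  # assumed line of fit: y = 2x + 1


def predict(x):
    """Return the value of the line of fit at x."""
    return slope * x + intercept


# residual = observed value - predicted value
residuals = [y - predict(x) for x, y in zip(xs, ys)]

# Points of the residual plot: (x-value, residual)
residual_points = list(zip(xs, residuals))
print(residual_points)
```

In this made-up case the residuals alternate between 0.5 and -0.5, so the points of the residual plot sit just above and just below the horizontal axis.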
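By observing the original scatter plot and the residual plot together, we can judge how well a line describes the association between the two variables: residuals scattered randomly above and below zero suggest a linear association, while a clear pattern in the residuals suggests a non-linear one.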
We can also use the scatter plot and residual plot to look for outliers in the data set: data points that stand out by being far away from their predicted values.
The following are example scatter plots with a line of best fit and residuals, each with its corresponding residual plot.
Scatter plot and residual plot of weak positive linear association with no outliers:
Scatter plot and residual plot of strong negative linear association with an outlier:
Scatter plot and residual plot of non-linear association:
Describe the strength and direction of the correlation for the scatter plot shown.
Consider the data set shown in the table of values and scatter plot below, with the line of fit y = -0.57x + 16.74:
x | 10 | 11 | 13 | 14 | 18 | 19 | 21 | 23 | 25 | 28 | 29 | 31 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
y | 12 | 13 | 9 | 2 | 8 | 7 | 7 | 4 | 2 | 3 | -1 | -2 |
Create a residual plot for this data set.
Identify a possible outlier in the scatter plot and calculate its residual.
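One way to check both tasks is to compute the residuals directly. The Python sketch below uses the table of values and the line of fit y = -0.57x + 16.74 given above. Treating the point with the largest residual in magnitude as the candidate outlier is a simple heuristic assumed here, not a formal test, and the variable names are illustrative only.

```python
# Data from the table of values above.
xs = [10, 11, 13, 14, 18, 19, 21, 23, 25, 28, 29, 31]
ys = [12, 13, 9, 2, 8, 7, 7, 4, 2, 3, -1, -2]


def predict(x):
    """Return the value of the given line of fit y = -0.57x + 16.74 at x."""
    return -0.57 * x + 16.74


# residual = observed value - predicted value, rounded to 2 decimal places
residuals = [round(y - predict(x), 2) for x, y in zip(xs, ys)]

# Each (x, residual) pair is one point of the residual plot.
for x, r in zip(xs, residuals):
    print(f"x = {x:2d}, residual = {r:6.2f}")

# Assumed heuristic: the candidate outlier is the point whose
# residual is largest in absolute value.
idx = max(range(len(xs)), key=lambda i: abs(residuals[i]))
outlier = (xs[idx], ys[idx])
outlier_residual = residuals[idx]
print("Candidate outlier:", outlier, "with residual", outlier_residual)
```

Under this heuristic the point (14, 2) stands out: the line predicts about 8.76 at x = 14, so its residual is about 2 - 8.76 = -6.76, much larger in magnitude than any other residual in the data set.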