To analyse the association between two numerical variables, we can first plot the data in a scatterplot. The explanatory variable (EV) is shown on the horizontal axis and the response variable (RV) is shown on the vertical axis. In this way, each data point is displayed as a point in a two-dimensional coordinate system.
To help us identify any correlation between the two variables, there are three things we focus on when looking at a scatterplot:
Direction
Form
Strength
The direction of a scatterplot refers to the overall trend shown by the data points: whether the response variable tends to rise or fall as the explanatory variable increases. We can describe the direction of the pattern as having positive correlation, negative correlation, or no correlation:
Positive correlation
A positive correlation occurs when the RV increases as the EV increases.
From a graphical perspective, this occurs when the y-coordinate increases as the x-coordinate increases, which is similar to a line with a positive gradient.
Negative correlation
A negative correlation occurs when the RV decreases as the EV increases.
From a graphical perspective, this occurs when the y-coordinate decreases as the x-coordinate increases, which is similar to a line with a negative gradient.
No correlation
No correlation describes a data set that has no relationship between the variables.
This can take the form of totally unrelated data, or data in which the RV does not change as the EV changes (like a horizontal straight line, which has zero gradient).
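The comparison with a line's gradient can be sketched in code. The example below is a minimal illustration, not part of the lesson's method: it assumes we fit a least-squares line to the points and read the direction from the sign of its gradient. The function names `gradient` and `direction`, and the `tolerance` cut-off, are hypothetical choices made for this sketch.

```python
def gradient(xs, ys):
    """Gradient of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return s_xy / s_xx  # assumes the x-values are not all equal


def direction(xs, ys, tolerance=1e-9):
    """Label the direction from the sign of the gradient.

    `tolerance` is an arbitrary cut-off chosen for this sketch.
    """
    m = gradient(xs, ys)
    if m > tolerance:
        return "positive"
    if m < -tolerance:
        return "negative"
    return "none"


print(direction([1, 2, 3, 4], [3, 5, 6, 9]))  # positive
print(direction([1, 2, 3, 4], [9, 6, 5, 3]))  # negative
print(direction([1, 2, 3, 4], [5, 5, 5, 5]))  # none
```

Note that the sign of the gradient only captures the "horizontal line" case of no correlation. Totally unrelated data will usually still produce a small non-zero gradient, so the scatterplot itself should always be inspected as well.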
When we look at the form of a scatterplot, we check whether the data points follow a linear pattern. If the data points lie on or close to a straight line, we can say the scatterplot has a linear form.
Forms other than a line may be apparent in a scatterplot. If the data points lie on or close to a curve, it may be appropriate to infer a non-linear form between the variables. We will only be using linear models in this course.
The strength of a linear correlation relates to how closely the points resemble a straight line.
If the points lie exactly on a straight line then we can say that there is a perfect correlation.
If the points are scattered randomly then we can say there is no correlation.
Most scatterplots will fall somewhere in between these two extremes and will display a weak, moderate or strong correlation.
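Strength can also be given a number. One widely used measure, which this lesson does not introduce, is Pearson's correlation coefficient: its sign matches the direction, and its size runs from 0 (no linear correlation) to 1 (perfect linear correlation). The sketch below computes it with the standard library only. The cut-offs used in `strength_label` (0.75, 0.5 and 0.25) are illustrative assumptions, not values taken from this lesson, and different courses use different boundaries.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)  # assumes neither variable is constant


def strength_label(r):
    """Describe strength from the size of r (cut-offs are assumptions)."""
    size = abs(r)
    if math.isclose(size, 1.0):
        return "perfect"
    if size >= 0.75:
        return "strong"
    if size >= 0.5:
        return "moderate"
    if size >= 0.25:
        return "weak"
    return "none"


r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # points lie exactly on a line
print(strength_label(r))  # perfect
```

A coefficient like this only measures how closely the points follow a straight line, so it should be read alongside the form of the scatterplot rather than in place of it.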
The following table shows the results of an experiment.
X | 1 | 4 | 5 | 8 | 9 | 11 | 13 | 15 | 18 | 19 |
---|---|---|---|---|---|---|---|---|---|---|
Y | 2 | 4 | 6 | 8 | 12 | 24 | 30 | 46 | 52 | 64 |
Plot the data from the table on the graph below.
What is the type of correlation between the data points? Select the best answer.
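If no graph paper is to hand, the table's data can be plotted with a short program. The sketch below is one rough way to do it using only the standard library: a hypothetical helper, `text_scatter`, marks each point with `*` on a character grid. It assumes the x-values are non-negative whole numbers (true for this table) and scales the y-values to fit a fixed number of rows, so vertical positions are approximate.

```python
def text_scatter(xs, ys, height=12):
    """Return a coarse character-grid scatterplot (hypothetical helper).

    Assumes non-negative integer x-values and a positive maximum y-value.
    Each x-value gets its own column; y-values are scaled to `height` rows.
    """
    width = max(xs) + 1
    y_max = max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        row = round(y / y_max * (height - 1))  # 0 = bottom row
        grid[height - 1 - row][x] = "*"        # grid[0] is the top row
    lines = ["|" + "".join(cells).rstrip() for cells in grid]
    lines.append("+" + "-" * width)            # horizontal axis
    return "\n".join(lines)


# Data copied from the table above.
X = [1, 4, 5, 8, 9, 11, 13, 15, 18, 19]
Y = [2, 4, 6, 8, 12, 24, 30, 46, 52, 64]

plot = text_scatter(X, Y)
print(plot)
```

The printed grid gives a quick view of the direction and form of the data before answering the question.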
There are three things we focus on when analysing a scatterplot:
Form: linear or non-linear, what shape the data has
If it is linear:
Direction: positive or negative, whether a line drawn through the data has a positive or negative gradient
Strength: strong, moderate, weak - how closely the points follow a line
If there is no connection between the two variables we say there is no correlation.