When given bivariate data as a table of values, a scatterplot can be created to graph the data, where the explanatory variable is shown on the horizontal axis and the response variable is shown on the vertical axis. In this way, each data point is displayed as a point in a two-dimensional coordinate system.
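As a minimal sketch of how a table of values becomes points on a scatterplot, the Python snippet below pairs each explanatory value with its response value. The numbers and the names `explanatory`, `response` and `points` are illustrative assumptions, not data from this lesson.

```python
# Illustrative values only (assumed for this sketch, not from a real data set).
explanatory = [1, 2, 3, 4]   # explanatory variable: horizontal axis (x)
response = [2, 4, 5, 8]      # response variable: vertical axis (y)

# Pair the values position by position: each table entry becomes one (x, y) point.
points = list(zip(explanatory, response))

for x, y in points:
    print(f"plot a point at x = {x}, y = {y}")
```

Each tuple in `points` is one data point in the two-dimensional coordinate system.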
An association between two variables is known as a correlation. A correlation may (or may not) signify a relationship between two variables. To identify any correlation between the two variables, there are three things to focus on when analysing a scatterplot:
Direction
Form
Strength
The direction of the scatterplot refers to the pattern shown by the data points. We can describe the direction of the pattern as having positive correlation, negative correlation, or no correlation:
Positive correlation
A positive correlation occurs when the response variable (RV) increases as the explanatory variable (EV) increases.
From a graphical perspective, this occurs when the y-coordinate increases as the x-coordinate increases, which is similar to a line with a positive gradient.
Negative correlation
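A negative correlation occurs when the response variable decreases as the explanatory variable increases.
From a graphical perspective, this occurs when the y-coordinate decreases as the x-coordinate increases, which is similar to a line with a negative gradient.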
No correlation
No correlation describes a data set that has no relationship between the variables.
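The three directions can also be checked numerically. The sketch below is one possible approach rather than a method given in this lesson: it adds up the products of each point's deviations from the mean x-value and mean y-value, then looks at the sign of the total. The function name `direction`, the `tolerance` parameter and the sample values are all assumptions made for illustration.

```python
def direction(xs, ys, tolerance=1e-9):
    """Classify direction from the sign of the summed products of deviations."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Points where x and y are both above (or both below) their means add a
    # positive amount; points on opposite sides add a negative amount.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if total > tolerance:
        return "positive"
    if total < -tolerance:
        return "negative"
    return "none"

rising = direction([1, 2, 3, 4], [2, 4, 5, 8])    # y tends to increase with x
falling = direction([1, 2, 3, 4], [9, 7, 4, 2])   # y tends to decrease with x
flat = direction([1, 2, 3, 4], [5, 5, 5, 5])      # y does not change with x
print(rising, falling, flat)
```

Real data with no correlation rarely gives a total of exactly zero, so in practice the sign alone is not enough and the strength of the pattern also needs to be considered.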
The form of a scatterplot refers to the type of relationship the two variables may appear to share. For example, if the data points lie on or close to a straight line, the scatterplot has a linear form.
Forms other than a line may be apparent in a scatterplot. If the data points lie on or close to a curve, it may be appropriate to infer a non-linear form between the variables.
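One rough way to tell a linear form from a curved one is to compare the gradient between neighbouring points. The sketch below assumes the x-values are listed in strictly increasing order; the helper name `consecutive_gradients` is hypothetical, and the second data set (the squares of 1 to 5) is made up to show a curve. The first data set uses the values from the first practice table in this section.

```python
def consecutive_gradients(xs, ys):
    """Gradient (rise over run) between each pair of neighbouring points."""
    return [
        (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    ]

# Roughly constant gradients suggest a linear form.
line_like = consecutive_gradients([1, 3, 5, 7, 9], [3, 7, 11, 15, 19])

# Gradients that keep changing in one direction suggest a curve.
curve_like = consecutive_gradients([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])

print(line_like)
print(curve_like)
```

With noisy real data the gradients will never match exactly, so this is only a quick check to support what we see in the scatterplot.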
The strength of a linear correlation relates to how closely the points resemble a straight line.
If the points lie exactly on a straight line then we can say that there is a perfect correlation.
If the points are scattered randomly then we can say there is no correlation.
Most scatterplots will fall somewhere in between these two extremes and will display a weak, moderate or strong correlation.
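Strength can be put on a numerical scale using Pearson's correlation coefficient, r, which ranges from -1 to 1. This lesson does not define r, so the sketch below is an added illustration. The cut-off values used to label a correlation as weak, moderate or strong are assumptions (conventions differ between textbooks), and the function names are hypothetical.

```python
from math import sqrt, isclose

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe_strength(r):
    """Label the strength of r. The cut-offs are assumed, not standard."""
    size = abs(r)
    if isclose(size, 1.0):
        return "perfect"
    if size >= 0.75:
        return "strong"
    if size >= 0.5:
        return "moderate"
    if size >= 0.25:
        return "weak"
    return "none"

# Points lying exactly on a straight line.
on_a_line = pearson_r([1, 3, 5, 7, 9], [3, 7, 11, 15, 19])

# Illustrative points lying close to, but not exactly on, a line.
near_a_line = pearson_r([1, 2, 3, 4], [2, 4, 5, 8])

print(describe_strength(on_a_line), describe_strength(near_a_line))
```

The sign of r gives the direction and its size gives the strength, so the two descriptions are read from the same number.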
Create a scatter plot for the set of data in the table.
x | 1 | 3 | 5 | 7 | 9 |
---|---|---|---|---|---|
y | 3 | 7 | 11 | 15 | 19 |
Identify the type of correlation in the following scatter plot.
Consider the two variables: eye color and IQ. Do you think there is a relationship between them?
The scatter plot shows the relationship between sea temperature and the amount of healthy coral.
Describe the correlation between sea temperature and the amount of healthy coral.
Which variable is the dependent variable?
Which variable is the independent variable?
The following table shows the number of traffic accidents associated with a sample of drivers of different age groups.
Age | Accidents |
---|---|
20 | 41 |
25 | 44 |
30 | 39 |
35 | 34 |
40 | 30 |
45 | 25 |
50 | 22 |
55 | 18 |
60 | 19 |
65 | 17 |
Construct a scatter plot to represent the above data.
Is the correlation between a person's age and the number of accidents they are involved in positive or negative?
Is the correlation between a person's age and the number of accidents they are involved in strong or weak?
Which age group's data represent an outlier?
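A numerical check on the direction is to fit a straight line through the data and look at the sign of its gradient. The sketch below uses the least-squares line, which is one common choice and is not a method introduced in this lesson; the variable names are assumptions. The data is taken from the table above.

```python
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
accidents = [41, 44, 39, 34, 30, 25, 22, 18, 19, 17]

mean_age = sum(ages) / len(ages)
mean_accidents = sum(accidents) / len(accidents)

# Least-squares gradient: summed products of deviations divided by
# the summed squared deviations of the explanatory variable.
sxy = sum((a - mean_age) * (n - mean_accidents) for a, n in zip(ages, accidents))
sxx = sum((a - mean_age) ** 2 for a in ages)
gradient = sxy / sxx

# The least-squares line passes through the point (mean_age, mean_accidents).
intercept = mean_accidents - gradient * mean_age

print(f"gradient = {gradient:.2f}, intercept = {intercept:.2f}")
```

The gradient comes out at roughly -0.65, so the fitted line slopes downwards. The gradient alone does not tell us how strong the correlation is or whether any point is an outlier; those are still judged from the scatterplot.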
Consider the table of values that shows four excerpts from a database comparing the income per capita of a country and the child mortality rate of the country. If a scatter plot were created from the entire database, what relationship would you expect it to show?
Income per capita | Child mortality rate |
---|---|
1 465 | 67 |
11 428 | 16 |
2 621 | 35 |
32 468 | 9 |
There are three things we focus on when analysing a scatterplot:
Form: linear or non-linear, what shape the data has
If it is linear:
Direction: positive or negative, whether a line drawn through the data has a positive or negative gradient
Strength: strong, moderate, weak - how closely the points follow a line
If there is no connection between the two variables we say there is no correlation.
A positive correlation is when the data appears to gather in a positive direction, similar to a straight line with a positive slope. The variables change in the same direction.
A negative correlation is when the data appears to gather in a negative direction, similar to a straight line with a negative slope. In other words, as one variable increases, the other one decreases.
When there is no relationship between the variables we say they have no correlation.