To analyze the association between two numerical variables, we can first plot the data in a scatterplot. The independent variable goes on the horizontal axis and the dependent variable on the vertical axis, so each data point appears as a point in a two-dimensional coordinate system.
To help us identify any correlation between the two variables, we focus on three things when looking at a scatterplot: its direction, its form and its strength.
The direction of the scatterplot refers to the pattern shown by the data points. We can describe the direction of the pattern as having positive correlation, negative correlation or no correlation. In a positive correlation the points rise from left to right; in a negative correlation they fall from left to right.
When we are looking at the form of a scatterplot we are looking to see if the data points show a pattern that has a linear form. If the data points lie on or close to a straight line, we can say the scatterplot has a linear form.
Forms other than a line may be apparent in a scatterplot. If the data points lie on or close to a curve, it may be appropriate to infer a non-linear relationship between the variables.
The strength of a linear correlation relates to how closely the points resemble a straight line.
Most scatterplots fall somewhere between the two extremes of points lying exactly on a straight line and points showing no pattern at all, and display a weak, moderate or strong correlation.
To measure the strength of a linear correlation we calculate the correlation coefficient (also known as the $r$ value). This calculation will be discussed in the next chapter.
Identify the type of correlation in the following scatter plot.
Think: If we draw a straight line through the points, we will be able to look at the slope of the line and how closely it fits the points. Here is a line that approximates the trend of the data:
Do: The line that we drew to approximate the data has a slope of around $+1$, so this is a positive correlation. The line fits quite closely to all of the points, so it is a strong correlation. In summary, we would say that this scatterplot indicates a strong, positive correlation.
Describe the correlation between the two variables: eye colour and IQ.
Think: Does a person's eye colour have anything to do with their IQ?
Do: Eye colour and IQ are an example of a pair of variables that have no correlation.
The following table has data results from an experiment.
$X$ | $2$ | $4$ | $7$ | $9$ | $12$ | $15$ | $17$ | $20$ |
---|---|---|---|---|---|---|---|---|
$Y$ | $2$ | $4$ | $6$ | $8$ | $12$ | $18$ | $28$ | $38$ |
Plot the data from the table on the graph below.
What is the type of correlation between the data points? Select the best answer.
Linear Positive
Linear Negative
Nonlinear
No Correlation
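One rough way to check the form numerically, offered only as a sketch, is to compute the slope between each pair of neighbouring points in the table above. For a linear form these slopes would stay roughly constant. The helper name `consecutive_slopes` is hypothetical and not part of the lesson.

```python
# Data copied from the X/Y table above.
x = [2, 4, 7, 9, 12, 15, 17, 20]
y = [2, 4, 6, 8, 12, 18, 28, 38]


def consecutive_slopes(xs, ys):
    """Return the slope of the segment joining each pair of neighbouring points.

    Assumes xs is sorted and has no repeated values (true for this table).
    """
    points = list(zip(xs, ys))
    return [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


slopes = consecutive_slopes(x, y)
for s in slopes:
    print(round(s, 2))
```

The slopes start near $1$ and climb to between $3$ and $5$ for the last points, which is what we would expect if the points follow a curve rather than a straight line.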
The following table shows the number of traffic accidents associated with a sample of drivers of different age groups.
Age | Accidents |
---|---|
$20$ | $41$ |
$25$ | $44$ |
$30$ | $39$ |
$35$ | $34$ |
$40$ | $30$ |
$45$ | $25$ |
$50$ | $22$ |
$55$ | $18$ |
$60$ | $19$ |
$65$ | $17$ |
Which of the following scatter plots correctly represents the above data?
Is the correlation between a person's age and the number of accidents they are involved in positive or negative?
Positive
Negative
Is the correlation between a person's age and the number of accidents they are involved in strong or weak?
Strong
Weak
Which age group's data represent an outlier?
30-year-olds
None of them
65-year-olds
20-year-olds
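As a preview of the correlation coefficient, here is a minimal sketch that computes an $r$ value for the age and accident data above. It assumes the standard Pearson formula, which this lesson defers to the next chapter, and the function name `pearson_r` is our own. Treat it as a way to check your answers about direction and strength, not as the lesson's method.

```python
from math import sqrt

# Data copied from the Age/Accidents table above.
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
accidents = [41, 44, 39, 34, 30, 25, 22, 18, 19, 17]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(ages, accidents)
print(round(r, 3))
```

The sign of `r` gives the direction of the correlation, and a value close to $-1$ or $+1$ indicates that the points lie close to a straight line.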
Consider the table of values, which shows four excerpts from a database comparing the income per capita of a country with its child mortality rate. If a scatter plot were created from the entire database, what relationship would you expect it to have?
Income per capita | Child mortality rate |
---|---|
$1465$ | $67$ |
$11428$ | $16$ |
$2621$ | $35$ |
$32468$ | $9$ |
Strongly positive
No relationship
Strongly negative
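A quick way to explore the four excerpts, sketched here under the assumption that sorting by income is a fair way to read the trend, is to order the countries from lowest to highest income and see whether the child mortality rate falls at every step. The variable names are illustrative only.

```python
# The four excerpts from the table above: (income per capita, child mortality rate).
excerpts = [(1465, 67), (11428, 16), (2621, 35), (32468, 9)]

# Order the countries from lowest to highest income.
by_income = sorted(excerpts)

# Check whether each country has a lower mortality rate than the next poorer one.
mortality = [rate for _, rate in by_income]
always_falls = all(a > b for a, b in zip(mortality, mortality[1:]))

print(by_income)     # [(1465, 67), (2621, 35), (11428, 16), (32468, 9)]
print(always_falls)  # True
```

Four points are too few to judge the strength of the relationship in the whole database, but they do suggest its direction.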