In order to analyze the association between two numerical variables, we can first plot the data in a scatterplot. The independent variable is shown on the horizontal axis and the dependent variable is shown on the vertical axis. In this way, each data point is displayed as a point in a two-dimensional coordinate system.
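
As a rough illustration of this layout, the sketch below renders a text-only scatterplot using only the Python standard library. The function name `text_scatterplot` and the data (hours studied against test score) are hypothetical and not part of this lesson, and the sketch assumes the values in each list are not all equal.

```python
def text_scatterplot(xs, ys, width=30, height=10):
    """Return a rough text rendering of the points (x, y)."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate to a column (horizontal) and a row (vertical).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


# Hypothetical data: independent variable first, dependent variable second.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 58, 61, 70, 74, 83]
print(text_scatterplot(hours, scores))
```

Each `*` marks one data point: moving right corresponds to a larger value of the independent variable, and moving up corresponds to a larger value of the dependent variable.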

To help us identify any correlation between the two variables, there are three things we focus on when looking at a scatterplot:

Direction
Form
Strength

Direction

The direction of a scatterplot refers to the overall trend in the data points: whether the dependent variable tends to increase or decrease as the independent variable increases. We can describe the direction as a positive correlation, a negative correlation or no correlation:

Positive correlation
- A positive correlation occurs when the dependent variable increases as the independent variable increases.
- From a graphical perspective this occurs when the $y$-coordinate increases as the $x$-coordinate increases, which is similar to a line with a positive slope.
Negative correlation
- A negative correlation occurs when the dependent variable decreases as the independent variable increases.
- From a graphical perspective this occurs when the $y$-coordinate decreases as the $x$-coordinate increases, which is similar to a line with a negative slope.
No correlation
- No correlation describes a data set which has no relationship between the variables.
- This can come in the form of totally unrelated data, or data that indicates no change of dependent variable as the independent variable changes (like a horizontal straight line, which has zero slope).
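
One possible way to check direction numerically, rather than by eye, is to look at the sign of the sum of products of deviations from the means. The helper `direction` below is a hypothetical name and the data are made up for illustration. With real, noisy data the sum is almost never exactly zero, so in practice a value close to zero relative to the spread of the data is what would suggest no linear trend.

```python
def direction(xs, ys):
    """Classify the direction of a data set from the sign of its co-variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Positive when y tends to be above its mean where x is above its mean.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if s_xy > 0:
        return "positive"
    if s_xy < 0:
        return "negative"
    return "none"


# Made-up data sets, one for each case described above.
print(direction([1, 2, 3, 4], [2, 4, 5, 8]))  # y rises as x rises
print(direction([1, 2, 3, 4], [9, 7, 4, 1]))  # y falls as x rises
print(direction([1, 2, 3, 4], [5, 5, 5, 5]))  # horizontal line, zero slope
```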

Form

The form of a scatterplot describes the shape of the pattern made by the data points. If the data points lie on or close to a straight line, we say the scatterplot has a linear form.

Forms other than a line may be apparent in a scatterplot. If the data points lie on or close to a curve, it may be appropriate to infer a non-linear form between the variables.
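
A rough numerical check of form, sketched here under the assumption that all $x$-values are distinct, is to compare the slopes between neighbouring points. For a linear form these slopes stay roughly constant; for a curve they drift steadily in one direction. The function name `successive_slopes` and both data sets are hypothetical, and with noisy data the slopes will jump around, so this is a guide rather than a test.

```python
def successive_slopes(xs, ys):
    """Slopes of the segments joining neighbouring points, ordered by x."""
    points = sorted(zip(xs, ys))
    return [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


xs = [1, 2, 3, 4, 5]
linear_ys = [3, 5, 7, 9, 11]   # made-up points lying on a straight line
curved_ys = [1, 4, 9, 16, 25]  # made-up points lying on a curve

print(successive_slopes(xs, linear_ys))  # constant slopes
print(successive_slopes(xs, curved_ys))  # steadily increasing slopes
```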

Strength

The strength of a linear correlation relates to how closely the points resemble a straight line.

If the points lie exactly on a straight line then we can say that there is a perfect correlation.
If the points are scattered randomly then we can say there is no correlation.

Most scatterplots will fall somewhere in between these two extremes, and will display a weak, moderate or strong correlation.

To measure the strength of a linear correlation we calculate something called the correlation coefficient (also known as the r value). This calculation will be discussed in the next chapter.
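
As a preview only, and assuming the r value meant here is Pearson's correlation coefficient, a minimal sketch of the calculation might look like the following. The function name and the data are hypothetical. The result always lies between $-1$ and $1$: values near $1$ or $-1$ indicate points close to a straight line, and values near $0$ indicate little or no linear pattern.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson's r, assuming neither variable is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up data: points exactly on a line, then points scattered around one.
print(correlation_coefficient([1, 2, 3, 4], [3, 5, 7, 9]))
print(correlation_coefficient([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```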

Worked examples

Example 1

Identify the type of correlation in the following scatter plot.

Think: If we draw a straight line through the points, we will be able to look at the slope of the line and how closely it fits the points. Here is a line that approximates the trend of the data:

Do: The line that we drew to approximate the data has a slope of around $+1$, so this is a positive correlation. The line fits quite closely to all of the points, so it is a strong correlation. In summary, we would say that this scatterplot indicates a strong, positive correlation.
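
The line in this example is drawn by eye. A minimal sketch of one standard alternative, the least-squares line, is given below. The function name `least_squares_line` and the data are assumptions for illustration and are not the points from this example.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return slope, intercept


# Made-up points that rise by roughly 1 for each step of 1 in x.
slope, intercept = least_squares_line([1, 2, 3, 4, 5], [1.2, 1.9, 3.1, 3.9, 5.0])
print(round(slope, 2), round(intercept, 2))
```

A slope close to $+1$, as in this made-up data, points to a positive direction; how tightly the points hug the line is a separate question of strength.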

Example 2

Describe the correlation between the two variables: eye colour and IQ.

Think: Does a person's eye colour have anything to do with their IQ?

Do: Eye colour and IQ are an example of a pair of variables that have no correlation.

Practice questions

Question 1

The following table shows the results of an experiment.

$X$	$2$	$4$	$7$	$9$	$12$	$15$	$17$	$20$
$Y$	$2$	$4$	$6$	$8$	$12$	$18$	$28$	$38$

Plot the data from the table on the graph below.

What is the type of correlation between the data points? Select the best answer.
A. Linear positive
B. Linear negative
C. Nonlinear
D. No correlation

Question 2

The following table shows the number of traffic accidents associated with a sample of drivers of different age groups.

Age	Accidents
$20$	$41$
$25$	$44$
$30$	$39$
$35$	$34$
$40$	$30$
$45$	$25$
$50$	$22$
$55$	$18$
$60$	$19$
$65$	$17$

Which of the following scatter plots correctly represents the above data?

A

B

C
Is the correlation between a person's age and the number of accidents they are involved in positive or negative?
A. Positive
B. Negative

Is the correlation between a person's age and the number of accidents they are involved in strong or weak?
A. Strong
B. Weak

Which age group's data represent an outlier?
A. 30-year-olds
B. None of them
C. 65-year-olds
D. 20-year-olds

Question 3

Consider the table of values, which shows four excerpts from a database comparing the income per capita of a country with its child mortality rate. If a scatter plot were created from the entire database, what relationship would you expect it to show?

Income per capita	Child mortality rate
$1465$	$67$
$11428$	$16$
$2621$	$35$
$32468$	$9$

A. Strongly positive
B. No relationship
C. Strongly negative

Outcomes

9.D1.3

Create a scatter plot to represent the relationship between two variables, determine the correlation between these variables by testing different regression models using technology, and use a model to make predictions when appropriate.

8.01 Scatterplots