This chapter revisits ideas about causality and correlation that were discussed in a previous investigation.
When a change in the value of one variable quantity seems to be associated with a proportional change in another variable, we say there is a correlation (or a relationship) between the two variables.
A correlation between variables may be discovered in the course of an experiment or through an analysis of observational data.
In a typical experiment, a researcher sets one variable, called the independent or explanatory variable, to various levels and observes the corresponding values of the other variable, called the response variable. The pairs of data values obtained in this way may be graphed as a scatter plot and the best fitting straight line found. The correlation is said to be strong if the data points are reasonably close to the line and weak otherwise.
Correlation is measured on a continuous scale from -1 to 1 by a quantity called a correlation coefficient. Negative correlations occur when the best fitting line has a negative slope, that is, when an increase in the independent variable is associated with a proportional decrease in the response variable. Correlations are strongest at the ends of the scale and weaker towards the central zero. A correlation coefficient of 0 means that there is no linear correlation at all.
The calculation of a correlation coefficient involves some specialised statistics calculated from the data, called the variances and the covariance. The variances measure the spread of each data set about its mean value, while the covariance measures how the two variables vary together about their means.
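For readers who want to see how these pieces fit together, one common form of the calculation is sketched below. It assumes the coefficient meant here is the Pearson correlation coefficient, and the symbols $x_i$, $y_i$, $\bar{x}$, $\bar{y}$, $s_x$, $s_y$, $s_{xy}$ and $n$ are introduced for illustration only; they do not appear in the text above.

$$
r = \frac{s_{xy}}{s_x\, s_y}, \qquad
s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}), \qquad
s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2
$$

Here $s_{xy}$ is the covariance, $s_x^2$ is the variance of the first variable, and $s_y^2$ is defined in the same way for the second. Because the factor $\frac{1}{n-1}$ cancels between numerator and denominator, dividing by $n$ instead gives the same value of $r$.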
When a correlation is discovered through the analysis of data obtained by an observational study, the same use is made of a straight line graphical representation as in the experimental case and the strength of the correlation is measured in a similar way. Observational studies are important in medical research and in other situations where it would be unethical or impractical to perform experiments.
In the case of an observational study, more so than in an experiment, care must be taken not to assume that correlation implies causation.
In an experiment, it is usually reasonable to think that if values of an explanatory variable are deliberately chosen and the response variable is observed to change accordingly, then there is a causal relation between the explanatory and response variables. However, in an observational study, the values of both variables in the pair are merely observed, not chosen.
It is possible in an observational study for variables to be correlated purely by chance, or because both are dependent on a third, hidden variable. Moreover, if data sets A and B are correlated, it may be that values in B are causally determined by corresponding values in A, or conversely, it could be that values in set A are determined by corresponding values in B.
Thus, care is needed lest a correlation be wrongly taken to imply a causal relationship. To move from the discovery of a correlation to the claim that a causal effect has been found, researchers need to gather evidence external to the data. Typically, a plausible physical mechanism needs to be demonstrated in order to explain the apparent relationship.
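The hidden-variable case described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the variable names, the noise level of 0.5, the sample size and the random seed are all invented for the example, and the helper `pearson_r` is a hypothetical function written here, not part of any library.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


rng = random.Random(0)  # fixed seed so the run is repeatable (assumption)

# A hidden third variable that drives both observed variables.
hidden = [rng.gauss(0, 1) for _ in range(2000)]

# A and B each depend on the hidden variable plus their own noise.
# Neither one is computed from the other.
a = [h + 0.5 * rng.gauss(0, 1) for h in hidden]
b = [h + 0.5 * rng.gauss(0, 1) for h in hidden]

r_ab = pearson_r(a, b)
print(f"correlation between A and B: {r_ab:.2f}")
```

In this setup A and B come out strongly correlated even though neither is computed from the other. The association arises entirely from their shared dependence on the hidden variable.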
Identify the correlation between the temperature and the number of heaters sold.
A positive correlation
A negative correlation
No correlation
The table shows the number of fans sold at a store during days of various temperatures.
Temperature ($^\circ$C) | $6$ | $8$ | $10$ | $12$ | $14$ | $16$ | $18$ | $20$ |
Number of fans sold | $12$ | $13$ | $14$ | $17$ | $18$ | $19$ | $21$ | $23$ |
Consider the correlation coefficient $r$ for temperature and number of fans sold. In what range will $r$ be?
$r=0$
$r>0$
$r<0$
Is there a causal relationship?
Yes
No
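As a check on the fan sales question, the correlation coefficient can be computed directly from the table. The sketch below assumes the Pearson coefficient is intended; the names `temperature`, `fans_sold` and `pearson_r` are chosen for this example.

```python
from math import sqrt

# Data copied from the table of temperatures and fans sold.
temperature = [6, 8, 10, 12, 14, 16, 18, 20]
fans_sold = [12, 13, 14, 17, 18, 19, 21, 23]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(temperature, fans_sold)
print(round(r, 3))  # prints 0.993
```

A value this close to 1 indicates a strong positive correlation. On its own it does not settle whether the relationship is causal; that judgement rests on reasoning outside the data, such as the plausible mechanism that people buy fans when it is hot.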
A study found a strong correlation between the approximate number of pirates out at sea and the average world temperature.
Does this mean that the number of pirates out at sea has an impact on world temperature?
Yes
No
Which of the following is the most likely explanation for the strong correlation?
Contributing variables - there are other causal relationships and variables that come into play, and these may lead to an indirect positive association between the approximate number of pirates out at sea and the average world temperature.
Coincidence - there are no other contributing factors or reasonable arguments to be made for the strong positive association between the approximate number of pirates out at sea and the average world temperature.
Which of the following is demonstrated by the strong correlation between the approximate number of pirates out at sea and the average world temperature?
If there is correlation between two variables, then there must be causation.
If there is correlation between two variables, there isn't necessarily causation.
If there is correlation between two variables, then there is no causation.