Apparently, eating chocolate makes you smarter. Not only that, but the more you eat the smarter you become.
This might sound far-fetched, but it is essentially the conclusion of an article in the New England Journal of Medicine discussing the apparent relationship between chocolate and Nobel Prizes. The article is based on the data in the chart below, which shows an obvious relationship between the amount of chocolate consumed per capita in a particular country, and the number of Nobel Prizes awarded to that particular country per ten million people:
Notice the "$P$ value" in the top left corner. In statistics, a $P$ value is the probability of seeing a relationship at least this strong by chance alone, if the two things were not really related at all. It is expressed as a decimal, with $0.1$ being a $10\%$ chance, $0.01$ being a $1\%$ chance, and so on. Generally, anything with $P<0.05$ is considered a statistically significant finding, which means that it is unlikely to be just a coincidence. This particular chart has $P<0.0001$, which means there is less than a $0.01\%$ chance that a pattern this strong would appear by chance alone. This means that the link between higher chocolate consumption and a larger number of Nobel Prizes is very unlikely to be a coincidence.
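To get a feel for where a $P$ value comes from, here is a minimal Python sketch of one way to estimate it, called a permutation test. This is not necessarily the method used in the article, and the numbers are made up purely for illustration. The idea is to shuffle one list many times, which breaks any real link, and count how often chance alone gives a correlation at least as strong as the one observed.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Estimate how often shuffled data is at least as correlated as the real data."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # shuffling destroys any real relationship
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials


# Hypothetical values for ten imaginary countries (NOT the article's data).
chocolate = [1.0, 2.0, 3.5, 4.0, 5.5, 6.0, 8.0, 9.0, 10.5, 11.0]
prizes = [0.5, 1.0, 3.0, 2.5, 6.0, 5.5, 11.0, 12.0, 18.0, 25.0]

r = pearson_r(chocolate, prizes)
p = permutation_p_value(chocolate, prizes)
print(f"r = {r:.2f}, estimated P value = {p:.4f}")
```

With these made-up numbers the correlation is strong (roughly $0.94$), and very few of the shuffles match it, so the estimated $P$ value is small. A small $P$ value only says the pattern is unlikely to be a fluke. It says nothing about why the pattern exists.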
Sounds great, right? Eat chocolate before every exam, and your marks should go up in no time!
Actually, it's not that simple. All that this graph shows is that countries which eat a lot of chocolate also have a lot of Nobel Prizes. It does not show that eating chocolate actually causes Nobel Prizes. The link between chocolate (variable $A$) and Nobel Prizes (variable $B$) could happen in a number of different ways:
Just because two things are correlated does not mean that one definitely causes the other. Statisticians sum up this message by saying that correlation does not imply causation. It is one of the most important lessons in statistics, and one that is easy to forget.
As discussed in this article, the link between chocolate and Nobel Prizes is most likely a case where $C$ causes both $A$ and $B$. The variable $C$ in this case is wealth: people in richer countries have more money to spend on luxury foods like chocolate, and also have more money to spend on research. To show this, the authors of this article performed a similar study on sales of luxury cars, and found that this too was well correlated with Nobel Prizes:
In this case, it seems that the link between chocolate and Nobel Prizes was really just because rich countries eat more chocolate and do more research than poorer countries.
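The "$C$ causes both $A$ and $B$" situation can be imitated with a small simulation. The sketch below uses invented numbers and assumes a deliberately simple model: each imaginary country gets a wealth score, and both chocolate consumption and Nobel Prizes are set to wealth plus independent random noise. Chocolate never influences prizes in this model, yet the two still end up strongly correlated.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)

# Assumed toy model: wealth (C) drives both chocolate (A) and prizes (B).
wealth = [rng.uniform(0, 10) for _ in range(500)]
chocolate = [w + rng.gauss(0, 1) for w in wealth]  # A = C + noise
prizes = [w + rng.gauss(0, 1) for w in wealth]     # B = C + independent noise

r_raw = pearson_r(chocolate, prizes)

# Subtract wealth from both variables and check what is left over.
chocolate_left = [a - w for a, w in zip(chocolate, wealth)]
prizes_left = [b - w for b, w in zip(prizes, wealth)]
r_adjusted = pearson_r(chocolate_left, prizes_left)

print(f"correlation of A and B:             {r_raw:.2f}")
print(f"correlation after removing wealth:  {r_adjusted:.2f}")
```

In this toy model the raw correlation is high, but once the effect of wealth is taken out, almost nothing remains. Real data is messier, since the true effect of wealth is not known exactly, but the principle is the same.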
Have a look at the following graphs, and see if you can explain the different possibilities for the relationship between them.
There is a very significant correlation here: as lemons increase, the number of car crashes decreases. There are a number of ways that this correlation could happen:
Again, this graph shows a very clear correlation: as pirate numbers have decreased, the global temperature has increased. Once again, this could be explained in a number of ways:
Hopefully the above examples have illustrated the way in which correlation does not imply causation. Therefore, when reading news articles which declare that "scientists have discovered a link between" two things, one needs to be skeptical. A "link" usually only means a correlation, and the jump from this correlation to causation can often be flawed.
For example, an article could be published with the title "Scientists discover a link between ice cream and drowning". Ice cream sales and drownings both increase during hot weather, and so a "link" will almost certainly show up in the data. It could also be said that "Scientists discover a link between walking sticks and heart attacks", since both are more common among elderly people. These "links" often refer to correlation only and do not represent a causal relationship.
There is no need to be too pessimistic. There are ways to test for causation, which scientists often use to check that the relationship between two things really is a case of one causing the other. The most commonly used method is a randomised controlled trial, which is used frequently in medicine to show that a particular treatment really does cause patients to get better.
The most important lesson to take away from this is that "links" reported in the popular press may not be as scientific as they seem, and their claims often need to be evaluated with a skeptical eye. This is particularly true when the two things which are "linked" have an obvious relation to wealth, such as "Scientists discover that eating lobster makes you live longer".
If you're really curious about whether chocolate will improve your test results, there's only one way to find out. Run a randomised controlled trial with your friends: randomly select half to eat chocolate before the test and half to not eat chocolate, then compare the average results of the two groups!
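The random selection is the key step, because it stops things like wealth or study habits from lining up with who eats the chocolate. As a rough sketch, the assignment and the comparison could look like this in Python. The names are invented, and `assign_groups` and `mean_difference` are helper functions made up for this illustration.

```python
import random
from statistics import mean


def assign_groups(people, seed=None):
    """Randomly split people into a chocolate group and a control group."""
    rng = random.Random(seed)
    shuffled = list(people)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_difference(scores, chocolate_group, control_group):
    """Average score of the chocolate group minus that of the control group."""
    chocolate_mean = mean(scores[p] for p in chocolate_group)
    control_mean = mean(scores[p] for p in control_group)
    return chocolate_mean - control_mean


# Hypothetical list of friends taking part in the trial.
friends = ["Ana", "Ben", "Chloe", "Dev", "Ella", "Finn", "Gia", "Hugo"]
chocolate_group, control_group = assign_groups(friends, seed=1)

print("Eat chocolate:", chocolate_group)
print("No chocolate: ", control_group)
```

After the test, put everyone's marks into a dictionary such as `scores = {"Ana": 82, ...}` and call `mean_difference(scores, chocolate_group, control_group)`. With only a handful of friends, a small difference could easily be due to chance, so a larger group gives a more trustworthy answer.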
Identify the correlation between the temperature and the number of heaters sold.
A positive correlation
A negative correlation
No correlation
A study found a strong correlation between the approximate number of pirates out at sea and the average world temperature.
Does this mean that the number of pirates out at sea has an impact on world temperature?
Yes
No
Which of the following is the most likely explanation for the strong correlation?
Contributing variables - there are other causal relationships and variables that come into play, and these may lead to an indirect positive association between the approximate number of pirates out at sea and the average world temperature.
Coincidence - there are no other contributing factors or reasonable arguments to be made for the strong positive association between the approximate number of pirates out at sea and the average world temperature.
Which of the following is demonstrated by the strong correlation between the approximate number of pirates out at sea and the average world temperature?
If there is correlation between two variables, then there must be causation.
If there is correlation between two variables, there isn't necessarily causation.
If there is correlation between two variables, then there is no causation.