
Investigation: Causality or no causality

Lesson

Correlation and causation

Apparently, eating chocolate makes you smarter. Not only that, but the more you eat the smarter you become.

This might sound far-fetched, but it is essentially the conclusion of an article in the New England Journal of Medicine discussing the apparent relationship between chocolate and Nobel Prizes. The article is based on the data in the chart below, which shows a clear relationship between the amount of chocolate consumed per capita in a particular country and the number of Nobel Prizes awarded to that country per ten million people:

Notice the "P value" in the top left corner. In statistics, a P value is the probability of seeing a relationship at least this strong by chance alone, if the two things were in fact unrelated. It is expressed as a decimal, with 0.1 being a 10\% chance, 0.01 being a 1\% chance, and so on. By convention, a result with P < 0.05 is considered statistically significant, which means that it is unlikely to be just a coincidence. This particular chart has P < 0.0001, which means there is less than a 0.01\% chance that a relationship this strong would appear by chance alone. So the association between higher chocolate consumption and a larger number of Nobel Prizes is very unlikely to be a fluke.
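To make the idea of "happening by chance alone" concrete, here is a minimal sketch of one way to estimate a P value: shuffle one of the two lists many times and count how often the shuffled data looks as strongly related as the real data. The numbers are made up for eight imaginary countries (they are not the data from the chart), and the function names `pearson_r` and `permutation_p_value` are our own. The original article may well have calculated its P value differently.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, n_shuffles=5000, seed=0):
    """Fraction of random shuffles whose correlation is at least as strong
    as the one actually observed (a simple estimate of the P value)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / n_shuffles


# Made-up numbers for eight imaginary countries (NOT the data from the chart).
chocolate = [1, 2, 3, 4, 5, 6, 7, 8]
prizes = [2, 1, 4, 3, 6, 5, 8, 7]

r = pearson_r(chocolate, prizes)
p = permutation_p_value(chocolate, prizes)
print(f"r = {r:.2f}, estimated P = {p:.4f}")
```

A small estimated P value here only says the pattern is unlikely to be a coincidence. It says nothing about why the pattern exists.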

Sounds great, right? Eat chocolate before every exam, and your marks should go up in no time!

Actually, it's not that simple. All that this graph shows is that countries which eat a lot of chocolate also have a lot of Nobel Prizes. It does not show that eating chocolate actually causes Nobel Prizes. The link between chocolate (variable A) and Nobel Prizes (variable B) could happen in a number of different ways:

  1. A causes B: Chocolate consumption causes Nobel Prizes. We might suggest that substances found in chocolate improve brain activity, thus leading to more research breakthroughs by scientists. In this way, we get the relationship depicted in the graph: countries like Switzerland eat a lot of chocolate and win a lot of Nobel Prizes.
  2. B causes A: Nobel Prizes cause chocolate consumption. We might suggest that when people from a particular country win a Nobel Prize, the people in that country celebrate by eating lots of chocolate. We would still see the same relationship depicted in the graph: countries like Switzerland win a lot of Nobel Prizes and eat lots of chocolate.
  3. C causes both A and B: Some other factor causes both Nobel Prizes and chocolate consumption. For example, we might notice that Nobel Prizes are more frequent in countries which are quite cold (like Norway and Sweden). It is easier to store and eat chocolate when it is cold than when it is hot, so people living in cold countries might eat more chocolate. It might also be easier to concentrate in cold weather than in hot weather, so people living in cold countries might be able to do better research and win more Nobel Prizes. In this way, we would still get the same kind of graph: Switzerland is a cold country, and so people eat lots of chocolate and win lots of Nobel Prizes.
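The third possibility can be explored with a small simulation. In the sketch below, A and B never influence each other: when `shared_cause` is true, each one is just a hidden factor C plus its own random noise. The function names, sample size and noise levels are illustrative assumptions, not taken from any real study.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(shared_cause, n=1000, seed=1):
    """Return lists A and B for n imaginary countries.

    A never affects B and B never affects A. If shared_cause is True,
    both depend on the same hidden factor C; otherwise they are unrelated.
    """
    rng = random.Random(seed)
    a_values, b_values = [], []
    for _ in range(n):
        c = rng.gauss(0, 1) if shared_cause else 0.0  # hidden factor C
        a_values.append(c + rng.gauss(0, 0.5))  # A = C + its own noise
        b_values.append(c + rng.gauss(0, 0.5))  # B = C + its own noise
    return a_values, b_values


r_with_c = pearson_r(*simulate(shared_cause=True))
r_without_c = pearson_r(*simulate(shared_cause=False))
print(f"with a shared cause:    r = {r_with_c:.2f}")
print(f"without a shared cause: r = {r_without_c:.2f}")
```

With the shared cause switched on, A and B come out strongly correlated even though neither causes the other. With it switched off, the correlation should be close to zero.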

 

Correlation does not imply causation

Just because two things are associated (or correlated) does not mean that one causes the other. Statisticians sum up this message in the phrase "correlation does not imply causation", and repeat it to anyone who will listen. It is one of the most important lessons in statistics, and one that is easy to forget.

The link between chocolate and Nobel Prizes is most likely a case where C causes both A and B. The variable C, in this case, is wealth: people in richer countries have more money to spend on luxury foods like chocolate, and also have more money to spend on research. To show this, the authors of this article performed a similar study on sales of luxury cars, and found that this too was well correlated with Nobel Prizes:

It seems that the link between chocolate and Nobel Prizes arises simply because richer countries both eat more chocolate and do more research than poorer countries.
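One way to check an explanation like this is to ask how strongly chocolate and Nobel Prizes are related once wealth has been accounted for. The sketch below uses the standard formula for a partial correlation on simulated data in which wealth drives both variables. The data, the names `partial_r`, `wealth`, `chocolate` and `prizes`, and the noise levels are all assumptions made for illustration. Nothing here is taken from the article.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def partial_r(a, b, c):
    """Correlation between a and b after removing the part explained by c."""
    r_ab, r_ac, r_bc = pearson_r(a, b), pearson_r(a, c), pearson_r(b, c)
    return (r_ab - r_ac * r_bc) / ((1 - r_ac ** 2) * (1 - r_bc ** 2)) ** 0.5


# Simulated (not real) countries: wealth drives both other variables.
rng = random.Random(3)
wealth = [rng.gauss(0, 1) for _ in range(2000)]
chocolate = [w + rng.gauss(0, 0.5) for w in wealth]
prizes = [w + rng.gauss(0, 0.5) for w in wealth]

raw = pearson_r(chocolate, prizes)
adjusted = partial_r(chocolate, prizes, wealth)
print(f"chocolate vs prizes:                 r = {raw:.2f}")
print(f"chocolate vs prizes, given wealth:   r = {adjusted:.2f}")
```

In this simulation the raw correlation is strong, but it should all but disappear once wealth is taken into account, which is what we would expect if wealth were the real driver.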

Let's have a look at the following graphs, and see if we can explain the different possibilities for the relationship shown in each.

 

Lemons prevent car crashes?

There is a very strong association (correlation) here: as the number of lemons imported increases, the number of car crashes decreases. There are a number of ways that this correlation could happen:

  1. A causes B: Could lemons prevent car crashes? (Be creative!)
  2. B causes A: Could reduced car crashes cause the number of lemons imported to increase?
  3. C causes both A and B: Could some third factor cause both increased lemon imports and reduced car crashes?

 

Lack of pirates causes global warming?

Again, this graph shows a very clear correlation: as pirate numbers have decreased, the global temperature has increased. Once again, this could be explained in a number of ways:

  1. A causes B: Could low numbers of pirates cause global warming?
  2. B causes A: Could global warming reduce the number of pirates?
  3. C causes both A and B: Could some third factor cause both reduced numbers of pirates and increased global temperatures?

 

Scientists have discovered a link between...

The examples above show why correlation does not imply causation. So when a news article declares that "scientists have discovered a link between" two things, it pays to be skeptical. A "link" usually only means a correlation, and the jump from correlation to causation is often flawed.

For example, an article could be published with the title "Scientists discover a link between ice cream and drowning". Both ice cream sales and drownings increase during hot weather, so the two will be correlated. It could also be said that "Scientists discover a link between walking sticks and heart attacks", since both are more common among elderly people. These "links" often refer to correlation only and do not represent a causal relationship.

 

Who to believe?

There is no need to be too pessimistic. There are ways to establish causation, which scientists often use to check that the relationship between two things really is a case of one causing the other. The most commonly used method is a randomised controlled trial, which is used frequently in medicine to show that a particular treatment really does cause patients to get better.
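To see why randomising helps, consider the simulation sketched below. The treatment in it does nothing at all, and each patient has a hidden level of general health that affects their outcome. When healthier patients are the ones who choose the treatment, the treated group looks better anyway. When a coin flip decides who gets it, the two groups are alike on average and the false effect disappears. The function `outcome_gap` and all of the numbers are illustrative assumptions, not a description of any real trial.

```python
import random
import statistics


def outcome_gap(randomised, n=4000, seed=2):
    """Average outcome of treated patients minus untreated patients.

    The simulated treatment has NO real effect, so an honest method
    should report a gap close to zero.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)  # hidden factor affecting the outcome
        if randomised:
            gets_treatment = rng.random() < 0.5  # coin flip decides
        else:
            gets_treatment = health > 0  # healthier people opt in
        outcome = health + rng.gauss(0, 1)  # treatment adds nothing
        (treated if gets_treatment else untreated).append(outcome)
    return statistics.fmean(treated) - statistics.fmean(untreated)


observational_gap = outcome_gap(randomised=False)
randomised_gap = outcome_gap(randomised=True)
print(f"patients choose for themselves: gap = {observational_gap:.2f}")
print(f"randomised controlled trial:    gap = {randomised_gap:.2f}")
```

Because the coin flip cannot be influenced by health, wealth or anything else, any clear gap that remains in a randomised trial can reasonably be attributed to the treatment itself.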

The most important lesson to take away is that "links" reported in the popular press may not be as scientific as they seem, and their claims often need to be evaluated with a skeptical eye. This is particularly true when the two things which are "linked" have an obvious relation to wealth, such as "Scientists discover that eating lobster makes you live longer".

Outcomes

ACMEM147

distinguish between causality and correlation through examples
