Language and Use of Statistics

Lesson

Well, here's one more excuse to eat chocolate. Apparently, it makes you smarter. And not only that, but the more you eat, the smarter you become.

It might sound far-fetched, but this is essentially the conclusion of an article in the New England Journal of Medicine discussing the apparent relationship between chocolate and Nobel Prizes. The article is based on this chart, which shows an obvious relationship between the amount of chocolate consumed per capita in a particular country and the number of Nobel Prizes won by people from that country, relative to its population:

Notice the "P value" in the top left corner. In statistics, a P value is the probability of seeing a relationship at least as strong as the one observed if there were really no relationship at all, that is, if the pattern arose by chance alone. It is expressed as a decimal, with 0.1 being a 10% chance, 0.01 being a 1% chance, and so on. Generally, anything with P<0.05 is considered a *statistically significant* finding, which means that it is unlikely to be just a coincidence. This particular chart has P<0.0001, which means there is less than a 0.01% chance of seeing a relationship this strong by chance alone. This means that the link between higher chocolate consumption and a larger number of Nobel Prizes is probably real.
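If you like coding, you can get a feel for what a P value measures with a *permutation test*. This is a technique we have chosen for illustration, not one used in the article. The idea is to shuffle one list so that any real link is broken, then count how often the shuffled data still produces a correlation at least as strong as the real one. The sketch below is in Python; the eight "countries" and their numbers are made up (they are NOT the chart's data), and `pearson_r` and `permutation_p_value` are our own hypothetical helper names.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Estimate how often shuffled data gives a correlation at least as strong."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_strong = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # shuffling breaks any real link between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_strong += 1
    # The +1 counts the real data as one arrangement, so the estimate is never 0.
    return (at_least_as_strong + 1) / (trials + 1)


# Made-up toy numbers for eight hypothetical countries (NOT the chart's data).
chocolate_kg = [1.0, 2.5, 3.0, 4.5, 6.0, 7.5, 9.0, 11.0]
nobel_prizes = [2.0, 3.0, 6.0, 5.0, 9.0, 12.0, 14.0, 16.0]

r = pearson_r(chocolate_kg, nobel_prizes)
p = permutation_p_value(chocolate_kg, nobel_prizes)
print(f"correlation r = {r:.3f}, estimated P value = {p:.4f}")
```

These toy numbers rise together almost perfectly, so very few shuffles match them and the estimated P value lands well below 0.05. Try scrambling `nobel_prizes` by hand and see how the P value changes. Notice that nothing in this calculation says *why* the two lists move together.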

Sounds great, right? Eat chocolate before every exam, and your marks should go up in no time!

Actually, it's not that simple. All this graph shows is that countries which eat a lot of chocolate also have a lot of Nobel Prizes, not that eating chocolate actually causes Nobel Prizes. The link between chocolate (A) and Nobel Prizes (B) could happen in a number of different ways:

1. A causes B: Chocolate consumption causes Nobel Prizes. We might suggest that substances found in chocolate improve brain activity, thus leading to more research breakthroughs by scientists. In this way, we get the relationship depicted in the graph: countries like Switzerland eat a lot of chocolate and win a lot of Nobel Prizes.

2. B causes A: Nobel Prizes cause chocolate consumption. We might suggest that when people from a particular country win a Nobel Prize, the people in that country celebrate by eating lots of chocolate. We would still see the same relationship depicted in the graph: countries like Switzerland win a lot of Nobel Prizes and eat lots of chocolate.

3. C causes A and B: Some other factor causes both Nobel Prizes and chocolate consumption. For example, we might notice that Nobel Prizes are more frequent in countries which are quite cold (like Norway and Sweden). It is easier to store and eat chocolate when it is cold than when it is hot, so people living in cold countries might eat more chocolate. It is also easier to concentrate in cold weather than in hot weather, so people living in cold countries might be able to do research better and win more Nobel Prizes. In this way, we still get the same graph: Switzerland is a cold country, and so people eat lots of chocolate and win lots of Nobel Prizes.

As you can see, just because two things are linked does not mean that one definitely causes the other. In fancy statistical language, a link between two variables is called a *correlation*. Just because two variables are *correlated* does not mean that one causes the other. To sum up this important message, statisticians try to teach anyone who will listen that *correlation does not imply causation*. It is one of the most important lessons in all of statistics, so go tell all of your friends too!

As discussed in this article, the link between chocolate and Nobel Prizes is most likely a case where C causes A and B. The variable C in this case is wealth, as people in richer countries have more money to spend on luxury foods like chocolate, and also have more money to spend on research. To show this, the authors of this article also did a similar study on sales of luxury cars, and found that they too correlated well with Nobel Prizes:

In this case, it seems that the link between chocolate and Nobel Prizes was really just because rich countries eat more chocolate and do more research than poorer countries.
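The "C causes A and B" idea can be mimicked with a small simulation. The Python sketch below assumes a made-up world of 1000 hypothetical countries where wealth (C) pushes up both chocolate eating (A) and Nobel Prizes (B), while A and B have no effect on each other at all. It also shows one way statisticians can check for a suspected third factor: remove the straight-line effect of wealth from both variables and correlate what is left, which is called a *partial correlation*. The function names and all the numbers are our own illustrative choices.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def remove_linear_effect(ys, xs):
    """Return what is left of ys after subtracting its best straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]


rng = random.Random(42)
n_countries = 1000  # simulated, hypothetical "countries"

# C (wealth) drives both A (chocolate) and B (Nobel Prizes). A and B never
# influence each other: each is just wealth plus its own random noise.
wealth = [rng.gauss(0, 1) for _ in range(n_countries)]
chocolate = [w + rng.gauss(0, 0.5) for w in wealth]
nobel = [w + rng.gauss(0, 0.5) for w in wealth]

raw_r = pearson_r(chocolate, nobel)
partial_r = pearson_r(
    remove_linear_effect(chocolate, wealth), remove_linear_effect(nobel, wealth)
)
print(f"chocolate vs Nobel: r = {raw_r:.2f}")
print(f"same, after removing wealth: r = {partial_r:.2f}")
```

With these settings the first correlation should come out strongly positive (the model's true value is 0.8), while the second should be close to zero: once wealth is accounted for, chocolate tells you almost nothing about Nobel Prizes.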

Have a look at the following graphs, and see if you can explain the different possibilities for the relationship between them:

We see a very significant correlation here: as lemon imports increase, the number of car crashes decreases. There are a number of ways that this correlation could happen:

1. A causes B: Could lemons prevent car crashes? (Be creative!)

2. B causes A: Could reduced car crashes cause the number of lemons imported to increase? (Be even more creative!)

3. C causes A and B: Could some third factor cause both increased lemon imports and reduced car crashes?

4. Which of the above three explanations do you think is the most likely?

Again, it's a very clear correlation. As pirate numbers have decreased, the global temperature has increased. Once again, this could be explained a number of ways:

1. A causes B: Could low numbers of pirates cause global warming?

2. B causes A: Could global warming reduce the number of pirates?

3. C causes A and B: Could some third factor cause both reduced numbers of pirates and increased global temperatures?

4. Which of the above three explanations do you think is the most likely?
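One thing worth thinking about for question 4: many "very clear correlations" compare two quantities that have each been changing steadily over the years. The Python sketch below assumes two made-up quantities over 20 hypothetical years that have nothing to do with each other, except that one tends to rise and the other tends to fall as time passes. The passing of time (and everything else that changes with it) acts like a third factor C.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(7)
years = list(range(20))  # 20 hypothetical years

# Two made-up quantities, each with its own independent random noise.
# Neither is calculated from the other; the only thing they share is time.
rising = [10 + 1.0 * t + rng.gauss(0, 1) for t in years]
falling = [50 - 1.0 * t + rng.gauss(0, 1) for t in years]

trend_r = pearson_r(rising, falling)
print(f"correlation between two unrelated trends: r = {trend_r:.2f}")
```

The result is a strong negative correlation, even though neither quantity was computed from the other.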

**Scientists Discover A Link Between...**

Hopefully the above examples have illustrated the way in which correlation does not imply causation. Therefore, when reading news articles which declare that "scientists discover a link between" two things, one needs to be sceptical. A "link" usually only means a correlation, and the jump from this correlation to causation can sometimes be flawed.

We could easily publish an article with the title "Scientists discover a link between ice cream and drowning". Both ice cream and drowning occur during hot weather, and so a "link" will definitely exist. We could also say "Scientists discover a link between walking sticks and heart attacks" (both tend to occur in elderly people).

1. Make your own news headline, based on a "link" that you have discovered, to mislead your readers into thinking that one causes the other.

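So should you distrust every "link" you read about? Well, you could become a die-hard sceptic. However, there is no need to be too pessimistic. There are ways to establish causation, which scientists use often to make sure that the relationship between two things really is a case of one causing the other. The most commonly used method is a randomised controlled trial, which is used frequently in medicine to show that a particular treatment really does cause patients to get better. Because people are assigned to the treatment group or the comparison group at random, any third factor C (such as wealth) tends to be spread evenly between the two groups, so a clear difference between the groups is most likely caused by the treatment itself.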
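Here is a small illustration of why random assignment helps, written in Python. It assumes 1000 hypothetical people who each have a hidden factor the researchers never look at (we have made it "hours of sleep", with made-up values). Shuffling everyone and splitting the list in half, with no attention paid to sleep at all, still leaves the two groups with almost the same average amount of sleep.

```python
import random


def mean(values):
    """Average of a list of numbers."""
    return sum(values) / len(values)


rng = random.Random(1)
n_people = 1000

# A hidden factor the researchers cannot see (hypothetical: hours of sleep).
hidden_sleep = [rng.gauss(7, 1) for _ in range(n_people)]

# Randomised assignment: shuffle everyone, then split the list down the middle.
order = list(range(n_people))
rng.shuffle(order)
treatment = order[: n_people // 2]
control = order[n_people // 2 :]

sleep_treatment = mean([hidden_sleep[i] for i in treatment])
sleep_control = mean([hidden_sleep[i] for i in control])
print(f"average sleep: treatment {sleep_treatment:.2f}, control {sleep_control:.2f}")
```

The same balancing happens for every other hidden factor too, which is what lets a randomised controlled trial separate cause from correlation.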

The most important thing to take out of this lesson is that "links" that you read about in the popular press may not be as "scientific" as they may seem, and often need a sceptical eye in evaluating their claims. This is particularly true when the two things which are "linked" have an obvious relation to wealth, like "Scientists discover that eating lobster makes you live longer".

If you're really curious about whether chocolate will improve your test results, there's only one way to find out. Run a randomised controlled trial with yourself and your friends: randomly select half of the group to eat chocolate before the test and the other half to not eat chocolate, then see which group gets the better test results!
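If you want to do the random selection fairly, a few lines of Python can do it for you. This is a minimal sketch: the friends' names are hypothetical, `random_split` is our own helper, and the marks are placeholders that you would replace with your real results after the test.

```python
import random


def random_split(names, seed=None):
    """Randomly split names into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(names)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean(values):
    """Average of a list of numbers."""
    return sum(values) / len(values)


friends = ["Aroha", "Ben", "Chloe", "Dev", "Ella", "Finn"]  # hypothetical names
chocolate_group, no_chocolate_group = random_split(friends, seed=2024)
print("Eat chocolate:", chocolate_group)
print("No chocolate:", no_chocolate_group)

# Placeholder marks: replace these with the real results after the test.
marks = {"Aroha": 71, "Ben": 64, "Chloe": 80, "Dev": 58, "Ella": 75, "Finn": 69}
difference = mean([marks[n] for n in chocolate_group]) - mean(
    [marks[n] for n in no_chocolate_group]
)
print(f"chocolate group minus no-chocolate group: {difference:.1f} marks")
```

With only a few people in each group, a difference could easily be due to chance, so a bigger group (or repeating the trial several times) gives a more trustworthy answer.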

Carry out investigations of phenomena, using the statistical enquiry cycle:

A. conducting surveys that require random sampling techniques, conducting experiments, and using existing data sets

B. evaluating the choice of measures for variables and the sampling and data collection methods used

C. using relevant contextual knowledge, exploratory data analysis, and statistical inference.

Use statistical methods to make an inference