Human development and progress are based on our observations of the world.
Statistical investigations help us measure and analyse these observations so that we can make the best possible decisions.
For example, a worthwhile question may be:
Does internet use depend on literacy standards across different countries?
Can we assume that people use the internet regardless of whether they are literate? Or does a lack of literacy prevent people from accessing the internet, in which case should we put more resources towards improving literacy rates?
Logically, it may seem that a person's internet use depends on their level of literacy.
But to answer the main question, and any further questions that arise from it, more reliably, we can carry out a statistical inquiry to see whether there is a relationship between a country's internet use and its literacy rate.
Statistical inquiry can be summarised into a four-step process.
When formulating the question, we should consider what statistics we need to be able to answer the question, and what data is needed to calculate the statistics. It is also worth looking into previous research at this point. This can give us some idea of what kind of questions we should be asking.
Let's go back to the question posed earlier.
Does internet use depend on literacy across different countries?
Think: First, consider how literacy and internet use can be measured. The United Nations Educational, Scientific and Cultural Organisation (UNESCO) measures a country's literacy rate as the number of people aged 15 or over who can read and write divided by the total population. The International Telecommunication Union (ITU) counts as internet users the people who have used the internet, through any device, within the past 12 months.
We want to use the literacy rate as our independent variable, and internet usage as our dependent variable. To make the variables comparable, we can express them both as percentages of the entire population.
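To look for a relationship between these two variables, we can use technology, such as a spreadsheet or a graphing calculator, to draw a scatterplot, fit a line of best fit and calculate the correlation coefficient.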
Do: First, we collate the data from these two organisations in a table (only the first few countries are shown).
Country | Literacy Rate | Internet Use Rate |
---|---|---|
Afghanistan | 38.20% | 10.60% |
Albania | 97.60% | 66.36% |
Algeria | 80.20% | 42.95% |
Angola | 71.10% | 13.00% |
Argentina | 98.10% | 70.15% |
Armenia | 99.80% | 62.00% |
Azerbaijan | 99.80% | 78.20% |
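The "technology" step usually means a least-squares fit plus Pearson's correlation coefficient. As a minimal sketch, the Python below applies both to the seven countries in the table only, with both rates in percent. Because it uses this small sample rather than all countries, its slope and $r$ will differ from the full-data values reported next. The function names `fit_line`, `correlation` and `sums_of_squares` are our own illustrative helpers, not part of any library.

```python
from math import sqrt

# Sample rows from the table above: (country, literacy rate %, internet use rate %).
SAMPLE = [
    ("Afghanistan", 38.20, 10.60),
    ("Albania", 97.60, 66.36),
    ("Algeria", 80.20, 42.95),
    ("Angola", 71.10, 13.00),
    ("Argentina", 98.10, 70.15),
    ("Armenia", 99.80, 62.00),
    ("Azerbaijan", 99.80, 78.20),
]


def sums_of_squares(xs, ys):
    """Return (Sxx, Syy, Sxy, mean_x, mean_y) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy, mean_x, mean_y


def fit_line(xs, ys):
    """Least-squares line of best fit: (slope, intercept) of y = slope*x + intercept."""
    sxx, _, sxy, mean_x, mean_y = sums_of_squares(xs, ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correlation(xs, ys):
    """Pearson's correlation coefficient r, which always lies between -1 and 1."""
    sxx, syy, sxy, _, _ = sums_of_squares(xs, ys)
    return sxy / sqrt(sxx * syy)


literacy = [row[1] for row in SAMPLE]
internet = [row[2] for row in SAMPLE]
slope, intercept = fit_line(literacy, internet)
r = correlation(literacy, internet)
print(f"Sample of 7 countries: y = {slope:.2f}x + ({intercept:.2f}), r = {r:.2f}")
```

In a spreadsheet, the same quantities come from the built-in `SLOPE`, `INTERCEPT` and `CORREL` functions applied to the two columns.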
Using the data for all countries, the line of best fit is $y=0.89x-0.32$, where $x$ is the literacy rate and $y$ is the internet use rate, and the correlation coefficient is $r=0.69$. The scatterplot is displayed below.
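A line of best fit lets us estimate internet use for a given literacy rate. The text does not state the units of $x$ and $y$, so this sketch assumes they are proportions between 0 and 1. Read that way, the sample countries sit close to the line (Albania: $0.89\times0.976-0.32\approx0.55$ against an observed $0.66$), whereas reading them as percentages would put all seven well below it. The input `0.90` is a hypothetical country, and the interpolation check uses only the seven sample literacy rates, since the full dataset's range is not given.

```python
from typing import Sequence

# Line of best fit reported for all countries: y = 0.89x - 0.32.
# ASSUMPTION: x (literacy) and y (internet use) are proportions between 0 and 1.
SLOPE = 0.89
INTERCEPT = -0.32


def predict_internet_use(literacy: float) -> float:
    """Estimate a country's internet use proportion from its literacy proportion."""
    return SLOPE * literacy + INTERCEPT


def is_interpolation(x: float, observed: Sequence[float]) -> bool:
    """True if x lies within the range of the observed x-values."""
    return min(observed) <= x <= max(observed)


# Literacy rates of the sample countries in the table, written as proportions.
observed_literacy = [0.382, 0.976, 0.802, 0.711, 0.981, 0.998, 0.998]

x = 0.90  # hypothetical country with a 90% literacy rate
estimate = predict_internet_use(x)
kind = "interpolation" if is_interpolation(x, observed_literacy) else "extrapolation"
print(f"Predicted internet use: {estimate:.1%} ({kind})")
```

Under these assumptions, the estimate is about 48.1%. Because $0.90$ lies inside the observed range, it is an interpolation, which is generally more trustworthy than extrapolating beyond the data.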
Reflect: We can conclude that a country's internet use tends to increase as its literacy rate increases. However, there are a few caveats to this conclusion.
First, this is not evidence that literacy causes an increase in internet use, or that internet use causes an increase in literacy. A hidden factor, such as the health of the economy or the education policy of each country, could be causing both variables to increase. Second, a correlation coefficient of $r=0.69$ indicates a moderately strong positive relationship. The data includes almost every country, so it should be reliable. However, the shape of the data in the scatterplot suggests that the relationship could be non-linear instead.
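To see how a hidden factor can produce a strong correlation without any causal link between two variables, here is a small simulation on entirely synthetic data. It is not real country data. The "economy" score, the coefficients, the noise level and the 200 hypothetical countries are all arbitrary assumptions chosen for illustration.

```python
import random
from math import sqrt


def correlation(xs, ys):
    """Pearson's correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def clamp(value, low=0.0, high=100.0):
    """Keep a simulated rate inside the 0-100% range."""
    return max(low, min(high, value))


rng = random.Random(42)  # fixed seed so the simulation is reproducible
literacy, internet = [], []
for _ in range(200):  # 200 hypothetical countries
    economy = rng.uniform(0, 100)  # hypothetical "economic health" score
    # Each rate depends only on the economy plus its own random noise;
    # neither rate appears in the formula for the other.
    literacy.append(clamp(50 + 0.4 * economy + rng.gauss(0, 5)))
    internet.append(clamp(10 + 0.6 * economy + rng.gauss(0, 5)))

print(f"r between simulated literacy and internet use: {correlation(literacy, internet):.2f}")
```

The two simulated rates end up strongly correlated, even though changing one would have no effect on the other. This is exactly the situation a correlation coefficient alone cannot rule out.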
The results from these statistics do not give us a clear answer to whether a country's internet use depends on its literacy rate, and they would certainly not be enough, on their own, for governments to make decisions that affect the population.
So we would need to continue the process of statistical inquiry and look for more data that may help answer the question more definitively.
What is the best way to gather data to investigate the relationship between the heights of basketball players and the number of points that they score?
Conduct an observational study
Conduct an experiment
Look up official data
Which statistic is most appropriate for finding the strength of the relationship between an individual's ability in mathematics and their ability in music?
Gradient of line of best fit
$y$-intercept of line of best fit
Correlation coefficient
Extrapolation from the data
Interpolation from the data
In an inquiry into the energy consumption of a car, $10$ controlled experiments were performed and the data were recorded in the table below.
The correlation coefficient was found to be $r=0.99$, and the line of best fit was found to be $y=0.09x+11.23$.
What can we conclude from this inquiry?
Distance in km (x) | 147 | 259 | 317 | 403 | 448 | 660 | 705 | 751 | 756 | 771 |
---|---|---|---|---|---|---|---|---|---|---|
Petrol consumed in L (y) | 25 | 35 | 41 | 49 | 47 | 76 | 72 | 77 | 82 | 82 |
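As a check on the statistics quoted for this inquiry, the sketch below recomputes the least-squares line and Pearson's $r$ directly from the ten experiments. The helper name `line_and_r` is our own, not a library function.

```python
from math import sqrt

# Data from the ten controlled experiments in the table above.
distance_km = [147, 259, 317, 403, 448, 660, 705, 751, 756, 771]
petrol_l = [25, 35, 41, 49, 47, 76, 72, 77, 82, 82]


def line_and_r(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / sqrt(sxx * syy)


slope, intercept, r = line_and_r(distance_km, petrol_l)
print(f"y = {slope:.2f}x + {intercept:.2f}, r = {r:.2f}")  # y = 0.09x + 11.23, r = 0.99
```

Rounded to two decimal places, this reproduces the line $y=0.09x+11.23$ and $r=0.99$ given in the question.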