7.06 Statistical inquiry

Lesson

Human development and progress are based on our observations of the world:

  • What works or is good for us that we should build on?
  • What doesn't work or hurts us that we should stop doing?
  • What patterns continue in a predictable way?
  • What do we think may be true but that we need to investigate?

Statistical investigations can help us measure observations to make the best decisions.

For example, a worthwhile question may be:

Does internet use depend on literacy standards across different countries?

Can we assume that people use the internet regardless of whether they are literate? Or does a lack of literacy prevent people from accessing the internet and, if so, should we put more resources towards improving literacy rates?

We may think that logically, it makes sense that someone's usage of the internet depends on their level of literacy.

But to answer the main question and other questions that may result from it more reliably, we can perform a process of statistical inquiry to see if there is a relationship between a country's internet usage and literacy rates.

Statistical inquiry can be summarised into a four-step process.

The steps of statistical inquiry
  1. Determine what question we want to answer and how we can go about answering it.
  2. Gather data relevant to answering the question.
  3. Perform any calculations required.
  4. Interpret the data and statistics and conclude by answering the question.

  1. When formulating the question, we should consider what statistics we need to be able to answer the question, and what data is needed to calculate the statistics. It is also worth looking into previous research at this point. This can give us some idea of what kind of questions we should be asking.

  2. There are several ways we can go about gathering data, depending on what kind of question we are asking. We can gather the data ourselves, by observation or experiment. Otherwise, we can find data gathered by other people, using online sources, books and journals.

    In either case, make sure that the data can be used to calculate statistics to answer the original question.
     
  3. After we've gathered the data, we can analyse it using any of the statistical techniques we've looked at in this chapter. If we want to find the strength of a linear relationship, we can find the correlation coefficient. If we want to find how quickly the dependent variable changes as the independent variable changes, we can find the gradient of the line of best fit. If we want to predict the dependent variable for an unmeasured value of the independent variable, we can use interpolation or extrapolation. Keep in mind that these techniques assume the data is roughly linear and may not give accurate results if there is a non-linear relationship.
     
  4. To conclude, we want to be able to describe the statistical results in terms of the original question. Be careful here: we can't tell whether there is a causal link between two variables using statistical techniques alone. It's also important to make note of how reliable the answer is. The more data points we have, the more reliable the answer.
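
The calculations described in step 3 can be sketched in Python. This is a minimal illustration rather than part of the lesson: it assumes the line of best fit is the least-squares line (which is what most calculators and spreadsheets compute), and the function names and the small data set are made up for the example.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def correlation(xs, ys):
    """Correlation coefficient r: strength of the linear relationship."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def line_of_best_fit(xs, ys):
    """Gradient and y-intercept of the least-squares line (an assumption)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = my - gradient * mx
    return gradient, intercept


def predict(xs, gradient, intercept, new_x):
    """Predict y for new_x and say whether this interpolates or extrapolates."""
    inside = min(xs) <= new_x <= max(xs)
    kind = "interpolation" if inside else "extrapolation"
    return gradient * new_x + intercept, kind


# Made-up, exactly linear data (y = 2x + 1), for illustration only.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

r = correlation(xs, ys)
gradient, intercept = line_of_best_fit(xs, ys)
print(r, gradient, intercept)
print(predict(xs, gradient, intercept, 2.5))  # inside the measured range
print(predict(xs, gradient, intercept, 10))   # outside the measured range
```

A prediction labelled as extrapolation relies on the linear pattern continuing beyond the measured data, so it is generally less trustworthy than an interpolation.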
     

Exploration

Let's go back to the question posed earlier.

Does internet use depend on literacy across different countries?

Think: First, consider how literacy and internet use can be measured. The United Nations Educational, Scientific and Cultural Organisation (UNESCO) measures a country's literacy rate as the number of people aged 15 or over who can read and write divided by the total population. The International Telecommunication Union (ITU) measures the number of internet users as the number of people who have used the internet through any device within the past 12 months.

We want to use the literacy rate as our independent variable, and internet usage as our dependent variable. To make the variables comparable, we can express them both as percentages of the entire population.

To find any possible relationships between these two variables, we can use technology to fit a line of best fit.

Do: First we collate the data from these two organisations in a table (this table contains the data of only the first few countries).

Country       Literacy rate   Internet use rate
Afghanistan   $38.20\%$       $10.60\%$
Albania       $97.60\%$       $66.36\%$
Algeria       $80.20\%$       $42.95\%$
Angola        $71.10\%$       $13.00\%$
Argentina     $98.10\%$       $70.15\%$
Armenia       $99.80\%$       $62.00\%$
Azerbaijan    $99.80\%$       $78.20\%$

Using the data of all countries, the line of best fit is $y=0.89x-0.32$ and the correlation coefficient is $r=0.69$. The scatterplot is displayed below.

Internet use rate vs. Literacy rate
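
As a rough illustration of using technology for this step, the Python sketch below fits a least-squares line to only the seven countries listed in the table above. Because it uses a small subset, its output will not match the all-country line of best fit and correlation coefficient quoted above. The function name `fit` is hypothetical, and treating the line of best fit as the least-squares line is an assumption.

```python
from math import sqrt

# Literacy rate and internet use rate (both in %) for the seven countries
# listed in the table. This is NOT the full data set used in the lesson.
data = {
    "Afghanistan": (38.20, 10.60),
    "Albania": (97.60, 66.36),
    "Algeria": (80.20, 42.95),
    "Angola": (71.10, 13.00),
    "Argentina": (98.10, 70.15),
    "Armenia": (99.80, 62.00),
    "Azerbaijan": (99.80, 78.20),
}


def fit(xs, ys):
    """Return (gradient, intercept, r) for the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    gradient = sxy / sxx
    return gradient, my - gradient * mx, sxy / sqrt(sxx * syy)


literacy = [lit for lit, _ in data.values()]  # independent variable
internet = [use for _, use in data.values()]  # dependent variable

gradient, intercept, r = fit(literacy, internet)
print(f"gradient = {gradient:.2f}, intercept = {intercept:.2f}, r = {r:.2f}")
```

Seven countries chosen alphabetically are not a representative sample, which is one reason the lesson's figures are based on the data of all countries.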

Reflect: We can conclude that internet use tends to be higher in countries with higher literacy. There are a few considerations with this conclusion though.

First, this is not evidence that literacy causes an increase in internet use, or that internet use causes an increase in literacy. There could be a hidden factor that causes the increase in both variables, such as the health of the economy or the education policy of each country. Second, a correlation coefficient of $r=0.69$ indicates only a moderately strong linear relationship. The data includes almost every country, so it should be reliable. However, the shape of the data in the scatterplot suggests that the relationship could be non-linear instead.

The results from these statistics don't give us a clear answer to the question of whether a country's internet usage depends on literacy rates. And they definitely wouldn't be enough for governments to make decisions that affect the population.

So we would need to continue the process of statistical inquiry and look for more data that may help answer the question more definitively.

Practice questions

Question 1

What is the best way to gather data to investigate the relationship between the heights of basketball players and the number of points that they score?

  A. Conduct an observational study
  B. Conduct an experiment
  C. Look up official data

Question 2

Which statistic is most appropriate for finding the strength of the relationship between an individual's ability in mathematics and their ability in music?

  A. Gradient of line of best fit
  B. $y$-intercept of line of best fit
  C. Correlation coefficient
  D. Extrapolation from the data
  E. Interpolation from the data

Question 3

In an inquiry to find the energy consumption of a car, $10$ controlled experiments were performed and the data below was gathered.

The correlation coefficient was found to be $r=0.99$, and the line of best fit was found to be $y=0.09x+11.23$.

What can we conclude from this inquiry?

Distance in km ($x$)        $147$  $259$  $317$  $403$  $448$  $660$  $705$  $751$  $756$  $771$
Petrol consumed in L ($y$)  $25$   $35$   $41$   $49$   $47$   $76$   $72$   $77$   $82$   $82$
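
The stated line of best fit and correlation coefficient can be checked against the ten data points. The Python sketch below assumes the line was found by least squares; the helper name `fit` is made up for illustration.

```python
from math import sqrt

# The ten measurements from the question.
distance_km = [147, 259, 317, 403, 448, 660, 705, 751, 756, 771]
petrol_l = [25, 35, 41, 49, 47, 76, 72, 77, 82, 82]


def fit(xs, ys):
    """Return (gradient, intercept, r) for the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    gradient = sxy / sxx
    return gradient, my - gradient * mx, sxy / sqrt(sxx * syy)


gradient, intercept, r = fit(distance_km, petrol_l)
print(round(gradient, 2), round(intercept, 2), round(r, 2))
```

Rounded to two decimal places, the results should agree with the gradient, $y$-intercept and correlation coefficient given in the question.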

Outcomes

MS2-12-2

analyses representations of data in order to make inferences, predictions and draw conclusions

MS2-12-7

solves problems requiring statistical processes, including the use of the normal distribution and the correlation of bivariate data
