The statistical investigation process transforms raw data into useful information that tells us more about a subject and allows us to make recommendations and, possibly, predictions of future outcomes. It consists of six stages:
The first stage is to pinpoint the information needed to draw a conclusion. This involves coming up with questions that, if answered, would yield meaningful information from which we can draw a conclusion and make recommendations.
For example, suppose we were in charge of the school’s funds that have been set aside for the development of a new sports field, but we are not sure which type of field (e.g. baseball diamond, basketball court, track) would be of greatest benefit to students. To investigate this issue, we would need to ask questions such as:
Consider the following scenarios:
Once we have written questions, we need to collect data to answer them. We have to decide how we will collect the data, the type of data we will collect, and the sources from which we will collect it. The sources can be either primary or secondary. Collecting from a primary source means collecting the data directly ourselves by interviewing or observing others, or even by conducting experiments. With any of these methods, it is important to ensure that the data can be organized easily and is not biased. For example, when creating a questionnaire, it is better to avoid open-ended questions and instead offer a limited number of options from which participants choose their answers. This way, the answers can be easily tallied and organized. For instance, instead of asking someone “What is your favourite colour?”, it would be better to ask “Which of the following colours do you prefer?” and to list a few common colours, including an option of “Other” in case they would like to answer with a colour that is not listed.
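To see why closed-ended questions are easy to tally, here is a minimal Python sketch. The option list and the eight responses are made up for illustration, and `tally_responses` is a hypothetical helper name, not part of any survey tool.

```python
from collections import Counter

# Hypothetical response options for the closed-ended colour question.
OPTIONS = ["Red", "Blue", "Green", "Yellow", "Other"]


def tally_responses(responses, options=OPTIONS):
    """Count how many participants chose each listed option."""
    counts = Counter(responses)
    # Report every option in questionnaire order, including options nobody chose.
    return {option: counts.get(option, 0) for option in options}


# Made-up answers from eight participants, for illustration only.
responses = ["Blue", "Red", "Blue", "Other", "Green", "Blue", "Red", "Other"]
tally = tally_responses(responses)

for option, count in tally.items():
    print(f"{option:<7}{count}")
```

Because every answer is one of a few fixed options, the tally is a simple count per option. Open-ended answers would first have to be read and grouped by hand.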
Using a secondary source involves gathering data that has already been collected or generated by others, for example from books or the internet. It is important that the data comes from a reliable source rather than an obscure website or outdated book; otherwise the data may not be accurate. Notable reliable sources include government organizations such as Statistics Canada and Environment Canada, which have strict data collection methodologies in place to ensure the accuracy and reliability of their data.
For each of the following scenarios:
Compose a closed-ended (not open-ended) question, along with its response options, that can be asked in order to investigate:
In the third stage, we arrange the data we have collected into a form that gives it structure and order. A common way of accomplishing this is to use a table, such as a frequency table. How the data is organized depends on the nature of the statistical investigation. For example, if the data collected were the incomes of a group of workers, it would make more sense to organize the data into income ranges, i.e. to tally the number of workers within ranges such as \$50000-\$60000, rather than to tally the number of workers with one particular income, e.g. the number of workers with an income of \$54682.
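The grouping idea can be sketched in Python as follows. The ten incomes are hypothetical values chosen for illustration, and `frequency_table` is a hypothetical helper. The sketch also assumes that each class includes its lower bound but not its upper bound, so an income of exactly \$60000 is counted in the \$60000-\$70000 class; the text above does not specify how boundary values are handled.

```python
def frequency_table(values, start, width, n_classes):
    """Tally values into n_classes equal-width classes [low, low + width)."""
    counts = [0] * n_classes
    for value in values:
        index = (value - start) // width
        if not 0 <= index < n_classes:
            raise ValueError(f"{value} falls outside the chosen classes")
        counts[index] += 1
    # Each row is (lower bound, upper bound, frequency).
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(n_classes)
    ]


# Hypothetical incomes in dollars, for illustration only.
incomes = [54682, 58100, 61250, 47900, 52300, 66400, 59999, 60000, 43500, 71200]
table = frequency_table(incomes, start=40000, width=10000, n_classes=4)

for low, high, count in table:
    print(f"${low}-${high}: {count}")
```

Choosing the class width is a judgment call: classes that are too narrow give a table almost as long as the raw data, while classes that are too wide hide the shape of the distribution.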
The following are the exam results of a class of 30 Grade 12 physics students.
81 | 90 | 93 | 79 | 71 | 88 | 64 | 75 | 59 | 80 |
84 | 72 | 77 | 80 | 73 | 67 | 85 | 76 | 71 | 91 |
78 | 82 | 70 | 75 | 89 | 83 | 74 | 72 | 81 | 80 |
Draw up a frequency table of the results with suitable groupings.
Once we have organized the data, we need to present the data in a form that will be easy to read, understand and analyze. Often this will be accomplished by using a graph such as a bar graph, circle graph, histogram, line plot, or line graph. The particular type of graph to be used will depend on the purpose of the investigation. Besides displaying the data in a graph, it may also be beneficial to summarize the data using statistical quantities such as the mean, median, mode, and range.
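As an illustration of the summary quantities, the following Python sketch computes the mean, median, mode, and range of the 30 physics exam results listed earlier. It uses only the standard `statistics` module, and the variable names are chosen for this example.

```python
import statistics

# The 30 Grade 12 physics exam results from the table above.
results = [
    81, 90, 93, 79, 71, 88, 64, 75, 59, 80,
    84, 72, 77, 80, 73, 67, 85, 76, 71, 91,
    78, 82, 70, 75, 89, 83, 74, 72, 81, 80,
]

mean_result = statistics.mean(results)      # sum of the results divided by 30
median_result = statistics.median(results)  # middle value of the sorted results
modes = statistics.multimode(results)       # most frequent value(s), as a list
result_range = max(results) - min(results)  # highest result minus lowest result

print(f"Mean:   {mean_result}")
print(f"Median: {median_result}")
print(f"Mode:   {modes}")
print(f"Range:  {result_range}")
```

For this class the mean is 78, the median is 78.5, the mode is 80, and the range is 34. `statistics.multimode` is used rather than `statistics.mode` because it returns every most frequent value when there is a tie.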
After we have finished summarizing and displaying the data, it is time to examine and interpret the data, to decide on what it means and to ultimately draw conclusions from it. This may involve identifying trends and patterns from the graph, and identifying how those trends and patterns change over time or across categories (such as across different populations). From these trends, we can then draw conclusions and possibly make predictions about future outcomes.
Once we have finished analyzing the data, it is time to put everything together in a written report. A report should consist of:
Come up with a problem which you would like to investigate that would require a large amount of data. Write questions relating to this problem that, if answered, would lead to meaningful information allowing you to draw conclusions and to make recommendations. Work through the following statistical investigation process in order to finally draw some conclusions:
Here are some examples of problems requiring large amounts of data to help you: