The statistical investigation process transforms raw data into useful information that tells us more about a subject, allows us to make recommendations and may let us predict future outcomes. It consists of six stages:
The first stage is to identify the information needed to draw a conclusion. This involves posing questions that, if answered, would yield meaningful information on which to base conclusions and recommendations. For example, suppose we were in charge of school funds set aside for the development of a new sports field, but were not sure which type of field (e.g. cricket pitch, basketball court) would be of greatest benefit to students. To investigate this issue, we would need to ask questions such as “What is the most popular sport among students?” (we want to construct a type of field that would satisfy the majority of students), “Are there enough funds to construct the students’ preferred type of field?” (we can’t construct a type of field that we can’t afford) and “How long will it take to construct?” (there’s no point constructing the students’ preferred type of field if it takes 10 years to build, by which point none of those students will be around to enjoy it). With answers to these questions, we would then be able to decide which type of field would benefit students the most.
Consider the following scenarios:
Once we have posed questions, we need to collect data to answer them. Before we do the actual collecting, we have to decide on how we will collect the data, the type of data we will collect and the sources from which we will collect them. The sources can be either primary or secondary. Collecting from a primary source involves collecting the data directly ourselves by interviewing or observing others or even conducting experiments. When collecting data using any such methods, it is important to ensure that the data to be collected can be organised easily. For example, when creating a questionnaire, it would be better to include questions that are not open-ended, but rather have a limited number of options from which participants can choose their answers. This way, the answers collected can be easily tallied and organised. For instance, instead of asking someone “What is your favourite colour?”, it would be better to ask “Which of the following colours is your favourite?” and to list a few common colours that they can choose from, including an option of “Other” in case they would like to answer with a colour that is not one of those listed.
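As a minimal sketch of why closed questions are easier to organise, the Python snippet below tallies answers to the favourite-colour question. The list of options and the responses are made up purely for illustration; any answer that is not one of the listed options is counted under “Other”.

```python
from collections import Counter

# Hypothetical options for "Which of the following colours is your favourite?"
OPTIONS = ["Red", "Blue", "Green", "Yellow"]


def tally_responses(responses, options=OPTIONS):
    """Count the answers per option, grouping anything unlisted under 'Other'."""
    # Start every option at zero so unchosen options still appear in the tally.
    counts = Counter({option: 0 for option in options})
    counts["Other"] = 0
    for answer in responses:
        key = answer if answer in options else "Other"
        counts[key] += 1
    return dict(counts)


# Made-up answers from ten participants (not real survey data).
responses = ["Blue", "Red", "Blue", "Green", "Purple",
             "Blue", "Red", "Yellow", "Orange", "Green"]
tally = tally_responses(responses)
for option, count in tally.items():
    print(f"{option:<8}{count}")
```

Had the question been open-ended, answers such as “navy”, “sky blue” and “blue” would each need to be interpreted by hand before they could be counted.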
Using a secondary source involves gathering data that have already been collected or generated by others, for example from books or the internet. It is important that the data come from a reliable source rather than an obscure website or an outdated book; otherwise, they may not be accurate. Notable reliable sources include government organisations such as the Australian Bureau of Statistics and the Bureau of Meteorology, which have strict data collection methodologies in place to ensure the accuracy and reliability of their data.
Determine whether the data to be gathered to investigate the following would be from a primary or secondary source. Also state the method (e.g. questionnaire, interview, observation, experiment) you would use to gather the data if the source is primary, or the source (e.g. books, newspapers, internet) you would use if it is secondary.
Compose a closed (not open-ended) question, along with its response options, that could be asked in order to investigate:
In the third stage, we arrange the data we have collected into a form that gives them structure and order. A common way of accomplishing this is to use a table, such as a frequency table. How the data are organised will depend on the nature of the statistical investigation. For example, if the data collected were the incomes of a group of workers, it would make more sense to organise the data into income ranges, i.e. to tally up the number of workers within ranges such as \$50000-\$60000, rather than to tally up the number of workers with one particular income, e.g. the number of workers with an income of \$54682.
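The grouping described above can be sketched in a few lines of Python. The incomes below are hypothetical, and two choices are assumptions made for this sketch: every range is 10000 dollars wide, and each range includes its lower bound but not its upper bound, so that an income of exactly 60000 is counted only once.

```python
from collections import Counter


def income_frequency_table(incomes, width=10_000):
    """Tally incomes into equal-width ranges; each range includes its lower bound only."""
    # Round each income down to the start of its range, then count per range.
    counts = Counter((income // width) * width for income in incomes)
    return [(low, low + width, counts[low]) for low in sorted(counts)]


# Hypothetical annual incomes in dollars (illustrative only).
incomes = [54_682, 48_900, 61_250, 57_300, 52_000,
           66_400, 45_750, 59_999, 60_000, 73_100]
table = income_frequency_table(incomes)
for low, high, frequency in table:
    print(f"${low:,} to under ${high:,}: {frequency}")
```

Tallying each exact income instead would give ten rows with a frequency of 1 each, which reveals nothing about how incomes are distributed.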
The following are the HSC results of a class of 30 Year 12 physics students.
81 | 90 | 93 | 79 | 71 | 88 | 64 | 75 | 59 | 80 |
84 | 72 | 77 | 80 | 73 | 67 | 85 | 76 | 71 | 91 |
78 | 82 | 70 | 75 | 89 | 83 | 74 | 72 | 81 | 80 |
Draw up a frequency table of the results with suitable groupings. (HINT: HSC results are usually grouped into bands.)
Once we have organised the data, we need to present the data in a form that will be easy to read, understand and analyse. Most often this will be accomplished by using a graph such as a column graph, bar graph, pie chart, dot plot or line chart. The particular type of graph to be used will depend on the purpose of the investigation. For example, in order to present data on the proportion of students with a particular type of favourite sport, it may be more appropriate to use a pie chart than a dot plot. Besides displaying the data in a graph, it may also be beneficial to summarise the data using statistical quantities such as the mean, median, mode and range.
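As a brief illustration of the summary statistics mentioned above, the following sketch uses Python's standard `statistics` module on the 30 physics results listed earlier in this section. The variable names are arbitrary choices for this example.

```python
import statistics

# HSC physics results of the 30 students listed earlier in this section.
results = [
    81, 90, 93, 79, 71, 88, 64, 75, 59, 80,
    84, 72, 77, 80, 73, 67, 85, 76, 71, 91,
    78, 82, 70, 75, 89, 83, 74, 72, 81, 80,
]

mean = statistics.mean(results)           # sum of the results divided by their count
median = statistics.median(results)       # middle value of the sorted results
mode = statistics.mode(results)           # most frequently occurring result
data_range = max(results) - min(results)  # highest result minus lowest result

print(f"Mean: {mean}, Median: {median}, Mode: {mode}, Range: {data_range}")
```

For this class the mean is 78, the median is 78.5, the mode is 80 and the range is 34. The mean and median being so close suggests that the results are spread fairly evenly around the centre.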
After we have finished summarising and displaying the data, it is time to examine and interpret the data, to decide on what it means and to ultimately draw conclusions from it. This may involve identifying trends and patterns from the graph, and identifying how those trends and patterns change over time or across categories (such as across different populations). From these trends, we can then draw conclusions and possibly make predictions about future outcomes.
Once we have finished analysing the data, it is time to put everything together in a written report.
Introduction: The report should state the background and aim of the statistical inquiry and the questions it sought to answer, and detail the data collection method (including the sources and type of data).
Numerical and graphical analysis: The data should be analysed using appropriate statistical measures, and the report should include the tables and graphs that represent the data.
Interpretation of results: The report should return to the questions originally posed and interpret the results of the analysis in relation to them. It should consider any trends and patterns in the data and relate the statistics to the original problem. This includes a thorough discussion of the findings, the reasoning behind the conclusions and, if appropriate, recommendations for the future.
Conclusion: The report should end with a summary of the findings.
Suppose the Roads and Traffic Authority (RTA) has tasked you with investigating the number of vehicles travelling past the front of your school on an average day in order to determine whether there is any need to implement new measures to manage traffic flow.
Come up with a problem which you would like to investigate. Write questions relating to this problem that, if answered, would lead to meaningful information allowing you to draw conclusions and to make recommendations. Work through the statistical investigation process in order to finally draw some conclusions.