A statistical inquiry is a process of transforming raw data into useful information that can tell us more about a subject and allow us to make recommendations and possibly make predictions of future outcomes. It consists of six stages:
The first stage is to pinpoint the information that will be needed in order to draw a conclusion. This involves coming up with questions that, if answered, would provide meaningful information on which to base conclusions and recommendations. For example, suppose you were in charge of school funds that had been set aside for the development of a new sports field, but weren’t sure which type of field (e.g. cricket pitch, basketball court) would be of greatest benefit to students. To investigate this issue, you would need to ask questions such as “What is the most popular sport among students?” (you want to construct a type of field that would satisfy the majority of students), “Are there enough funds to construct the students’ preferred type of field?” (you can’t construct a type of field that you can’t afford) and “How long will it take to construct?” (there’s no point constructing the students’ preferred type of field if it takes 10 years to build, by which point none of those students will be around to enjoy it). With answers to these questions, you would then be able to decide which type of field would benefit students the most.
Once we have posed questions, we need to collect data to answer them. Before we do the actual collecting, we have to decide how we will collect the data, the type of data we will collect and the sources from which we will collect them. The sources can be either primary or secondary. Collecting from a primary source involves collecting the data directly yourself by interviewing or observing others, or even by conducting experiments. When collecting data using any of these methods, it is important to ensure that the data can be organised easily. For example, when creating a questionnaire, it is better to avoid open-ended questions and instead offer a limited number of options from which participants can choose their answers. This way, the answers collected can be easily tallied and organised. For instance, instead of asking someone “What is your favourite colour?”, it would be better to ask “Which of the following colours is your favourite?” and to list a few common colours that they can choose from, including an option of “Other” in case they would like to answer with a colour that is not one of those listed.
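To illustrate why closed questions are easy to tally, here is a minimal Python sketch. The list of options and the responses are hypothetical values invented purely for illustration, and the function name tally_responses is likewise an assumption, not part of any standard method.

```python
from collections import Counter

# Hypothetical answer options for "Which of the following colours is your favourite?"
OPTIONS = ["Red", "Blue", "Green", "Yellow", "Other"]

# Hypothetical responses, made up for this illustration only
responses = ["Blue", "Red", "Blue", "Other", "Green", "Blue", "Red", "Yellow"]


def tally_responses(responses, options):
    """Count how many participants chose each listed option."""
    counts = Counter(responses)
    # Report every option, including any that nobody chose (count of 0)
    return {option: counts[option] for option in options}


tally = tally_responses(responses, OPTIONS)
for option, count in tally.items():
    print(f"{option:<8}{count}")
```

Because every answer is one of a small number of fixed options, the tally is a simple count per option. Open-ended answers would first have to be read and sorted into categories by hand.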
Using a secondary source involves gathering data that have already been collected or generated by others, for example from books or the internet. It is important that the data come from a reliable source and not from an obscure website or an outdated book; otherwise, the data may not be accurate. Notable reliable sources include government organisations such as the Australian Bureau of Statistics and the Bureau of Meteorology, which have strict data collection methodologies in place to ensure the accuracy and reliability of their data.
In the third stage, we arrange the data we have collected into a form that gives them structure and order. A common way of accomplishing this is to use a table, such as a frequency table. How the data are organised depends on the nature of the statistical investigation. For example, if the data collected were the incomes of a group of workers, it would make more sense to organise the data into income ranges, i.e. to tally up the number of workers within ranges such as $50,000-$60,000, rather than to tally up the number of workers with one particular income, e.g. the number of workers with an income of exactly $54,682.
The following are the HSC results of a class of 30 Year 12 physics students.
81 | 90 | 93 | 79 | 71 | 88 | 64 | 75 | 59 | 80 |
84 | 72 | 77 | 80 | 73 | 67 | 85 | 76 | 71 | 91 |
78 | 82 | 70 | 75 | 89 | 83 | 74 | 72 | 81 | 80 |
Draw up a frequency table of the results with suitable groupings. (HINT: HSC results are usually grouped into bands.)
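One possible way to check your answer is the Python sketch below, which tallies the 30 results above into bands. It assumes the standard HSC band cut-offs (Band 6 for 90-100, Band 5 for 80-89, and so on down to Band 1 for 0-49). The names BANDS and frequency_table are illustrative choices.

```python
# The 30 HSC physics results listed above
results = [
    81, 90, 93, 79, 71, 88, 64, 75, 59, 80,
    84, 72, 77, 80, 73, 67, 85, 76, 71, 91,
    78, 82, 70, 75, 89, 83, 74, 72, 81, 80,
]

# Assumed band cut-offs: (label, lowest mark, highest mark)
BANDS = [
    ("Band 6", 90, 100),
    ("Band 5", 80, 89),
    ("Band 4", 70, 79),
    ("Band 3", 60, 69),
    ("Band 2", 50, 59),
    ("Band 1", 0, 49),
]


def frequency_table(marks, bands):
    """Count how many marks fall inside each band."""
    table = {label: 0 for label, _, _ in bands}
    for mark in marks:
        for label, low, high in bands:
            if low <= mark <= high:
                table[label] += 1
                break  # each mark belongs to exactly one band
    return table


table = frequency_table(results, BANDS)
for label, low, high in BANDS:
    print(f"{label} ({low}-{high}): {table[label]}")
```

The frequencies should add up to 30, the number of students in the class, which is a quick check that no result has been missed or counted twice.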
Once we have organised the data, we need to present them in a form that is easy to read, understand and analyse. Most often this is accomplished by using a graph such as a column graph, bar graph, pie chart, dot plot or line chart. The type of graph to be used depends on the purpose of the investigation. For example, to present data on the proportion of students who favour each sport, a pie chart may be more appropriate than a dot plot. Besides displaying the data in a graph, it may also be beneficial to summarise the data using statistical quantities such as the mean, median, mode and range.
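As a brief illustration of these summary quantities, the sketch below computes them for the 30 physics results listed earlier, using Python's built-in statistics module. The variable names are illustrative.

```python
import statistics

# The 30 HSC physics results listed earlier
results = [
    81, 90, 93, 79, 71, 88, 64, 75, 59, 80,
    84, 72, 77, 80, 73, 67, 85, 76, 71, 91,
    78, 82, 70, 75, 89, 83, 74, 72, 81, 80,
]

mean = statistics.mean(results)            # sum of the results divided by how many there are
median = statistics.median(results)        # middle value once the results are sorted
mode = statistics.mode(results)            # most frequently occurring result
data_range = max(results) - min(results)   # highest result minus lowest result

print(f"Mean:   {mean}")
print(f"Median: {median}")
print(f"Mode:   {mode}")
print(f"Range:  {data_range}")
```

For this class the mean is 78, the median is 78.5 (the average of the 15th and 16th sorted results, 78 and 79), the mode is 80 and the range is 34.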
After we have finished summarising and displaying the data, it is time to examine and interpret the data, to decide on what it means and to ultimately draw conclusions from it. This may involve identifying trends and patterns from the graph, and identifying how those trends and patterns change over time or across categories (such as across different populations). From these trends, we can then draw conclusions and possibly make predictions about future outcomes.
Once we have finished analysing the data, it is time to put everything together in a written report. Any report should address the background and aim of the statistical inquiry and the questions it sought to answer, detail the data collection method (including sources and type of data), include a thorough discussion of the findings, list and explain the reasoning behind the conclusions, and, if appropriate, include recommendations for the future. It should also include the tables and graphs from stages 3 and 4 of the inquiry (even if only as part of the appendix).
Suppose the Roads and Traffic Authority (RTA) has tasked you with investigating the number of vehicles travelling past the front of your school on an average day in order to determine whether there is any need to implement new measures to manage traffic flow.