A major charity organisation is organising a very large fundraising event in your city, the "City to Beach" fun run. They are expecting to have 65000 entrants in the inaugural event. To raise the profile of the event, every competitor who enters the fun run will receive a promotional t-shirt. The t-shirt design is complete, and now we have been put in charge of ordering an appropriate number of t-shirts in a suitable range of sizes. This is a considerable problem: if we order too few, entrants will be unhappy when they miss out on a t-shirt in the correct size, but if we order too many, we could end up trying to deal with 5000 unwanted XXXL t-shirts!
A size chart from the t-shirt manufacturer is given below:
We will need to do our own research to obtain any other information and data that we need, and we don't have much time.
At the start of this chapter, we learned that the statistical investigation process is a cyclic process that involves several stages:
Statistical questions have these characteristics:
We will need to decide how to collect data that can be used for our investigation.
The most obvious option is to ask each competitor to select their preferred shirt size when they enter the fun run. This would be equivalent to a census. Unfortunately, this is not possible because it would not allow enough time for the t-shirts to be manufactured, printed and delivered.
Therefore we must consider obtaining data from primary or secondary sources. To save time and cost, we would prefer to use secondary data from a reliable source.
This is where our knowledge of mathematics can be a huge benefit. We have learned that natural variation often follows a normal distribution. If we can find out the characteristics of the population, we can make a statistical calculation without needing to collect and analyse large amounts of data.
The Australian Bureau of Statistics has data available that appears to meet our needs: How Australians Measure Up. We could use the mean and standard deviation of body weight data from Table 8 in the linked document.
Here are the statistics for the age group 22 - 44 years:
Sex | Mean (kg) | Standard deviation (kg) |
---|---|---|
Male | 82.4 | 13.5 |
Female | 66.3 | 13.5 |
When we are communicating the results of our investigation we should record the source of this data, with sufficient detail that someone reading our report could verify the data themselves.
This is the stage of our investigation where we "do the maths". It is important that we work carefully and systematically to ensure that our results and conclusions are accurate.
Drawing together our assumptions with the information that we have obtained, here is an example of the information that we could have to work with:
We can use the normal distribution to calculate the probability that an individual competitor is within one of the given ranges.
The calculated probability can be used as an estimate of the proportion of the population in the given range. Multiplying the size of the population by this probability gives an estimate of the number of t-shirts required in that size, based on the parameters that we chose.
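The calculation described above can also be sketched in a few lines of Python, as an alternative to a calculator. This is only an illustration: the function name `estimate_shirts` is made up for this example, and the number of male competitors is a placeholder assumption (an even split of the 65000 entrants), so the result will not necessarily match an estimate obtained from different assumptions.

```python
from statistics import NormalDist


def estimate_shirts(mean_kg, sd_kg, lower_kg, upper_kg, competitors):
    """Estimate how many competitors have a weight between lower_kg and upper_kg."""
    weights = NormalDist(mu=mean_kg, sigma=sd_kg)
    # Probability that one competitor falls inside the range.
    probability = weights.cdf(upper_kg) - weights.cdf(lower_kg)
    # Scale the probability up to the whole group and round to whole shirts.
    return probability, round(probability * competitors)


# Placeholder assumption: half of the 65000 entrants are male.
male_competitors = 65000 // 2

# Size L covers 67.5 kg to 77.5 kg; male mean 82.4 kg, standard deviation 13.5 kg.
probability, shirts = estimate_shirts(82.4, 13.5, 67.5, 77.5, male_competitors)
print(f"P(67.5 < weight < 77.5) = {probability:.4f}")
print(f"Estimated size L shirts for male competitors: {shirts}")
```

Changing `male_competitors` or the mean and standard deviation shows how sensitive the estimate is to the parameters we choose.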
For example, the screenshots below show how this estimate can be determined for the number of size 'L' shirts needed for male competitors:
[ClassPad screenshots of the normal distribution calculation]
To the nearest whole number, we estimate that there will be 4116 male competitors needing a size "L" shirt.
Weight | < 57.5 kg | 57.5 - 62.5 kg | 62.5 - 67.5 kg | 67.5 - 77.5 kg | 77.5 - 87.5 kg | 87.5 - 92.5 kg | > 92.5 kg |
---|---|---|---|---|---|---|---|
Size | XS | S | M | L | XL | XXL | XXXL |
Male | | | | | | | |
Female | | | | | | | |
Total | | | | | | | |
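One possible way to complete a table like this is to repeat the normal distribution calculation for every size and for both groups. The sketch below does this in Python. The size boundaries and the means and standard deviations come from the tables in this section, but the even split of entrants between male and female competitors is a placeholder assumption, and the names `SIZES`, `GROUPS`, `ENTRANTS` and `size_counts` are made up for this example.

```python
from statistics import NormalDist

# Weight boundaries in kg for each size; None marks an open-ended range.
SIZES = [
    ("XS", None, 57.5),
    ("S", 57.5, 62.5),
    ("M", 62.5, 67.5),
    ("L", 67.5, 77.5),
    ("XL", 77.5, 87.5),
    ("XXL", 87.5, 92.5),
    ("XXXL", 92.5, None),
]

# Mean and standard deviation of body weight (kg) for each group.
GROUPS = {"Male": (82.4, 13.5), "Female": (66.3, 13.5)}

# Placeholder assumption: the 65000 entrants are split evenly.
ENTRANTS = {"Male": 32500, "Female": 32500}


def size_counts(mean_kg, sd_kg, entrants):
    """Return the estimated number of shirts needed in each size for one group."""
    dist = NormalDist(mean_kg, sd_kg)
    counts = {}
    for size, lower, upper in SIZES:
        p_low = 0.0 if lower is None else dist.cdf(lower)
        p_high = 1.0 if upper is None else dist.cdf(upper)
        counts[size] = round((p_high - p_low) * entrants)
    return counts


table = {group: size_counts(*GROUPS[group], ENTRANTS[group]) for group in GROUPS}
table["Total"] = {
    size: table["Male"][size] + table["Female"][size] for size, _, _ in SIZES
}

# Print one row per group, with the sizes in the same order as the table.
print("Size   ", " ".join(f"{size:>6}" for size, _, _ in SIZES))
for row, counts in table.items():
    print(row.ljust(7), " ".join(f"{counts[size]:>6}" for size, _, _ in SIZES))
```

Because each estimate is rounded to a whole number, the grand total may differ from 65000 by a few shirts.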
We have now determined the quantities of each size of t-shirt that we need to order.
Often it is a good idea to construct a graph to represent our results. This is an excellent way to see for ourselves if the results appear to be reasonable and can also be used when we want to communicate our results and conclusions.
Rather than just presenting a graph without any explanation, we should describe the characteristics of the distribution that the graph displays. If relevant, we should refer to skew or symmetry, clusters and gaps, outliers or any other important information.
You will recall that the statistical investigation process is represented as a cyclical process. Now that we have produced our results, we need to consider whether they are sufficient for our needs, whether our assumptions are valid, and whether we need to refine our methods further to get more accurate results.
Competitors will be disappointed if they cannot get the size of t-shirt that they need. Perhaps we should order some extra t-shirts in each size. However, we don't want to be too wasteful, and the cost of the extra t-shirts will reduce the amount of money that is raised for charity.
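To weigh up this trade-off, it can help to put numbers on it. The sketch below adds a fixed percentage to each estimate and works out what the extra shirts would cost. Every figure in it is hypothetical: the estimates, the 5% buffer and the price per shirt are invented for illustration, as is the function name `order_with_buffer`.

```python
import math


def order_with_buffer(estimates, buffer_percent):
    """Increase each estimate by buffer_percent, rounding up to whole shirts."""
    return {
        size: math.ceil(count * (100 + buffer_percent) / 100)
        for size, count in estimates.items()
    }


# Hypothetical values, for illustration only.
estimates = {"S": 1000, "M": 2500, "L": 4000}
buffer_percent = 5      # order 5% extra in every size
cost_per_shirt = 6.00   # assumed price in dollars

order = order_with_buffer(estimates, buffer_percent)
extra_shirts = sum(order.values()) - sum(estimates.values())
extra_cost = extra_shirts * cost_per_shirt

print("Order quantities:", order)
print(f"Extra shirts: {extra_shirts}, extra cost: ${extra_cost:.2f}")
```

Trying a few different buffer percentages makes it easier to judge how much of the money raised we are willing to spend to reduce the risk of running out of a size.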
We should certainly review our decisions after the fun run competitors have requested t-shirt sizes so that we can see if our estimates were accurate and be better prepared for the event next year. The requests from this event would provide a useful sample of data for planning the next event.
Most often a mathematical investigation is communicated by a written report but sometimes it might be appropriate to make a poster, a slide presentation, a video or even a verbal report.
In any case, the goal of our communications is to convey the important information to others in a systematic, clear and concise way that is best suited to the given task.
When we are creating a written report, there are some guidelines for organising the report so that readers can easily find the information that they need. The structure of the report should use headings to delineate sections, and we can use images and tables to convey information effectively.
Our report is meant to be a formal document, so typing is preferred over hand-writing. If possible, equations and graphs should be laid out digitally; this is not difficult with modern word processing and spreadsheet software.
A typical structure for a statistical investigation report would have these sections:
The format described above will not be suitable for all investigations, so you may choose to add sections or break up these sections.
Introduction
The introduction presents an outline of the investigation. It needs to:
Data
In this section, we should describe, explain and justify the methods that we used to obtain data. Data can be presented in tables, graphs or lists, preferably using familiar mathematical formats.
Analysis
The analysis section contains the mathematical calculations along with an explanation and justification of the interpretations leading to the conclusions that we draw.
If the analysis is extensive, this section could just contain a summary of the mathematical analysis with references to further details in an appendix.
We could also choose to break the Analysis section into separate Results and Discussion sections.
Conclusions
The conclusion should be an interpretation of the mathematical and statistical results in the context of the investigation. It should be a concise statement of the most important information and must not introduce any new information.
A good conclusion should concisely: