A major charity organisation is organising a very large fundraising event in your city, the "City to Beach" fun run. They are expecting to have 65000 entrants in the inaugural event. To raise the profile of the event, every competitor who enters the fun run will receive a promotional t-shirt. The t-shirt design is complete, and now we have been put in charge of ordering an appropriate number of t-shirts in a suitable range of sizes. This is a considerable problem: if we order too few, entrants will be unhappy when they miss out on a t-shirt in the correct size, but if we order too many, the cost could significantly reduce the funds raised and we could end up trying to deal with hundreds of unwanted XXXL t-shirts!
A size chart from the t-shirt manufacturer is given below:
We will need to do our own research to obtain any other information and data that we need, and we don't have much time.
At the start of this chapter, we learned that the statistical investigation process is a cyclic process that involves several stages:
Statistical questions have these characteristics:
1. Write a suitable statistical question that captures the requirements of this investigation.
We will need to decide how to collect data that can be used for our investigation.
The most obvious option is to ask each competitor to select their preferred shirt size when they enter the fun run. This would be equivalent to a census. Unfortunately, this is not possible because it will not allow enough time for the t-shirts to be manufactured, printed and delivered.
Therefore we must consider obtaining data from primary or secondary sources. To save time and cost, we would prefer to use secondary data from a reliable source.
2. What data do we require to answer the question formed?
3. Consider the data required. Is it easily obtainable?
4. Do we need to make some assumptions to simplify the problem?
5. If we only had data on height or weight, but not paired data, which would be best to use to assess the number of t-shirts of different sizes required? Why?
6. What sources may we consider to be reliable?
One possible source of information is the Australian Bureau of Statistics, which has the following data available: How Australians Measure Up.
We could use the breakdown of participants' measured weights (found on page 6), shown below.
Measured weight (kg) | Males (%) | Females (%) |
---|---|---|
Less than 50 | 0.2 | 6.8 |
50 to < 60 | 3.8 | 27.2 |
60 to < 70 | 15.6 | 32.9 |
70 to < 80 | 28.8 | 18.6 |
80 to < 90 | 27.0 | 8.4 |
90 to < 100 | 14.7 | 3.5 |
100 to < 110 | 6.7 | 1.5 |
110 or more | 3.1 | 1.0 |
Total | 100.0 | 100.0 |
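The percentages in the table can be turned into expected numbers of entrants in each weight band. The following is a minimal sketch only: it assumes that half of the 65000 entrants are male and half are female, and that the entrants have the same weight distribution as the ABS participants. Both are assumptions that would need to be checked; the variable names are illustrative.

```python
# Weight bands (kg) and percentages copied from the ABS table above.
BANDS = ["<50", "50-<60", "60-<70", "70-<80", "80-<90", "90-<100", "100-<110", "110+"]
MALE_PCT = [0.2, 3.8, 15.6, 28.8, 27.0, 14.7, 6.7, 3.1]
FEMALE_PCT = [6.8, 27.2, 32.9, 18.6, 8.4, 3.5, 1.5, 1.0]

ENTRANTS = 65000
MALE_SHARE = 0.5  # ASSUMPTION: equal numbers of male and female entrants


def expected_counts(percentages, group_size):
    """Convert a list of percentages into expected head counts for a group."""
    return [round(p / 100 * group_size) for p in percentages]


male_counts = expected_counts(MALE_PCT, ENTRANTS * MALE_SHARE)
female_counts = expected_counts(FEMALE_PCT, ENTRANTS * (1 - MALE_SHARE))

for band, m, f in zip(BANDS, male_counts, female_counts):
    print(f"{band:>9} {m:>7} {f:>7}")
```

Because the published percentages are rounded to one decimal place, the counts may not add up exactly to the number of entrants.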
Alternatively, if you are familiar with the normal distribution, you could use the mean and standard deviation of body weight data from the table on page 19, together with the assumption that the weight of competitors follows a normal distribution.
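As a rough sketch of the normal distribution approach, the proportion of competitors in a weight band can be found from the cumulative distribution function. The mean and standard deviation below are placeholder values chosen only for illustration; they are not taken from the ABS publication and should be replaced with the figures from the table on page 19. The function name `band_proportion` is also just an illustrative choice.

```python
from statistics import NormalDist

# PLACEHOLDER values, NOT from the ABS publication: substitute the mean and
# standard deviation given in the table on page 19.
MEAN_KG = 80.0
SD_KG = 15.0


def band_proportion(dist, lower=None, upper=None):
    """Proportion with lower <= weight < upper; None means the band is open-ended."""
    low = dist.cdf(lower) if lower is not None else 0.0
    high = dist.cdf(upper) if upper is not None else 1.0
    return high - low


weights = NormalDist(mu=MEAN_KG, sigma=SD_KG)

# Band edges in kg, matching the weight intervals used in the ABS table.
edges = [None, 50, 60, 70, 80, 90, 100, 110, None]
proportions = [band_proportion(weights, lo, hi) for lo, hi in zip(edges, edges[1:])]

for lo, hi, p in zip(edges, edges[1:], proportions):
    print(lo, hi, f"{p:.1%}")
```

Multiplying each proportion by the expected number of competitors in the group gives an estimated count for that band. A separate distribution would be needed for males and for females.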
This is the stage of our investigation where we "do the maths". It is important that we work carefully and systematically to ensure that our results and conclusions are accurate.
7. Form a list of the assumptions made in order to use the data, such as:
8. Use the data together with your assumptions to estimate the number of shirts required for each size and record the results in a table similar to that shown below. (Show the weight interval for each size in the first row)
Weight interval (kg) | | | | | | | |
---|---|---|---|---|---|---|---|
Size | XS | S | M | L | XL | XXL | XXXL |
Male | | | | | | | |
Female | | | | | | | |
Total | | | | | | | |
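One way to fill in a table like this is to assign each weight band to a size and add up the expected numbers. The sketch below is an illustration under stated assumptions, not a model answer: the mapping `BAND_TO_SIZE` is hypothetical and must be replaced with intervals based on the manufacturer's size chart, and the equal split between male and female entrants is also an assumption.

```python
# Percentages per weight band, in the order given in the ABS table.
MALE_PCT = [0.2, 3.8, 15.6, 28.8, 27.0, 14.7, 6.7, 3.1]
FEMALE_PCT = [6.8, 27.2, 32.9, 18.6, 8.4, 3.5, 1.5, 1.0]

SIZES = ["XS", "S", "M", "L", "XL", "XXL", "XXXL"]

# HYPOTHETICAL mapping from the 8 weight bands (in table order) to a size.
# Replace it with intervals taken from the manufacturer's size chart.
BAND_TO_SIZE = ["XS", "S", "M", "L", "XL", "XXL", "XXXL", "XXXL"]

ENTRANTS = 65000
MALE_SHARE = 0.5  # ASSUMPTION: equal numbers of male and female entrants


def shirts_by_size(percentages, group_size):
    """Sum the expected number of people in each band into a total per size."""
    totals = {size: 0.0 for size in SIZES}
    for pct, size in zip(percentages, BAND_TO_SIZE):
        totals[size] += pct / 100 * group_size
    return {size: round(n) for size, n in totals.items()}


male = shirts_by_size(MALE_PCT, ENTRANTS * MALE_SHARE)
female = shirts_by_size(FEMALE_PCT, ENTRANTS * (1 - MALE_SHARE))
total = {size: male[size] + female[size] for size in SIZES}

# Print the rows in the same layout as the table.
for label, row in [("Male", male), ("Female", female), ("Total", total)]:
    print(label, *(row[size] for size in SIZES), sep=" | ")
```

Changing `BAND_TO_SIZE` or `MALE_SHARE` and re-running shows how sensitive the order quantities are to each assumption.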
We have now determined the quantities of each size of t-shirt that we need to order.
Often it is a good idea to construct a graph to represent our results. This is an excellent way to see for ourselves if the results appear to be reasonable and can also be used when we want to communicate our results and conclusions.
Rather than just presenting a graph without any explanation, we should describe the characteristics of the distribution that the graph displays. If relevant, we should refer to skew or symmetry, clusters and gaps, outliers or any other important information.
9. What type of graph would be best suited to displaying the results of our analysis? Use technology to construct graphs showing the t-shirt quantities for each gender and the total.
10. Using mathematical terminology, how could we describe the distribution of t-shirt sizes resulting from your calculations?
You will recall that the statistical investigation process is represented as a cyclical process. Now that we have produced our results, we need to consider whether they are sufficient for our needs, whether our assumptions are valid, and whether we need to refine our methods further to get more accurate results.
Competitors will be disappointed if they cannot get the size of t-shirt that they need. Perhaps we should order some extra t-shirts in each size. However, we don't want to be too wasteful, and the cost of the extra t-shirts will reduce the amount of money that is raised for charity.
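The trade-off between disappointed competitors and wasted shirts can be explored by adding a percentage margin to each estimate and counting the extra shirts this requires. This is only a sketch: the function name, the 5% margin and the quantities in `estimates` are made-up placeholders, not results of the investigation.

```python
def order_with_buffer(estimates, buffer_percent):
    """Add a whole-number percentage margin to each size, rounding extras up."""
    return {
        size: count + (count * buffer_percent + 99) // 100  # integer ceiling
        for size, count in estimates.items()
    }


# PLACEHOLDER quantities for illustration only; use your own estimates.
estimates = {"S": 1000, "M": 2000, "L": 65}

order = order_with_buffer(estimates, 5)
extra_shirts = sum(order.values()) - sum(estimates.values())

print(order)
print("Extra shirts ordered:", extra_shirts)
```

Multiplying the number of extra shirts by the cost per shirt gives the reduction in the money raised for charity, which can then be weighed against the risk of running out of a size.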
We should certainly review our decisions after the fun run competitors have requested t-shirt sizes so that we can see if our estimates were accurate and be better prepared for the event next year. The requests from this event would provide a relevant sample for planning the next event.
11. Do the results of our calculations enable us to answer the statistical question that we posed?
12. Were the assumptions made reasonable?
13. Would another measurement be better to use than weight to select the best t-shirt size (e.g. height, Body Mass Index)? Justify your proposal.
14. How would you get more accurate information about the typical age and gender of fun-run participants? Explain your ideas.
Most often a mathematical investigation is communicated by a written report but sometimes it might be appropriate to make a poster, a slide presentation, a video or even a verbal report.
In any case, the goal of our communications is to convey the important information to others in a systematic, clear and concise way that is best suited to the given task.
When we are creating a written report, there are some guidelines for organising the report so that readers can easily find the information that they need. The structure of the report should use headings to delineate sections, and we can use images and tables to convey information effectively.
Our report is meant to be a formal document, so typing is preferred over handwriting. If possible, equations and graphs should be laid out electronically; this is not difficult with modern word processing and spreadsheet software.
A typical structure for a statistical investigation report would have these sections:
The format described above will not be suitable for all investigations, so you may choose to add sections or break up these sections.
Introduction
The introduction presents an outline of the investigation. It needs to:
Data
In this section, we should describe, explain and justify the methods that we used to obtain data. Data can be presented in tables, graphs or lists, preferably using familiar mathematical formats.
Analysis
The analysis section contains the mathematical calculations along with an explanation and justification of the interpretations leading to the conclusions that we draw.
If the analysis is extensive, this section could just contain a summary of the mathematical analysis with references to further details in an appendix.
We could also choose to break the Analysis section into separate Results and Discussion sections.
Conclusions
The conclusion should be an interpretation of the mathematical and statistical results in the context of the investigation. It should be a concise statement of the most important information and must not introduce any new information.
A good conclusion should concisely:
15. Write the complete statistical investigation report for this investigation, following the guidelines provided.