Your task is to follow the statistical investigation process to investigate data collected about passengers on the Titanic and to prepare a report on your findings. The data can be found in the attached table.
Below is a summary of the statistical investigation process. The corresponding exercise contains questions that will help you along the way.
The statistical investigation process begins with the need to solve a real-world problem and aims to reflect the way statisticians work.
It is a cyclic process that involves several stages:
For example, if a scientist is using the statistical investigation process to investigate a possible relationship between litres of soft drink consumed per week and BMI for a set of people (bivariate data), then the steps may look something like this:
Note: a statistician must consider whether they will survey an entire population of interest (census) or a representative group from within the population (sample). The process of selecting a sample must be as unbiased as possible to keep the data as representative as possible.
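One common way to reduce selection bias is a simple random sample, in which every member of the population has the same chance of being chosen. The following is a minimal sketch only: the population of 500 ID numbers, the sample size of 25 and the function name `simple_random_sample` are all hypothetical and chosen purely for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Return n distinct items chosen uniformly at random, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical population: people identified by ID numbers 1 to 500.
population_ids = list(range(1, 501))

# Draw 25 people; each ID has the same chance of being selected.
sample_ids = simple_random_sample(population_ids, 25, seed=1)
print(sorted(sample_ids))
```

Fixing the seed is optional. It is useful in a report because it lets a reader reproduce exactly the same sample.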
A typical structure for a statistical investigation report would have these sections:
The format described above will not suit every investigation, so you may choose to add sections or to break these sections up.
The introduction presents an outline of the investigation, including:
In this section, you should describe, explain and justify the methods that you used to obtain the data. Data can be presented in tables, graphs or lists, preferably using familiar mathematical formats.
The analysis section contains the mathematical calculations along with an explanation and justification of the interpretations leading to the conclusions that we draw.
If the analysis is extensive, this section could just contain a summary of the mathematical analysis with references to further details in an appendix.
The conclusion should be an interpretation of the mathematical and statistical results in the context of the investigation. It should be a concise statement of the most important information and must not introduce any new information.
A good conclusion should concisely:
Task description: analyse data on Titanic passengers in order to evaluate which type of passenger was most likely to survive.
The screenshot below shows the first few rows of the data table from the file titanic-data.csv containing information about passengers on the Titanic. (Data originally provided at https://www.kaggle.com/c/titanic-survival/data).
Notes about the data:
Data name | Explanation |
---|---|
Survived | 0 = No; 1 = Yes |
Pclass | Ticket class; also indicates socio-economic status (SES): 1st = upper class; 2nd = middle class; 3rd = lower class |
Age | In years. Estimated ages are given in the form xx.5. Ages under one year are given as a fraction. |
Sibsp | Number of siblings or spouses travelling with this passenger |
Parch | Number of parents or children travelling with this passenger |
Fare | The passenger fare, measured in pounds sterling |
Embarked | C = Cherbourg; Q = Queenstown; S = Southampton |
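
As a starting point for the analysis, the proportion of survivors can be compared across groups such as passenger class or port of embarkation. The sketch below is one possible approach, not a required method. The rows in `SAMPLE_CSV` are made up for illustration and are not real records from titanic-data.csv, and the helper name `survival_rate_by` is hypothetical. It also assumes that the column headers match the data names in the table above.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; NOT real records from titanic-data.csv.
SAMPLE_CSV = """Survived,Pclass,Age,Fare,Embarked
1,1,38,71.28,C
0,3,22,7.25,S
1,3,26,7.93,S
0,2,35,13.00,S
1,1,54,51.86,S
0,3,2,21.08,Q
"""


def survival_rate_by(rows, column):
    """Return {group: proportion survived} for each value found in `column`."""
    counts = defaultdict(lambda: [0, 0])  # group -> [survivors, total passengers]
    for row in rows:
        group = row[column]
        counts[group][0] += int(row["Survived"])  # Survived is 0 or 1
        counts[group][1] += 1
    return {group: s / t for group, (s, t) in counts.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = survival_rate_by(rows, "Pclass")
for group in sorted(rates):
    print(f"Pclass {group}: {rates[group]:.2f} of passengers survived")
```

To run this on the real data, the rows could instead be read with `csv.DictReader(open("titanic-data.csv", newline=""))`, assuming the file sits in the working directory. Passing `"Embarked"` instead of `"Pclass"` compares survival by port of embarkation.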