The growth in computerized technology has made it possible to analyze ever larger and more complex data sets, resulting in more efficient, responsive and adaptable processes. It has allowed for advances in a range of fields such as medicine, environmental science, transportation, manufacturing and logistics.
At the same time, data has also been used to influence opinion and create political divisions. It has intruded on personal privacy and created greater inequality. It is increasingly important for us to understand how data is collected and used, and be aware of the impact it may have on our lives and our future.
One decision that has to be made, before data collection begins, is whether to collect data from every member of a population, or to only collect data from a sample of members within that population.
In statistics, a population refers to every member of a particular group of interest. It could be all the people in a country, all the students in a school, all the frogs living in a wetland, all the trees in a forest, or all the cars in a parking lot.
A survey conducted on every member of a population is called a census. In the US, a nationwide census is conducted every ten years. Data obtained from the census is used by the government to plan for the future direction of the country. It is needed for planning purposes for such things as the setting of electoral boundaries and the equitable distribution of resources. Apart from a count of people in each dwelling on census night, questions are asked of each household that are intended to inform public policy making.
Collecting data from every member of a population is the most accurate way of gathering information, but it is not always the most practical, and can be very expensive. For these reasons, data is often gathered from a smaller group, or sample, that can be used to estimate the characteristics of the wider population.
The size of the sample is an important consideration. If the sample is too large, it may be too expensive or time-consuming to collect the data. If it is too small, the sample may not be representative of the population.
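The effect of sample size can be explored with a small simulation. The sketch below is illustrative only: the population of heights is made up, and the function name `spread_of_sample_means` and its parameters are assumptions rather than part of this lesson. It draws many random samples of a given size and measures how much the sample means vary from one sample to the next.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, trials=500, seed=0):
    """Return the standard deviation of the means of many random samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)

# Hypothetical population: 1000 made-up heights in cm.
_rng = random.Random(1)
population = [_rng.gauss(165, 10) for _ in range(1000)]

for n in (5, 50, 500):
    # Larger samples should give sample means that vary less.
    print(n, round(spread_of_sample_means(population, n), 2))
```

In runs like this, the spread of the sample means shrinks as the sample size grows, which is one way to see why a very small sample may not represent the population well.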
In ecological studies, a sampling technique called capture-recapture is used to estimate the number of individuals in a population. It involves capturing and tagging some individuals, releasing them, and then capturing a second sample after a certain time has elapsed.
Suppose a region is home to an unknown number of animals of a particular species. A researcher might capture some of the animals, tag them and then release them back into the environment. Some time later, when the released animals can be assumed to have become well mixed with the rest of the population, another sample of the animals is captured. Some of these are likely to be the previously tagged individuals.
The proportion of tagged individuals in the second sample is likely to be approximately the same as the proportion of the whole population that was tagged in the first sample. This method assumes that the population is "closed". In other words, the two visits to the study area are close enough in time that no individuals die, are born, or move into or out of the study area between visits. The model also assumes that no tags fall off animals between visits, and that the researcher takes a random sample both times. Randomness of a wild sample may be difficult to guarantee in practice. Perhaps tagged animals were slower and more likely to be caught in both samples. Can you think of any other factors that may cause the population estimate to be inaccurate?
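To see how the method behaves when its assumptions hold, the following sketch simulates a single study on a closed population with random sampling on both visits. It is a simplified illustration, not a model of a real field study: the function name `simulate_capture_recapture`, its parameters, and the population size of 600 used in the loop are all assumptions chosen for the example.

```python
import random

def simulate_capture_recapture(population_size, first_catch, second_catch, seed=0):
    """Simulate one capture-recapture study on a closed population.

    Returns the population estimate, or None if no tagged animal is recaptured.
    """
    rng = random.Random(seed)
    animals = range(population_size)
    # First visit: catch a random sample and tag every animal in it.
    tagged = set(rng.sample(animals, first_catch))
    # Second visit: catch another random sample from the same population.
    recaptured = rng.sample(animals, second_catch)
    tagged_in_second = sum(1 for animal in recaptured if animal in tagged)
    if tagged_in_second == 0:
        return None  # the proportion cannot be formed
    # tagged_in_second / second_catch is assumed to match first_catch / population
    return first_catch * second_catch / tagged_in_second

# Repeat the study a few times on a hypothetical population of 600 animals.
for seed in range(5):
    print(simulate_capture_recapture(600, 50, 48, seed=seed))
```

The estimates differ from run to run even though the true population size never changes, which shows that the result is an estimate rather than an exact count.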
A sample of $50$ fish was caught in a lake. These were tagged and released. Some time later another $48$ fish were caught. Of these, four were found to be tagged. Estimate the number of fish in the lake.
Think: The proportion of tagged fish in the second sample should be equal to the proportion of $50$ fish out of the total population.
Do: The fraction of tagged fish in the second sample is $\frac{4}{48}=\frac{1}{12}$ and it is known that there are $50$ tagged fish in the lake.
$\frac{\text{Number tagged sample 1}}{\text{Total population}} = \frac{\text{Number tagged sample 2}}{\text{Total number sample 2}}$

$\frac{50}{\text{Total population}} = \frac{1}{12}$

Multiply both sides by $12\cdot\text{Total population}$:

$50\times12 = \text{Total population}$

$\text{Total population} = 600$

Therefore, there are estimated to be $600$ fish in the lake.
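The same arithmetic can be written as a short function. This is a minimal sketch: the name `estimate_population` and its parameter names are chosen here for illustration. The simple ratio it computes is often called the Lincoln-Petersen estimator.

```python
def estimate_population(tagged_first, second_sample_size, tagged_in_second):
    """Estimate population size from capture-recapture counts.

    Solves tagged_first / population = tagged_in_second / second_sample_size
    for the population.
    """
    if tagged_in_second <= 0:
        raise ValueError("at least one tagged individual must be recaptured")
    return tagged_first * second_sample_size / tagged_in_second

# The fish example: 50 tagged, 48 caught later, 4 of those tagged.
print(estimate_population(50, 48, 4))  # 600.0
```

The check on `tagged_in_second` is there because the proportion cannot be formed if no tagged individuals appear in the second sample.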
$\frac{\text{Number tagged sample 1}}{\text{Total population}} = \frac{\text{Number tagged sample 2}}{\text{Total number sample 2}}$
The heights (in cm) of a population of 3 people are $A$, $B$ and $C$.
List all possible samples of size 2 without replacement. For example, if the first 2 are selected we can write that as $AB$.
Use commas to separate different samples.
If $A=171$, $B=153$ and $C=162$, complete the following table:
| Sample Values (cm) | Sample Mean |
| --- | --- |
| $171$, $153$ | ____ cm |
| $171$, $162$ | ____ cm |
| $153$, $162$ | ____ cm |
What is the mean of the distribution of all possible sample means?
What is the population mean?
Is the mean of the sample means equal to the population mean?
Yes
No
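The idea behind this question can be explored with a short script that lists every possible sample and averages the sample means. The sketch below deliberately uses a different, made-up population of four heights so that it does not replace working through the table; the variable names are illustrative assumptions.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of four heights in cm (not the values in the question).
heights = [150, 160, 170, 180]

# Every possible sample of size 2, chosen without replacement.
samples = list(combinations(heights, 2))
sample_means = [mean(sample) for sample in samples]

for sample, sample_mean in zip(samples, sample_means):
    print(sample, sample_mean)

mean_of_sample_means = mean(sample_means)
population_mean = mean(heights)
print(mean_of_sample_means, population_mean)
```

For this made-up population the two printed values agree, which suggests the comparison to look for when completing the table.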
An oil spill has spread over an area of $1650$ square kilometers. A team of biologists scan an area of $150$ square kilometers, and find $272$ dead marine animals. Find $y$, the estimated number of dead marine animals over the entire area of the oil spill.
A local council wanted to monitor the number of rabbits in the area. They used the capture-recapture technique to estimate the population of rabbits. $219$ rabbits were caught, tagged and released. Later, $42$ rabbits were caught at random. $15$ of these $42$ rabbits had been tagged.
Find $k$, the estimated population of rabbits. Round your answer to the nearest whole number if necessary.
Local council B conducted a similar study and found they had $15\%$ fewer rabbits. What was the estimated population of rabbits in council area B? Round to the nearest whole number if necessary.
Understand statistics as a process for making inferences about population parameters based on a random sample from that population.
Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling.