7.04 Univariate data, center, and spread

Measures of center

We can summarize data in many ways including using descriptive statistics like mean, median, mode or with data displays like histograms or boxplots. The way we summarize data can depend on the type of data. In this lesson, we will look at summarizing and formulating questions for univariate data.

Univariate data

Information gathered around a single characteristic. This data can be numerical or categorical.

Displays include: pictographs, bar graphs, line graphs, line plots/dot plots, stem-and-leaf plots, circle graphs, and histograms.

Example:

Scores on assessments, time spent looking at social media, hours spent on an activity

This histogram displays numerical data of the heights of students in a class.

A histogram on the height of students. Ask your teacher for more information.

Notice that there is only one characteristic (or attribute), height, that is being explored. The axes are the attribute and the frequencies. We can only compare the heights of different groups/bins within the data set.

The first bin is students who have heights in the interval 135 – \lt 140 \operatorname{cm}, the second bin is students who are 140 – \lt 145 \operatorname{cm} tall.

In general, a bin contains the lower value but not the upper value.
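
The bin rule can be sketched in Python. This is a minimal illustration rather than part of the lesson: the function name `bin_start` and the sample heights are made up, while the first edge of 135 cm and the bin width of 5 cm are taken from the histogram described above.

```python
import math

def bin_start(value, first_edge=135, width=5):
    """Return the lower edge of the bin [lower, lower + width) that holds value."""
    return first_edge + width * math.floor((value - first_edge) / width)

# Hypothetical heights in cm, made up for illustration only.
heights = [136, 139.9, 140, 144, 145]

for h in heights:
    low = bin_start(h)
    print(f"{h} cm falls in the bin {low} to under {low + 5} cm")
```

A height of exactly 140 cm lands in the second bin, not the first, because each bin contains its lower value but not its upper value.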

Previously, we have seen these measures of center:

Mean

The point on a number line where the data distribution is balanced.

\text{Mean}=\dfrac{\text{Sum of all the values in the data set}}{\text{Number of values in the data set}}

This is a measure of center and summarizes the whole data set with a single number.

Example:

\overline{x} and \mu can be used to represent the mean

Median

The middle value of a data set in ranked order.

This is a measure of center and summarizes the whole data set with a single number.

Mode

The piece of data that occurs most frequently.

This is a measure of center and summarizes the whole data set with a single number.

Sometimes one measure may better represent the data than another. When deciding which to use we need to ask ourselves "Which measure would best represent the type of data we have?"

Exploration

This histogram summarizes numerical univariate data.

Histogram with frequency from 0 to 6 and intervals of 200 for the horizontal axis. The first column (0-99) has a frequency of 4, the second column (100-199) has a frequency of 6, the third column (200-299) has a frequency of 2, the fourth column (300-399) has a frequency of 4, the fifth column (400-499) has a frequency of 1, and the last column (800-899) has a frequency of 1.

  1. We can say the modal class is 100– \lt 200. What do you think that means?

  2. One data value is 842, what could we call that value?

  3. One of the measures of center is 199. Which measure of center would this be? Explain.

  4. One of the measures of center is 114. Which measure of center would this be? Explain.

Depending on what the distribution looks like when graphed using a histogram, the measures of center locations can vary.

  • Mean and median can be the same and in the same bin as the mode

    A histogram with all the columns having the same frequency, with its mean and median having the same value. A statistics table is also shown. See your teacher for more info.
  • Mean and median can be similar and both in the same bin as the mode

    A histogram with a similar mean and median, both in the same bin as its mode. A statistics table is also shown. See your teacher for more info.
  • Mean and median can be quite different, but in the same bin as the mode.

    A histogram with different mean and median but in the same bin as the mode. A statistics table is also shown. See your teacher for more info.
  • Mean, median, and mode can all be in different bins

    A histogram with its mean, median and mode in different bins. A statistics table is also shown. See your teacher for more info.

| | Benefits | Drawbacks |
| --- | --- | --- |
| Mean | Includes all of the data in the calculation, widely used | Heavily impacted by extreme values or uneven distributions |
| Median | Tells us the middle, not impacted by extreme values | Does not include all data values |
| Mode | Quick to identify, tells us about the most frequent value(s) | Does not include all data values, is not necessarily in the middle |

Examples

Example 1

The salaries of part-time employees at a company are given in the dot plot, rounded to the nearest thousand.

A line plot titled Salaries in thousand dollars, ranging from 18 to 38 in steps of 1. Ask your teacher for more information.
a

Which measure of center best reflects the typical wage of a part-time employee?

Worked Solution
Create a strategy

Choose the measure that is appropriate for data sets with extreme values.

Apply the idea

The median is the measure of center that best reflects the typical wage of a part-time employee, because the data set contains three extreme values.

Reflect and check

The mean is 22.9 which is not a very good estimation of the typical salary. It is pulled up by the three much larger values. This might be used to manipulate the analysis to make it look like employees are getting paid more generously than they actually are, so is not a good measure of center.

The mode is 18, which is definitely not a good measure of the typical salary. It represents the minimum salary. This might be used to justify that employees need a raise, as it is much lower than the typical salary. This would not accurately reflect the employee salaries.

b

Calculate and interpret the chosen measure of center from part (a).

Worked Solution
Create a strategy

Remember that the median is the middle value of a data set ranked in order. Start by arranging the salaries in order from lowest to highest. If the total number of salaries is odd, then the median is the middle value. If the total number is even, then the median is the average of the two middle values.

Apply the idea

18,18,18,18,19,19,19,20,20,21,21,21,22,22,22,23,23,37,38,38

The total number of values is even and the two middle values are both 21. This means that the median for the data set is 21.

The typical wage of a part-time employee at the given company is \$21\,000.00.

Reflect and check

We can compare this to a mean of \$22\,850 and a mode of \$18\, 000.
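
If technology is available, we can check all three measures with Python's built-in `statistics` module. This is a quick sketch using the salaries listed in part (b); the variable names are our own.

```python
from statistics import mean, median, mode

# Salaries in thousands of dollars, copied from the ordered list in part (b).
salaries = [18, 18, 18, 18, 19, 19, 19, 20, 20, 21,
            21, 21, 22, 22, 22, 23, 23, 37, 38, 38]

salary_mean = mean(salaries)      # sum of the values divided by how many there are
salary_median = median(salaries)  # average of the two middle values (even count)
salary_mode = mode(salaries)      # most frequent value

print(salary_mean, salary_median, salary_mode)
```

The three results match the values used in the solution: a mean of 22.85, a median of 21, and a mode of 18 (all in thousands of dollars).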

Example 2

A journalist wanted to report on road speed cameras being used as revenue raisers. She obtained data that showed the number of times 20 speed cameras issued a fine to motorists in one month. The results were:

101,\,102,\,115,\,115,\,121,\,124,\,127,\,128,\,130,\,130,\,143,\,143,\,146,\,162,\,162,\,163,\,178,\,183,\,194,\,977

a

Determine the mean number of times a speed camera issued a fine in that month. Give your answer rounded to one decimal place.

Worked Solution
Create a strategy

To find the mean, use the formula: \text{Mean}=\dfrac{\text{Sum of values}}{\text{Number of values}}

Apply the idea

Add all the number of times a speed camera issued a fine and divide by the total number of cameras:

\text{Mean} = \dfrac{3644}{20} \quad \text{(Find the sum of the values)}
= 182.2 \quad \text{(Divide and round your answer)}

b

Determine the median number of times a speed camera issued a fine in that month. Give your answer rounded to one decimal place.

Worked Solution
Create a strategy

The median in a data set with an even number of values is the average of the two middle data values.

Apply the idea

The nine values below the two middle values: 101,\,102,\,115,\,115,\,121,\,124,\,127,\,128,\,130

The nine values above the two middle values: 143,\,146,\,162,\,162,\,163,\,178,\,183,\,194,\,977

The two middle values of the set: 130,\,143

\text{Median} = \dfrac{130+143}{2} \quad \text{(Find the average of the middle values)}
= \dfrac{273}{2} \quad \text{(Evaluate the addition)}
= 136.5 \quad \text{(Evaluate the division)}

c

The journalist wants to give the impression that speed cameras are just being used to raise revenue. Which statement should she make? Explain.

A
An investigation into speed cameras found that the median number of fines in one month was 137.
B
An investigation into speed cameras found that, on average, 182 fines were issued in one month.
Worked Solution
Create a strategy

Choose the option which uses the larger number out of the mean and median.

Apply the idea

The correct option is B: An investigation into speed cameras found that, on average, 182 fines were issued in one month.

The mean is heavily impacted by the extreme value 977, so it is much higher than the median. Reporting the mean gives the impression that a lot of money is coming from speed cameras.

Reflect and check

A neutral and more thorough report would give some insights as to the sample size, the range, and any extreme values as they may be due to input or camera error.
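
To see how strongly the single value 977 affects each measure, we can recompute the mean and median with and without it. This is an illustrative sketch only: leaving out a value here is a way to explore its influence, not a recommendation to delete data, and the variable names are our own.

```python
from statistics import mean, median

# Number of fines issued by each of the 20 speed cameras.
fines = [101, 102, 115, 115, 121, 124, 127, 128, 130, 130,
         143, 143, 146, 162, 162, 163, 178, 183, 194, 977]

# The same data with the extreme value left out, for comparison only.
without_extreme = [f for f in fines if f != 977]

mean_all = mean(fines)
median_all = median(fines)
mean_without = mean(without_extreme)
median_without = median(without_extreme)

print(f"All cameras:  mean = {mean_all:.1f}, median = {median_all:.1f}")
print(f"Without 977:  mean = {mean_without:.1f}, median = {median_without:.1f}")
```

The mean drops from 182.2 to about 140.4 when 977 is left out, while the median only moves from 136.5 to 130. This is why the median is described as not being impacted by extreme values.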

Idea summary

We can summarize numerical univariate data using measures of center. A measure of center is a single data value that describes the center or middle of a whole set of data.

  • Mean: the point on a number line where the data distribution is balanced.

  • Median: the middle value of a data set in ranked order.

  • Mode: the piece of data that occurs most frequently.

| | Benefits | Drawbacks |
| --- | --- | --- |
| Mean | Includes all of the data in the calculation, widely used | Heavily impacted by extreme values |
| Median | Tells us the middle, not impacted by extreme values | Does not include all data values |
| Mode | Quick to identify, tells us about the most frequent value(s) | Does not include all data values, is not necessarily in the middle |

Measures of spread and dispersion

Previously, we have seen these measures of dispersion (spread):

Range

The difference between the upper extreme and the lower extreme.

This is a measure of dispersion or spread and summarizes the variation in the data set. It can be heavily impacted by extreme values.

Interquartile range

The difference between the upper quartile and the lower quartile.

This is a measure of dispersion or spread and summarizes the variation in the data set. It is not usually impacted by extreme values.
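
The effect of an extreme value on these two measures can be checked with a short Python sketch, reusing the speed camera data from Example 2. Note the assumption: quartiles are found here as the medians of the lower and upper halves of the ordered data, leaving the overall median out of both halves when the number of values is odd. Calculators and software may use other quartile conventions and give slightly different results. The function names are our own.

```python
from statistics import median

def quartiles(values):
    """Lower and upper quartiles as medians of the lower and upper halves.

    Assumption: with an odd number of values, the overall median is
    excluded from both halves. Other conventions exist.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median(lower), median(upper)

def spread_summary(values):
    """Return (range, interquartile range) for a list of numbers."""
    q1, q3 = quartiles(values)
    return max(values) - min(values), q3 - q1

# Speed camera data from Example 2; 977 is the extreme value.
fines = [101, 102, 115, 115, 121, 124, 127, 128, 130, 130,
         143, 143, 146, 162, 162, 163, 178, 183, 194, 977]
without_extreme = fines[:-1]  # same data with 977 left out, for comparison

range_all, iqr_all = spread_summary(fines)
range_without, iqr_without = spread_summary(without_extreme)

print("All values:  range =", range_all, " IQR =", iqr_all)
print("Without 977: range =", range_without, " IQR =", iqr_without)
```

Under this convention the range falls from 876 to 93 when 977 is left out, while the interquartile range barely changes (40 compared with 41).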

There are two more measures of spread or dispersion called the variance and the standard deviation. Let's explore why they are helpful.

Exploration

Two sets of data were collected from two different samples of passengers in an airport. Passengers were asked the approximate duration of the flight they had just been on.

Set A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17

Set B: 1, 1, 1, 1, 5, 9, 9, 9, 9, 9, 9, 9, 13, 17, 17, 17, 17

  1. Find the mean of both data sets. What does it tell us about the data?

  2. Find the median of both data sets. What does it tell us about the data?

  3. Find the range of both data sets. What does it tell us about the data?

  4. Find the interquartile range of both data sets. What does it tell us about the data?

  5. How would you describe the differences between the two sets in a way that the given summary statistics don't show?

The variance is a way of showing how spread out the numbers in a data set are. The variance looks at the square of the distance between each data value and the mean.

Consider the following data set with a mean of 73.7: 100, 51, 79, 57, 60, 64, 95, 98, 56, 77

A plot of the ten data values, with the x-axis numbered from 1 to 10 and the y-axis from 10 to 100.

We can visualize the variance of the given data set by looking at the distance between the data value, shown on the y-axis and the mean 73.7, shown as a horizontal line.

Notice that some of these distances are positive and some are negative. To avoid the positive and negative distances 'canceling' each other out, we square the distances to make them all positive.

Then we find the average of these squared distances. This is the variance.

A small variance indicates that most scores are close to the mean, while a large variance indicates that the scores are more spread out. This is the formula:

\displaystyle \sigma^{2}=\dfrac{\left(x_{1} - \mu \right)^{2}+ \left(x_{2} - \mu \right)^{2}+\ldots +\left(x_{N} - \mu \right)^{2}}{N}
\bm{\sigma^{2}}: Population variance
\bm{N}: Population size
\bm{x}: A particular data value
\bm{\mu}: Population mean

The standard deviation is the square root of the variance and is often used when analyzing the dispersion of univariate data. (The square root undoes the squaring we did to make the distances positive when finding the variance). This is the formula:

\displaystyle \sigma=\sqrt{\dfrac{ \left(x_{1} - \mu \right)^{2}+ \left(x_{2} - \mu \right)^{2}+\ldots +\left(x_{N} - \mu \right)^{2} }{N}}
\bm{\sigma}: Population standard deviation
\bm{N}: Population size
\bm{x}: A particular data value
\bm{\mu}: Population mean

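| x | x-\mu | \left(x - \mu \right)^2 |
| --- | --- | --- |
| 100 | 26.3 | 691.69 |
| 51 | -22.7 | 515.29 |
| 79 | 5.3 | 28.09 |
| 57 | -16.7 | 278.89 |
| 60 | -13.7 | 187.69 |
| 64 | -9.7 | 94.09 |
| 95 | 21.3 | 453.69 |
| 98 | 24.3 | 590.49 |
| 56 | -17.7 | 313.29 |
| 77 | 3.3 | 10.89 |
| \mu=73.7 | | \text{Sum}=3164.10 |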

To calculate by hand:

  1. Find the mean of the data set, \mu

  2. Find the distance of each data value from the mean, x-\mu

  3. Square the distances to prevent negative distances from canceling out positive distances, \left(x - \mu \right)^{2}

  4. Find the mean of the squared distances. This is the variance, which gives us one measure of dispersion: \sigma^{2}=\dfrac{ \left(x_1 - \mu \right)^{2}+\ldots +\left(x_N - \mu \right)^{2} }{N}=\dfrac{3164.10}{10}=316.41

  5. Remember we squared the distances, so we take the square root to get the standard deviation: \sigma=\sqrt{\dfrac{3164.10}{10}}\approx 17.79
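
The five steps can also be followed one at a time in Python. This is a minimal sketch using the data set above; the variable names are our own and each line is labeled with the step it carries out.

```python
import math

data = [100, 51, 79, 57, 60, 64, 95, 98, 56, 77]

mu = sum(data) / len(data)               # step 1: mean of the data set
distances = [x - mu for x in data]       # step 2: distance of each value from the mean
squared = [d ** 2 for d in distances]    # step 3: square each distance
variance = sum(squared) / len(data)      # step 4: mean of the squared distances
std_dev = math.sqrt(variance)            # step 5: square root of the variance

print(f"mean = {mu:.1f}")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")
```

The printed values agree with the table: a mean of 73.7, a variance of 316.41, and a standard deviation of about 17.79.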

The process for calculating standard deviation is time consuming, so we will be using our calculator to find the standard deviation. In statistics mode on a calculator, the symbol \sigma_{n} or \sigma_{x} may also be used.

There is a second type of standard deviation for when you are working with a sample and not a population. This is the sample standard deviation, with the symbol s or s_{x}. It divides by N-1 instead of N, so s is always slightly larger than \sigma, but for large data sets the two will be fairly close.

Here are some examples of what a data set can look like with the same measures of center, but different measures of dispersion.

  • Mean =5, Median =5, Standard deviation=2.84

    A histogram with mean and median of 5, and standard deviation of 2.84. See your teacher for more info.
  • Mean =5, Median =5, Standard deviation=2

    A histogram with mean and median of 5, and standard deviation of 2. See your teacher for more info.
  • Mean =5, Median =5, Standard deviation=0.93

    A histogram with mean and median of 5, and standard deviation of 0.93. See your teacher for more info.
  • Mean =5, Median =5, Standard deviation=0

    A histogram with mean and median of 5, and standard deviation of 0. See your teacher for more info.

When comparing the standard deviation, we should consider the scale or order of magnitude of the data. For example, the standard deviation for human baby weights might be smaller than for whale baby weights, but that does not necessarily mean that human baby weights are more consistent.

Examples

Example 3

The number of push-ups Mario does each day is shown: 33,\,32,\,32,\,32,\,31,\,32,\,32,\,32,\,32,\,32

a

Calculate the variance by hand using a spreadsheet or table. Round your answer to two decimal places.

Worked Solution
Create a strategy

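Use the formula \sigma^{2} = \dfrac{\left(x_{1} - \mu \right)^{2}+ \left(x_{2} - \mu \right)^{2}+\ldots +\left(x_{N} - \mu \right)^{2}}{N}.

| \text{No. of push-ups} \left(x \right) | \left(x - \mu\right) | \left(x - \mu\right)^{2} |
| --- | --- | --- |
| 33 | ⬚ | ⬚ |
| 32 | ⬚ | ⬚ |
| 32 | ⬚ | ⬚ |
| 32 | ⬚ | ⬚ |
| 31 | ⬚ | ⬚ |
| 32 | ⬚ | ⬚ |
| 32 | ⬚ | ⬚ |
| 32 | ⬚ | ⬚ |
| 32 | ⬚ | ⬚ |
| 32 | ⬚ | ⬚ |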

First, calculate the population mean, \mu.

Then, use a table like the one shown to find the sum of \left(x-\mu\right)^{2}.

Finally, divide the sum by the population size to get the variance.

Apply the idea
  1. Calculate the population mean:

    \mu = \dfrac{33+32+32+32+31+32+32+32+32+32}{10} \quad \text{(Use the formula for mean)}
    = 32 \quad \text{(Evaluate)}

  2. Complete the table and find the sum of all \left(x-\mu\right)^2:

    | \text{No. of push-ups} \left(x \right) | \left(x - \mu\right) | \left(x - \mu\right)^{2} |
    | --- | --- | --- |
    | 33 | 1 | 1 |
    | 32 | 0 | 0 |
    | 32 | 0 | 0 |
    | 32 | 0 | 0 |
    | 31 | -1 | 1 |
    | 32 | 0 | 0 |
    | 32 | 0 | 0 |
    | 32 | 0 | 0 |
    | 32 | 0 | 0 |
    | 32 | 0 | 0 |

    \left(x_{1} - \mu \right)^{2}+ \left(x_{2} - \mu \right)^{2}+\ldots +\left(x_{N} - \mu \right)^{2} = 1+0+0+0+1+0+0+0+0+0
    = 2

  3. Divide by N=10 to find the variance.

    \sigma^2 = \dfrac{\left(x_{1} - \mu \right)^{2}+ \left(x_{2} - \mu \right)^{2}+\ldots +\left(x_{N} - \mu \right)^{2}}{N} \quad \text{(Write the formula)}
    = \dfrac{2}{10} \quad \text{(Substitute known values)}
    = 0.2 \quad \text{(Evaluate)}

The population variance is \sigma^{2}=0.20.

Reflect and check

The variance is quite small, so indicates that there is not much variation or dispersion in the data set.

b

Calculate his standard deviation by hand using a spreadsheet or table. Round your answer to two decimal places.

Worked Solution
Create a strategy

We have already calculated the variance, so to find the standard deviation, we just need to take the square root.

Apply the idea

Find the square root of the variance:

\sigma = \sqrt{\sigma^{2}}
= \sqrt{0.2} \quad \text{(Use the answer from the previous part)}
\approx 0.4472 \quad \text{(Evaluate)}

The population standard deviation is approximately \sigma=0.45.

c

Use technology to find his standard deviation, rounded to two decimal places.

Worked Solution
Create a strategy

Use the population standard deviation function, \sigma, on your calculator.

Apply the idea

Using Statistics mode, enter each data point into your calculator, then choose One Variable Analysis.

A screenshot of the GeoGebra Statistics tool showing the menu that contains the One Variable Analysis option. Speak to your teacher for more details.

Calculate the statistics by choosing the button with the \Sigma \text{x} symbol. Then, look for the population standard deviation, \sigma.

A screenshot of the GeoGebra Statistics tool showing how to calculate the statistics of the data set provided. Speak to your teacher for more details.

\sigma \approx 0.45

Reflect and check

Notice, there was also the sample standard deviation s=0.4714 which is close to \sigma=0.4472, so would allow us to describe the spread, but not give the same precise value.
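
The same two values can be reproduced in Python as an alternative to a calculator. This sketch uses the standard library's `statistics` module, where `pstdev` is the population standard deviation (dividing by N) and `stdev` is the sample standard deviation (dividing by N - 1). The variable names are our own.

```python
from statistics import pstdev, stdev

push_ups = [33, 32, 32, 32, 31, 32, 32, 32, 32, 32]

sigma = pstdev(push_ups)  # population standard deviation, divides by N
s = stdev(push_ups)       # sample standard deviation, divides by N - 1

print(f"population: {sigma:.4f}")
print(f"sample:     {s:.4f}")
```

The output shows 0.4472 for the population value and 0.4714 for the sample value, matching the statistics reported by the GeoGebra tool.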

Example 4

The given data sets show the time to get to school for 20 students at two different schools, rounded to the nearest minute. Some summary statistics are given.

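School A: 3, 4, 5, 8, 9, 11, 11, 11, 12, 13, 14, 15, 15, 18, 19, 27, 33, 33, 42, 45

School B: 15, 15, 17, 17, 17, 18, 18, 18, 18, 19, 19, 22, 22, 22, 23, 23, 23, 24, 24, 24

| | School A | School B |
| --- | --- | --- |
| Mean | 17.4 | 19.9 |
| Median | 13.5 | 19 |
| Variance | 142.14 | 9.09 |
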
a

Interpret and compare the means for both schools.

Worked Solution
Create a strategy

The mean is also called the average and describes a typical value that balances all of the data points. Check the difference between the means provided in the summary.

Apply the idea

We can see from the given summary that School A has a lower mean of 17.4 compared to School B that has a mean of 19.9. This shows that the typical travel time for students going to School A is shorter by 2.5 minutes compared to students going to School B.

b

Interpret and compare the medians for both schools.

Worked Solution
Create a strategy

Median is the middle value of a data set, which is not impacted by extreme values. Check the difference between the medians provided in the summary.

Apply the idea

We can see from the given summary that School A has a lower median of 13.5 compared to School B that has a median of 19. This shows that the typical travel time for students going to School A is shorter by 5.5 minutes compared to students going to School B.

Reflect and check

The two measures of center tell the same story that School B students have a longer journey to school than School A students. However, the median makes this conclusion more obvious because the difference is larger.

c

Calculate the standard deviations for both schools.

Worked Solution
Create a strategy

We were given the variance, so we just need to use the fact that the standard deviation is the square root of the variance.

Apply the idea

We were given the variance for each school, so we can take the square root to get the standard deviation:

| | School A | School B |
| --- | --- | --- |
| Variance | 142.14 | 9.09 |
| Standard deviation | \sqrt{142.14}\approx 11.92 | \sqrt{9.09}\approx 3.01 |

Reflect and check

The units for the standard deviation are the same as the original context, so we can say that the standard deviation for School A is 11.92 minutes and the standard deviation for School B is 3.01 minutes.
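
We can confirm the given summary statistics from the raw travel times. This is a sketch using Python's `statistics` module; it uses the population functions `pvariance` and `pstdev`, which reproduce the variances given in the summary table. The variable names are our own.

```python
from statistics import pvariance, pstdev

# Travel times in minutes for 20 students at each school.
school_a = [3, 4, 5, 8, 9, 11, 11, 11, 12, 13,
            14, 15, 15, 18, 19, 27, 33, 33, 42, 45]
school_b = [15, 15, 17, 17, 17, 18, 18, 18, 18, 19,
            19, 22, 22, 22, 23, 23, 23, 24, 24, 24]

var_a, var_b = pvariance(school_a), pvariance(school_b)
sd_a, sd_b = pstdev(school_a), pstdev(school_b)

print(f"School A: variance = {var_a:.2f}, standard deviation = {sd_a:.2f} minutes")
print(f"School B: variance = {var_b:.2f}, standard deviation = {sd_b:.2f} minutes")
```

Both rows agree with the table: 142.14 and 11.92 for School A, and 9.09 and 3.01 for School B.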

d

Interpret and compare the standard deviations for both schools.

Worked Solution
Create a strategy

Observe the difference between the standard deviations for Schools A and B. Remember that the standard deviation quantifies how spread out the values in the data set are relative to the mean value.

Apply the idea

The standard deviation for school A is 11.92 minutes, while it is 3.01 minutes for school B. This suggests that the times students take to get to school A have greater variability, or are more spread out, than the times for school B. Roughly speaking, a typical travel time for school A is about 11.92 minutes away from the mean, indicating higher variability. On the other hand, a typical travel time for school B is only about 3.01 minutes away from the mean, so the values for school B are closer to each other.

Reflect and check

The higher standard deviation or variability for students traveling to school A suggests that some of them may have encountered heavier traffic conditions, frequent road closures, or other factors that contribute to increased travel time.

Idea summary

We can describe univariate data using measures of dispersion (spread). A measure of dispersion (spread) is a single value that describes how varied a data set is.

  • Range: the difference between the upper extreme and the lower extreme.

  • Interquartile range: the difference between the upper quartile and the lower quartile.

  • Variance: A measure of the spread of a data set. The mean of the squares of the differences between each element and the mean of the data set.

  • Standard deviation: A measure of the spread of a data set. The square root of the mean of the squares of the differences between each element and the mean of the data set or the square root of the variance.

| | Benefits | Drawbacks |
| --- | --- | --- |
| Range | Easy to calculate, tells about the extremes of the data | Heavily impacted by extreme values |
| Interquartile range | Tells us about the middle half of the data, not impacted by extreme values | Does not include all data values |
| Standard deviation | Tells us about how far values are from the mean, widely used in other areas of statistics | Impacted by extreme values, best to use technology to calculate |

Formulate questions and collect univariate data

The statistical investigation process is a process that begins with the need to solve a real-world problem and aims to reflect the way statisticians work. The data cycle gives us a nice structure to follow:

A data cycle with four stages. At the top, there is Formulate questions represented by a speech bubble with a question mark. To the right, Collect or acquire data is shown with an icon of a person and a magnifying glass. At the bottom, Organize and represent data is illustrated with a dot plot. To the left, Analyze and communicate results is indicated by a person with charts. Clockwise arrows are drawn from one stage to the next.

To help us formulate or write a question about numerical univariate data, we need to consider what variables we want to explore.

We want to formulate a question about univariate data if we are thinking about frequencies, measures of center or spread, or amounts. This is compared to bivariate data where we are thinking about relationships and predictions.

Continuous numerical data

Data that is measured, so a piece of data could be any decimal or fractional number.

If we measure the length of a person's foot, we might get any decimal value, limited only by the precision of our ruler.

Example:

Time, temperature, length, height, weight

While univariate data always has exactly one variable, we may compare that variable across different categories, like comparing the heights of freshmen to seniors.

We've formulated statistical (investigative) questions with bivariate data, now we'll write them for univariate data.

| Well formulated statistical questions | Not statistical questions |
| --- | --- |
| How heavy are babies when they are born? | What is the heaviest recorded weight of a baby? |
| How do the lengths of Oscar nominated films compare to the lengths of Cannes Film Festival winners? | How long was the "Titanic" movie? |

There are different ways to collect data for our statistical question, and different questions are more suited to different methods.

  • Observation: Watching and noting things as they happen

  • Survey: Asking people questions to get information

  • Scientific experiment: Doing tests in a controlled way to get data

  • Acquire existing secondary data: Use data which was collected by a reliable source like census data, Common Online Data Analysis Platform (CODAP), or peer reviewed studies.

When doing a survey or using secondary sources, it is important that the data is collected from a sample that is representative of the population, so that our analysis of the data is valid.

We will aim to collect very large data sets because they provide a reasonable approximation for the population. This means that data displays like histograms will need to be used to group that data.

Examples

Example 5

Determine the type of data that needs to be collected for each statistical question.

a

How many times have students in the school been to Washington, DC?

Worked Solution
Create a strategy

For univariate data, there is one variable of interest in the study. For bivariate data, we are looking for a relationship between two variables.

Apply the idea

This statistical question is looking at numerical data because it is asking for a count of how many times students have visited Washington, DC. It's not directly comparing two variables, so it's not bivariate.

Reflect and check

We would be collecting discrete numerical data.

b

Can we accurately predict a dog's adult size based on their birth weight?

Worked Solution
Create a strategy

For univariate data, there is one variable of interest in the study. For bivariate data, we are looking for a relationship between two variables.

Apply the idea

This statistical question is bivariate because it involves analyzing the relationship between two variables: a dog's birth weight and its adult size. It aims to determine whether there's a predictive relationship between these two variables.

c

How long do people take to run 3 mile races? Is this comparable for different age groups?

Worked Solution
Create a strategy

For univariate data, there is one variable of interest in the study. For bivariate data, we are looking for a relationship between two variables.

Apply the idea

This statistical question collects numerical data (race times) and compares them across a categorical variable (age group).

We can say that we are comparing sets of univariate data across different categories.

Reflect and check

We may also say this is bivariate data, because it is looking at the relationship between two variables: the time it takes people to run a 3-mile race and their age groups. It seeks to determine whether there are differences in race times across different age groups, thus involving the comparison of two variables.

Example 6

Atanasio's basketball coach just had a knee replacement. Now he is interested in wait times for joint replacement surgeries.

a

Formulate a question that could be used to explore the scenario.

Worked Solution
Create a strategy

To formulate a question about numerical univariate data, we need to consider what variable we want to explore. In this case, we want to explore the wait times for joint replacement surgeries.

Apply the idea

A sample question would be "What are the waiting times for joint replacement surgeries over the past year at the local hospital?"

b

Formulate a question that requires the use of a measure of center to explore the scenario.

Worked Solution
Create a strategy

To formulate a question that requires a measure of center, we need to identify which measure of center would best represent the needed data.

Apply the idea

Waiting times for joint replacement surgeries may vary greatly. We should consider using the measure of center that is less impacted by extreme values, which is the median.

A sample question would be "What is the median wait time for joint replacement surgeries across the hospitals in our region?"

Reflect and check

Using the mean might not be ideal for this scenario because it is more sensitive to extreme values compared to the median. If there are a few unusually long waiting times, the mean will be skewed towards these values, giving a distorted representation of the typical waiting time.

The mode might not provide meaningful insight into the waiting times, because for the context of waiting times for joint replacement surgeries, it's less likely to have repeated exact waiting times due to the variability in individual cases.

c

Formulate a question that requires the use of a measure of dispersion to explore the scenario.

Worked Solution
Create a strategy

To formulate a question that requires a measure of dispersion, we need to identify which measure would be best for the scenario.

Apply the idea

A sample question would be "How do wait times for joint replacement surgeries at the local hospital vary?"

Reflect and check

We could use any measure of spread or dispersion to answer this question.

The interquartile range would tell us about the middle half of the data, so would deemphasize those who got in very quickly or had to wait a very long time.

Standard deviation would give a good idea of the overall variation from the average, but might be harder for those without a statistical background to understand.

Using range would be straightforward and provide us a clear understanding of the spread of data by simply showing the difference between the highest and lowest values.

Example 7

Diego loves attending amusement parks, but does not like waiting in lines. This leads him to ask the question: "How long do people wait to ride the newest roller coaster at Busch Gardens Williamsburg?"

a

Determine which method would be the most appropriate and explain why.

Worked Solution
Create a strategy

Choose a method that is realistic, ethical and would match the question.

Apply the idea

Acquiring secondary data would be the best option to know how long people wait to ride the newest roller coaster at Busch Gardens Williamsburg. Many amusement parks utilize mobile apps or online platforms to provide real-time information about ride wait times. This is not only a practical approach, but would also provide wide and accurate data easily. This method respects visitors' privacy as participation is entirely optional.

Reflect and check

Observation can be time-consuming and labor-intensive, especially for a popular attraction like a new roller coaster. It may not be feasible to continuously observe and record waiting times over an extended period. A survey can be challenging in a dynamic environment like an amusement park, and visitors may be unwilling to participate. A scientific experiment is not applicable, as it requires controlling the environment.

b

Explain how a sample could be selected to get unbiased data.

Worked Solution
Create a strategy

When we select a sample, we need to make sure that it is representative of the population.

Apply the idea

We can select a sample to get unbiased data by making sure the sample is randomly selected, is big enough, and has the same characteristics as the population.

If we were doing an observation or survey, we would need to ensure that we were selecting people at a variety of times throughout the day. For example, we could ask every 10th person who gets off the ride how long they had to wait, or track the wait time for every 10th person using observation.
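
Selecting every 10th person can be sketched in Python as follows. The rider numbers are hypothetical and simply stand for the order in which people leave the ride, and the function name `every_nth` is our own.

```python
def every_nth(items, n):
    """Return every nth item of a list: the nth, 2nth, 3nth, and so on."""
    return items[n - 1::n]

# Hypothetical rider numbers 1 to 100, in the order people get off the ride.
riders = list(range(1, 101))

sample = every_nth(riders, 10)
print(sample)
```

With 100 riders this picks riders 10, 20, 30, and so on up to 100, giving a sample of 10 people spread evenly through the order in which they finished the ride.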

Reflect and check

When acquiring secondary data, we don't usually have much control over the sample as demographics are not necessarily collected.

c

Explain what the standard deviation of the data set might tell us.

Worked Solution
Create a strategy

Remember that standard deviation is the measure of how far individual values in the data set are from the mean.

Apply the idea

In the scenario of how long people wait to ride the newest roller coaster at Busch Gardens Williamsburg, the standard deviation tells us the variability or how dispersed the different waiting times are from the mean or average.

A higher standard deviation would mean that there are waiting times that are spread out and vary a lot throughout the day or for different seats on the ride, and a lower standard deviation tells us that the waiting times are quite consistent.

d

Explain what the median of the data might tell us.

Worked Solution
Create a strategy

Remember that the median is the middle value of the data set that is arranged in order.

Apply the idea

In the scenario of how long people wait to ride the newest roller coaster at Busch Gardens Williamsburg, the median would tell us the usual waiting time without being affected by extremes or waiting times that are way longer or shorter than the usual.

Idea summary

We can formulate questions and then collect continuous numerical data to explore univariate data with large data sets. Univariate data will have one variable or attribute collected for each member of the sample or population.

A very large data set can help provide a reasonable approximation for the population. However, we need to make sure that the data is selected from a sample that is representative of the population.

We can collect the data using surveys, experiments, observation, or secondary sources.

Outcomes

A2.ST.1

The student will apply the data cycle (formulate questions; collect or acquire data; organize and represent data; and analyze data and communicate results) with a focus on univariate quantitative data represented by a smooth curve, including a normal curve.

A2.ST.1a

Formulate investigative questions that require the collection or acquisition of a large set of univariate quantitative data or summary statistics of a large set of univariate quantitative data and investigate questions using a data cycle.

A2.ST.1b

Collect or acquire univariate data through research, or using surveys, observations, scientific experiments, polls, or questionnaires.

A2.ST.1j

Compare multiple data distributions using measures of center, measures of spread, and shape of the distributions.
