7. Probability & Statistics

A data set that is **symmetric** and bell-shaped about the mean is said to have an approximately **normal distribution**.

If a data set is not symmetric about the mean, we cannot use a normal distribution to interpret it.

Recall that the standard deviation, denoted by \sigma, describes the spread of the data.

A small standard deviation indicates data that is tightly clustered around the mean.

A larger standard deviation indicates data that is more spread out.
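To make the contrast concrete, here is a minimal Python sketch. The two lists are made-up illustrative values (not data from this lesson) that share a mean of 10, and `statistics.pstdev` is used on the assumption that each list is treated as a whole population.

```python
from statistics import mean, pstdev

# Hypothetical data sets with the same mean but different spread
tight_cluster = [9, 10, 10, 10, 11]
spread_out = [2, 6, 10, 14, 18]

for name, data in (("tight", tight_cluster), ("spread", spread_out)):
    # pstdev is the population standard deviation (sigma)
    print(f"{name}: mean = {mean(data)}, sigma = {pstdev(data):.2f}")
```

Both lists are centered at 10, but the second has a much larger \sigma because its values sit farther from the mean.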

Consider the data sets represented by the histograms.

Histogram 1

Histogram 2

Histogram 3

Histogram 4

Match each of the histograms to the correct mean and standard deviation.

\mu=16, \sigma=3

\mu=15,\sigma=2

\mu=19,\sigma=4

\mu=18,\sigma=2

Justify your choices.

Consider this normally distributed data set with a mean of 92.5 pounds and a standard deviation of 5 pounds. The data is centered around the mean weight, and we can use the standard deviation to divide the curve into different sections.

If the mean is 92.5, one standard deviation above the mean is 92.5+5 = 97.5. One standard deviation below the mean is 92.5-5=87.5. This means that data values between 87.5 and 97.5 pounds lie within one standard deviation of the mean.

Continuing this pattern, we can say that data values between 82.5 and 102.5 pounds lie within two standard deviations of the mean, and data values between 77.5 and 107.5 pounds lie within three standard deviations of the mean.
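The same pattern can be sketched in a few lines of Python. The helper name `interval_within` is an illustrative choice, not something defined in the lesson; it simply returns the mean minus and plus k standard deviations.

```python
mean_weight = 92.5  # pounds
sd_weight = 5       # pounds

def interval_within(k, mean, sd):
    """Return the (low, high) bounds lying k standard deviations from the mean."""
    return (mean - k * sd, mean + k * sd)

for k in (1, 2, 3):
    low, high = interval_within(k, mean_weight, sd_weight)
    print(f"within {k} standard deviation(s): {low} to {high} pounds")
```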

The normal curve is a probability distribution, and the total area under the curve is 100\%, or 1. When data is approximately normally distributed, the percentage of data within 1, 2, and 3 standard deviations of the mean (about 68\%, 95\%, and 99.7\%, respectively) can be approximated using the **Empirical Rule**.
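The percentages usually quoted for the Empirical Rule (about 68\%, 95\%, and 99.7\%) can be checked numerically. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The rounded values 68, 95, and 99.7 are the ones used when working by hand.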

Determine whether each distribution is normally distributed.

a

Stem | Leaf |
---|---|
1 | 6\ 7\ 7 |
2 | 2\ 2\ 2\ 2\ 3\ 3\ 3 |
3 | 3\ 3\ 3\ 6\ 6\ 6\ 7\ 7\ 7\ 7\ 7 |
4 | 4\ 4\ 4\ 4\ 4\ 4 |
5 | 7\ 7 |

Key: 2 \vert 3 = 23

b

c

The data on daily high temperatures for a certain town is approximately normally distributed. The mean high temperature for this town is 78\degree\text{F}, and the standard deviation is 6\degree\text{F}.

a

Identify the intervals on the histogram that have data points within 1 standard deviation of the mean.

b

Select the normal curve that approximates the data.

A

B

C

D

Consider the normally distributed data sets shown.

a

Which data set has a higher mean?

b

Which data set has a smaller standard deviation?

The grades on a recent exam are approximately normally distributed with a mean score of 72 and a standard deviation of 4.

a

Construct a normal curve and label the boundaries for the Empirical Rule.

b

Find the percentage of students who scored between 64 and 68 on the exam.
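One way to approach this is to note that 64 and 68 are two and one standard deviations below the mean of 72, so the band is half of the difference between the "within 2" and "within 1" percentages. The sketch below assumes the usual rounded Empirical Rule values of 68\% and 95\%; the variable names are illustrative.

```python
exam_mean = 72
exam_sd = 4

# Rounded Empirical Rule percentages (assumed values)
WITHIN_1_SD = 68.0
WITHIN_2_SD = 95.0

# How many standard deviations from the mean are the two scores?
z_low = (64 - exam_mean) / exam_sd   # -2.0
z_high = (68 - exam_mean) / exam_sd  # -1.0

# The region between 1 and 2 standard deviations appears on both sides
# of the mean, so one side holds half of the difference.
percent_between_64_and_68 = (WITHIN_2_SD - WITHIN_1_SD) / 2
print(f"{percent_between_64_and_68}% of students")  # 13.5% of students
```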

c

If 32 students took the exam, determine the number of students expected to score 80 or more on the exam.
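A score of 80 is two standard deviations above the mean of 72, so the question asks about the upper tail beyond 2 standard deviations. The sketch below assumes the rounded Empirical Rule value of 95\% within 2 standard deviations and splits the remaining 5\% evenly between the two tails.

```python
students = 32

# Rounded Empirical Rule percentage within 2 standard deviations (assumed value)
WITHIN_2_SD_PERCENT = 95.0

# The remaining area is split evenly between the lower and upper tails
upper_tail_percent = (100 - WITHIN_2_SD_PERCENT) / 2  # 2.5

expected_students = students * upper_tail_percent / 100
print(f"expected number scoring 80 or more: {expected_students}")  # 0.8
```

Since 0.8 is not a whole number of students, this would typically be reported as about 1 student.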

Farrah is a movie buff and dreams of becoming a director. She notices that a lot of movies have similar running times and formulates the question, "How long are the most popular movies today?" She decides to investigate this further using the data cycle.

a

Describe a method Farrah can use to collect data.

b

The data Farrah gathered on the running time, in minutes, of the top 30 movies is shown:

\begin{aligned} &119,\,126,\,120,\,115,\,120,\,133,\,114,\,120,\,110,\,105,\,130,\,128,\,124,\,129,\,130,\\&107,\,108,\,119,\,118,\,114,\,103,\,124,\,130,\,117,\,122,\,113,\,137,\,136,\,110,\,119 \end{aligned}

Use technology to create a smooth curve to model the distribution and describe the shape of the curve.
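As one possible "technology" route, the sketch below uses only the Python standard library. It assumes the 30 movies are treated as the whole population (so `pstdev` is used; a sample standard deviation would be slightly larger), and the 5-minute bins starting at 100 are an arbitrary choice. It prints a text histogram next to the counts a normal model with the same mean and standard deviation would predict.

```python
from statistics import NormalDist, mean, pstdev

running_times = [
    119, 126, 120, 115, 120, 133, 114, 120, 110, 105, 130, 128, 124, 129, 130,
    107, 108, 119, 118, 114, 103, 124, 130, 117, 122, 113, 137, 136, 110, 119,
]

mu = mean(running_times)       # center of the data
sigma = pstdev(running_times)  # population standard deviation
model = NormalDist(mu, sigma)  # normal curve with the same center and spread
print(f"mean = {mu}, sigma = {sigma:.2f}")

bin_width = 5
for start in range(100, 140, bin_width):
    end = start + bin_width
    # Observed count in the bin [start, end)
    observed = sum(start <= t < end for t in running_times)
    # Count predicted by the normal model for the same interval
    predicted = len(running_times) * (model.cdf(end) - model.cdf(start))
    print(f"{start}-{end - 1}: {'#' * observed:<8} model ~ {predicted:.1f}")
```

Comparing the bars with the model column gives a rough sense of how closely the data follows a bell shape.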

c

Answer the statistical question that Farrah formulated.

d

Formulate a new question that can be answered by the normal curve that approximates the data.

e

Use the data to answer the statistical question from part (d).

Idea summary

A data set that is symmetric and bell-shaped is said to have an approximately normal distribution. The mean, median, and mode are approximately equal in a normal distribution.

The center of the normal distribution is at the arithmetic mean, \mu. The standard deviation, \sigma, describes the spread of the data.

The normal curve represents a probability distribution, and the area under the entire curve is equal to 100\%, or 1. The percentage of data within 1, 2, and 3 standard deviations of the mean can be approximated using the **Empirical Rule**.