Sometimes a histogram is overlaid with a frequency polygon, a line graph that follows the shape of the histogram by joining either the left corners, the midpoints, or the right corners of the tops of the bars.
Notice that the pieces of the bars that lie above the frequency polygon could be used to fill in the unfilled areas below it, so the area under the polygon is about the same as the total area of the bars.
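To check that the pieces really do balance out, here is a minimal Python sketch. It assumes equal-width bins, a polygon drawn through the midpoints of the bar tops, and the common convention of closing the polygon at a frequency of 0 at the midpoint of an empty bin on each side. The counts and the helper names are hypothetical, chosen only for illustration.

```python
def histogram_area(counts, width):
    # Each bar is a rectangle: bin width times frequency
    return sum(width * c for c in counts)

def frequency_polygon(counts, start, width):
    """Return (x, y) points through the midpoints of the bar tops.

    Assumes equal-width bins beginning at `start`. The polygon is closed
    off at height 0 one bin-width beyond the first and last midpoints.
    """
    mids = [start + width * (i + 0.5) for i in range(len(counts))]
    xs = [mids[0] - width] + mids + [mids[-1] + width]
    ys = [0] + list(counts) + [0]
    return list(zip(xs, ys))

def area_under_polygon(points):
    # Trapezoid rule between each pair of consecutive points
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

counts = [2, 5, 9, 6, 3]  # hypothetical frequencies
width = 1.0
poly = frequency_polygon(counts, start=4.0, width=width)

print(histogram_area(counts, width))  # 25.0
print(area_under_polygon(poly))       # 25.0
```

With a midpoint polygon and equal-width bins, each triangle cut off a bar is matched by a triangle of the same size added below the polygon, which is why the two areas agree.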
To give us a better understanding of the overall shape of the data, we can draw a smooth curve over the histogram. This curve is sometimes called a density curve and shows where values are concentrated.
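Smooth curves are often sketched by eye, but one common way to compute one is kernel density estimation (KDE), which places a small bell-shaped bump on every data value and averages the bumps. The sketch below is a minimal pure-Python version; the Gaussian kernel, the bandwidth of 0.75, and the sleep-hour values are all assumptions for illustration, not necessarily how the curves in this lesson were produced.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density at x.

    Each data value contributes a normal (Gaussian) bump centred on it.
    The bandwidth sets how wide, and therefore how smooth, the bumps are.
    """
    n = len(data)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

# Hypothetical hours of sleep for ten students
hours = [5, 6, 6, 7, 7, 7, 7, 8, 8, 9]
curve = gaussian_kde(hours, bandwidth=0.75)

for x in range(4, 11):
    print(x, round(curve(x), 3))
```

The printed heights are largest near 7 hours, where the values are concentrated, and the total area under the curve is 1, which is what makes it a density curve. A smaller bandwidth follows the bars more closely; a larger one gives a smoother curve.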
There are formal terms for describing the shape of a distribution: symmetrical, positively (right) skewed, and negatively (left) skewed.
Based on the symmetry or skew of a distribution, we can make observations about the measures of center: the mean, median, and mode.
Select different distribution shapes using the checkboxes under the histogram and notice how the measures of center compare.
What do you notice about the mean, median and mode when the data is symmetrical?
What do you notice about the mean, median and mode when the data is positively skewed?
What do you notice about the mean, median and mode when the data is negatively skewed?
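The same comparison can be made numerically. This Python sketch uses the standard-library `statistics` module on three small data sets, one for each shape; the values are hypothetical and were chosen so that each shape is easy to see.

```python
import statistics

# Hypothetical data sets, one for each shape
datasets = {
    "symmetrical": [1, 2, 2, 3, 3, 3, 4, 4, 5],
    "positively skewed": [1, 1, 1, 2, 2, 3, 4, 6, 9],
    "negatively skewed": [1, 4, 6, 7, 8, 8, 9, 9, 9],
}

for shape, values in datasets.items():
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)
    print(f"{shape:18}  mean={mean:.2f}  median={median}  mode={mode}")
```

For these values the three measures agree in the symmetrical set, the long right tail pulls the mean above the median in the positively skewed set, and the long left tail pulls it below the median in the negatively skewed set. This is the typical pattern, though real data sets do not always follow it exactly.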
The range is a helpful measure of spread when analyzing data distributions. In later lessons, we will look at how standard deviation can be read from a symmetrical smooth curve.
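The range is the largest value minus the smallest value. A minimal Python sketch, using made-up sleep-hour values and a hypothetical helper name:

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

# Hypothetical hours of sleep
typical = [6, 7, 7, 8, 8, 9]
with_outlier = typical + [14]

print(data_range(typical))       # 3
print(data_range(with_outlier))  # 8
```

Because the range depends only on the two most extreme values, a single unusual value can stretch it a great deal, which is worth keeping in mind when a distribution has a long tail.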
The given histogram represents the distribution of hours students sleep each night.
Select the smooth curve that most accurately models this distribution.
The given smooth curve was created from data collected on the statistical question "What weekly pay is typical for a job while in high school?"
Describe and interpret the shape of the distribution.
Estimate and interpret an appropriate measure of center.
Estimate and interpret an appropriate measure of spread.
What can be inferred from this distribution?
A distribution is symmetrical if its left and right sides are mirror images of one another.
A data set that has positive or right skew has a longer tail of values on the right side of the distribution. The mass of the distribution is concentrated on the left of the figure.
A data set that has negative or left skew has a longer tail of values on the left side of the distribution. The mass of the distribution is concentrated on the right of the figure.
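One way to put a number on the direction of skew, not otherwise used in this lesson, is Pearson's second skewness coefficient, 3(mean - median) / standard deviation. It is positive when a long right tail pulls the mean above the median and negative when a long left tail pulls it below. Below is a minimal Python sketch; the tolerance of 0.1 for calling a data set "roughly symmetrical", the helper names, and the data values are assumptions for illustration.

```python
import statistics

def pearson_skew(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return 3 * (mean - median) / sd

def describe_shape(values, tolerance=0.1):
    # The tolerance is an arbitrary cut-off chosen for this sketch
    skew = pearson_skew(values)
    if skew > tolerance:
        return "positively (right) skewed"
    if skew < -tolerance:
        return "negatively (left) skewed"
    return "roughly symmetrical"

# Hypothetical data sets
right_tail = [2, 3, 3, 4, 4, 4, 5, 9, 12]
left_tail = [14 - x for x in right_tail]  # mirror image of right_tail
symmetric = [1, 2, 3, 4, 5, 6, 7]

for values in (right_tail, left_tail, symmetric):
    print(sorted(values), describe_shape(values))
```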
When we look at two or more univariate data displays, such as smooth curves, there are certain characteristics that can help us compare the distributions.
Identifying the shape of a density curve can help us understand the corresponding data set. Here are some examples of how we describe the shape of density curves:
Consider the following histograms, which show the heights of students on two basketball teams. We know that one graph represents a team made up of Grade 12 students and the other represents a Grade 9 team.
Are there the same number of students in each team? Does it matter?
What are the similarities and differences in terms of measures of spread, central tendency and shape of data?
Which team do you think corresponds to the Grade 12 team, and which team do you think corresponds to the Grade 9 team?
It is important to be able to compare data sets because it helps us make conclusions or judgements about the data. For example, suppose Jim scores 50% on a geography test and 70% on a history test. Based on those grades alone, it makes sense to say that he did better in history.
However, looking at the smooth curves that represent the class results, we can see they tell a different story.
Notice that the geography class had a mean of 40%, while the history class had a mean of 80%. Now we know that Jim scored well above the average in geography and well below the average in history. With this extra information, it makes more sense to say that he did better in geography.
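The reasoning in Jim's example amounts to comparing each score with its class mean rather than comparing the raw scores. A minimal Python sketch using the numbers from the example (the helper name is hypothetical):

```python
def compare_to_mean(score, class_mean):
    """Positive: above the class mean. Negative: below it."""
    return score - class_mean

# Jim's scores and the class means from the example
results = {
    "geography": {"score": 50, "class_mean": 40},
    "history": {"score": 70, "class_mean": 80},
}

for subject, r in results.items():
    diff = compare_to_mean(r["score"], r["class_mean"])
    print(f"{subject}: {diff:+} percentage points relative to the class mean")

best = max(results, key=lambda s: compare_to_mean(**results[s]))
print("Better relative result:", best)  # geography
```

This only measures distance from the mean. It ignores how spread out each class's scores are, which a fuller comparison, for example one using the standard deviation, would also take into account.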
The following curves show the distributions of math test results for two different classes. Curves 1 and 2 show the results for Class 1 and Class 2, respectively.
State the similarities and differences between the following pair of density curves.
Interpret the test results of Class 1 and Class 2.
If Anthony scored 60% in Class 1 and Brodie scored 80% in Class 2, who did better?
The following curves show the distributions of the race times for two different years of the Shelby Forest Loop Marathon.
Describe the similarities and differences between the following pair of smooth curves.
We can first identify and then compare the measures of center, measures of spread, and shape of smooth curves and histograms.
The context of the curves is important to consider when interpreting the comparisons.
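The checklist of center, spread, and shape can be applied side by side to two data sets. Below is a minimal Python sketch; the data sets are hypothetical, and labeling the shape by comparing the mean with the median is a rough rule of thumb rather than a formal test.

```python
import statistics

def summarize(values):
    """Center, spread and a rough shape label for one data set."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    # Rule of thumb: a long right tail pulls the mean above the median
    if mean > median:
        shape = "positively skewed"
    elif mean < median:
        shape = "negatively skewed"
    else:
        shape = "symmetrical"
    return {
        "mean": round(mean, 2),
        "median": median,
        "range": max(values) - min(values),
        "shape": shape,
    }

# Two hypothetical data sets to compare
group_a = [3, 4, 5, 5, 6, 7]
group_b = [2, 3, 3, 4, 8, 10]

for name, values in [("A", group_a), ("B", group_b)]:
    print(name, summarize(values))
```

Here both data sets have a mean of 5, yet B has a lower median, twice the range, and a long right tail, so a comparison based on the mean alone would miss most of the differences. What those differences mean still depends on the context of the data.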