Continuous numerical data, such as times, heights, weights or temperatures, are based on measurements, so any data value is possible within a large range of values.
As an example, the following frequency distribution table represents the times taken for $72$ runners to complete a ten kilometre race.
Class interval | Frequency |
---|---|
$45\le\text{time }<50$ | $9$ |
$50\le\text{time }<55$ | $7$ |
$55\le\text{time }<60$ | $20$ |
$60\le\text{time }<65$ | $30$ |
$65\le\text{time }<70$ | $6$ |
What may surprise us at first is that the table above has only five rows, even though it represents $72$ data values. This is because the data is first grouped into class intervals (also known as classes or bins) in the frequency distribution table.
In the table above, the class intervals follow two rules:
Every data value must go into exactly one class interval.
Class intervals should be of equal width.
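The two rules can be sketched in a few lines of Python. This is only an illustration: the function name `class_interval`, its `start` and `width` parameters, and the short list of sample times are all assumptions made for the example, not the data for the $72$ runners. It uses the "lower endpoint included, upper endpoint excluded" convention of the table above.

```python
from collections import Counter

def class_interval(value, start=45, width=5):
    """Return (lower, upper) for the class with lower <= value < upper.

    Assumes equal-width classes beginning at `start` (hypothetical defaults
    chosen to match the running-times table).
    """
    index = int((value - start) // width)
    lower = start + index * width
    return (lower, lower + width)

# Hypothetical sample of times, NOT the actual race data.
times = [45.0, 47.2, 49.9, 50.0, 58.3, 61.7, 64.99, 65.0]

# Each value lands in exactly one class, so the frequencies add up
# to the number of data values.
frequencies = Counter(class_interval(t) for t in times)
for (lower, upper), count in sorted(frequencies.items()):
    print(f"{lower} <= time < {upper}: {count}")
```

Because each value is assigned to a single class, the frequencies always sum to the number of data values, just as the frequencies in the table sum to $72$.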
There are several different ways that class intervals are defined. Here are some examples with two adjacent class intervals:
Class interval format | Adjacent class interval | Description |
---|---|---|
$45<\text{time }\le50$ | $50<\text{time }\le55$ | Upper endpoint included, lower endpoint excluded. |
$45\le\text{time }<50$ | $50\le\text{time }<55$ | Lower endpoint included, upper endpoint excluded. |
$45$ to $<50$ | $50$ to $<55$ | Lower endpoint included, upper endpoint excluded. |
$45$ - $49$ | $50$ - $54$ | Suitable for data rounded to the nearest whole number, or discrete data. |
$45$ → $50$ | $50$ → $55$ | Not clear which endpoints are included or excluded. Assume upper endpoint is included. |
Regardless of the format used, it should be applied consistently across all class intervals for a given set of data.
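To see why the endpoint convention matters, consider a time of exactly $50$. The sketch below checks which class it falls into under the two conventions; the helper names `lower_inclusive` and `upper_inclusive` are made up for this illustration.

```python
def lower_inclusive(value, lower, upper):
    """True if lower <= value < upper."""
    return lower <= value < upper

def upper_inclusive(value, lower, upper):
    """True if lower < value <= upper."""
    return lower < value <= upper

t = 50  # a value sitting exactly on a class boundary

# Lower endpoint included: 50 belongs to the 50-55 class.
print(lower_inclusive(t, 45, 50), lower_inclusive(t, 50, 55))  # False True

# Upper endpoint included: 50 belongs to the 45-50 class.
print(upper_inclusive(t, 45, 50), upper_inclusive(t, 50, 55))  # True False
```

Under either convention the boundary value belongs to exactly one class. Mixing the two conventions within one table could leave it in both classes or in neither.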
Note: In this course, class intervals for any particular set of data will be the same width. There are situations in data representation when class intervals are different widths, but this is beyond the scope of this course.
The class centre is the average of the endpoints of each interval.
For example, if the class interval is $45\le\text{time }<50$, or $45$ - $50$, the class centre is calculated as follows:
$\text{class centre}=\frac{45+50}{2}=47.5$
Because the class centre is an average of the endpoints, it is often used as a single value to represent the class interval.
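The same calculation can be written as a small Python function. The name `class_centre` is chosen for this sketch only.

```python
def class_centre(lower, upper):
    """Average of the two endpoints of a class interval."""
    return (lower + upper) / 2

print(class_centre(45, 50))  # 47.5
print(class_centre(50, 55))  # 52.5
```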
Find the class centre for the class interval $19\le t<23$ where $t$ represents time.
Using the example of running times, we can add a 'class centre' column to the frequency distribution table.
Class interval | Class centre | Frequency |
---|---|---|
$45\le\text{time }<50$ | $47.5$ | $9$ |
$50\le\text{time }<55$ | $52.5$ | $7$ |
$55\le\text{time }<60$ | $57.5$ | $20$ |
$60\le\text{time }<65$ | $62.5$ | $30$ |
$65\le\text{time }<70$ | $67.5$ | $6$ |
The class centre is used to create a frequency polygon.
A frequency polygon is a line graph that displays the frequency distribution of a set of data.
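As a rough sketch, the points of a frequency polygon for the running times can be built by pairing each class centre with its frequency. The variable names here are illustrative, and the code only lists the points. Joining them in order with straight line segments gives the graph.

```python
# Class intervals and frequencies from the running-times table.
intervals = [(45, 50), (50, 55), (55, 60), (60, 65), (65, 70)]
frequencies = [9, 7, 20, 30, 6]

# Each point is (class centre, frequency).
points = [((lower + upper) / 2, freq)
          for (lower, upper), freq in zip(intervals, frequencies)]

for centre, freq in points:
    print(centre, freq)
```

The horizontal coordinate of each point is the class centre, and the vertical coordinate is the frequency of that class.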
As part of a fuel watch initiative, the price of petrol, $p$, at a service station was recorded each day for $21$ days. The frequency table shows the findings.
Price (in cents per litre) | Class centre | Frequency |
---|---|---|
$120.9<p\le125.9$ | $123.4$ | $4$ |
$125.9<p\le130.9$ | $128.4$ | $6$ |
$130.9<p\le135.9$ | $133.4$ | $5$ |
$135.9<p\le140.9$ | $138.4$ | $6$ |
What was the highest price that could have been recorded?
How many days was the price above $130.9$ cents?