Data Analysis

When we are trying to understand what our data is telling us, we usually find measures of central tendency (e.g. median, mean and mode) as well as measures of spread, such as the range. However, the range is easily affected by outliers, so to get a better picture of the spread in a data set we often find the set's quartiles.

Quartiles are scores at particular locations in the data set - similar to the median, but instead of dividing a data set into halves, they divide a data set into quarters. Let's look at how we would divide up some data sets into quarters now.

- Here is a data set with $8$ scores:

$1$ | $3$ | $4$ | $7$ | $11$ | $12$ | $14$ | $19$ |

First locate the median, between the $4$th and $5$th scores:

$1$ | $3$ | $4$ | $7$ | ↓ Median | $11$ | $12$ | $14$ | $19$ |

Now there are $4$ scores in each half of the data set, so split each group of four scores in half to find the quartiles. We can see the first quartile (Q1) is between the $2$nd and $3$rd scores; within the lower half there are two scores on either side of Q1. Similarly, the third quartile (Q3) is between the $6$th and $7$th scores:

$1$ | $3$ | ↓ Q1 | $4$ | $7$ | ↓ Median | $11$ | $12$ | ↓ Q3 | $14$ | $19$ |

- Now let's look at a situation with $9$ scores:

$8$ | $8$ | ↓ Q1 | $10$ | $11$ | $13$ (Median) | $14$ | $18$ | ↓ Q3 | $22$ | $25$ |

This time, the $5$th term is the median. There are four terms on either side of the median, just as for the set with eight scores. So Q1 is still between the $2$nd and $3$rd scores, and Q3 is between the $7$th and $8$th scores, which is the middle of the four scores above the median.

- Finally, let's look at a set with $10$ scores:

$12$ | $13$ | $14$ (Q1) | $19$ | $19$ | ↓ Median | $21$ | $22$ | $22$ (Q3) | $28$ | $30$ |

For this set, the median is between the $5$th and $6$th scores. This time, however, there are $5$ scores on either side of the median. So Q1 is the $3$rd term and Q3 is the $8$th term.
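The splitting procedure used for the three data sets above can be sketched in Python. This is only an illustration: the function name `quartiles_by_halves` is made up for this example, and it assumes the convention shown above, where the median score itself is left out of both halves when the number of scores is odd. Where a quartile falls between two scores, the sketch takes the average of those two scores, which is also an assumption.

```python
from statistics import median


def quartiles_by_halves(scores):
    """Return (Q1, median, Q3) by splitting the sorted scores into halves."""
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]        # scores below the median position
    upper = ordered[n - half:]    # scores above the median position
    return median(lower), median(ordered), median(upper)


# The three data sets from the lesson
eight = [1, 3, 4, 7, 11, 12, 14, 19]
nine = [8, 8, 10, 11, 13, 14, 18, 22, 25]
ten = [12, 13, 14, 19, 19, 21, 22, 22, 28, 30]

for data in (eight, nine, ten):
    q1, med, q3 = quartiles_by_halves(data)
    print(len(data), "scores:", "Q1 =", q1, "median =", med, "Q3 =", q3)
```

For the set of ten scores, each half has an odd number of scores, so Q1 and Q3 land exactly on the $3$rd and $8$th scores rather than between two scores.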

The quartiles divide the data set into four parts that each contain $25\%$ of the scores. In other words, the lowest score to the lower quartile represents $25\%$ of the data, the lower quartile to the median represents another $25\%$, the median to the upper quartile is another $25\%$, and the upper quartile to the highest score represents the final $25\%$. We can add these quarters together. For example, $50\%$ of the scores in a data set lie between the lower and upper quartiles.

These quartiles are sometimes described as percentiles. A percentile is the value below which a given percentage of the observations in a data set fall. For example, if a score is at the $75$th percentile in a statistical test, it is higher than $75\%$ of all scores. The median represents the $50$th percentile, the halfway point in a data set.
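To see the link between quartiles and percentiles, the following Python sketch counts what percentage of scores fall below a given value. The helper name `percent_below` is hypothetical, and the data is the eight-score set from earlier in this lesson. The values used for Q1, the median and Q3 are the midpoints of the neighbouring scores, which is an assumption, since the lesson only locates their positions.

```python
def percent_below(scores, value):
    """Percentage of scores that are strictly less than value."""
    below = sum(1 for s in scores if s < value)
    return 100 * below / len(scores)


scores = [1, 3, 4, 7, 11, 12, 14, 19]

print(percent_below(scores, 3.5))  # Q1     -> 25.0
print(percent_below(scores, 9))    # median -> 50.0
print(percent_below(scores, 13))   # Q3     -> 75.0
```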

The first quartile is also called the lower quartile. It is the middle score between the lowest score and the median, and it represents the $25$th percentile.

The first quartile score is the $\frac{n+1}{4}$th score, where $n$ is the total number of scores.

The second quartile is the median, which we have already learnt about, and it represents the $50$th percentile.

The median is the $\frac{n+1}{2}$th score, where $n$ is the total number of scores.

The third quartile is also called the upper quartile. It is the middle score between the median and the highest score. It represents the $75$th percentile.

The third quartile is the $\frac{3\left(n+1\right)}{4}$th score, where $n$ is the total number of scores.
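The three position formulas above can be evaluated with a few lines of Python. The function name `quartile_positions` is made up for this sketch, and it only reports positions, not values. A fractional position means the quartile lies between two scores, and how that is turned into a value varies between textbooks and software.

```python
def quartile_positions(n):
    """Positions of Q1, the median and Q3 among n sorted scores (1-based)."""
    return (n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4


for n in (7, 8, 19):
    print(n, "scores:", quartile_positions(n))
```

For $n=7$ and $n=19$ the positions are whole numbers, so each quartile is one of the scores in the set. For $n=8$ the positions are fractional, so each quartile sits between two neighbouring scores.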

The interquartile range (IQR) is the difference between the upper quartile and the lower quartile. $50\%$ of scores lie within the IQR because $2$ full quarters of the data lie in this range.

Box plots are a great way to graphically display quartiles in a data set.

Answer the following, given this set of scores: $33,38,50,12,33,48,41$

(a) Sort the scores in ascending order.

**Think:** Ascending means lowest to highest.

**Do:** $12,33,33,38,41,48,50$

(b) Find the total number of scores.

**Think:** We just need to count how many scores there are.

**Do:** There are $7$ scores.

(c) Find the median.

**Think:** Which term is in the middle?

**Do:** The fourth score is the middle score, so the median is $38$.

(d) Find the first quartile of the set of scores.

**Think:** What score is the middle score between the lowest score and the median? We calculate this just like we do the median.

**Do:** The second score is the lower quartile score, so the first quartile is $33$.

(e) Find the third quartile of the set of scores.

**Think:** What score is the middle score between the median and the highest score? We calculate this just like we do the median.

**Do:** The sixth score is the upper quartile score, so the third quartile is $48$.

(f) Find the interquartile range.

**Think:** The interquartile range (IQR) is the difference between the upper quartile and the lower quartile.

**Do:** $48-33=15$

Answer the following using this set of scores: $-2,10,-1,6,9,6,-6$

(a) Sort the scores into ascending order.

**Think:** Ascending means smallest to largest.

**Do:** $-6,-2,-1,6,6,9,10$

(b) Find the total number of scores.

**Think:** We just need to count how many scores there are.

**Do:** There are $7$ scores.

(c) Find the median.

**Think:** The median is the middle score in our data set.

**Do:** The fourth score is the middle score, so the median is $6$.

(d) Find the first quartile of the set of scores.

**Think:** The first quartile in the set of scores is the $\frac{n+1}{4}$th score, where $n$ is the total number of scores.

**Do:** The first quartile is the $\frac{7+1}{4}=2$nd score, which is $-2$.

(e) Find the third quartile of the set of scores.

**Think:** What score is halfway between the median and the highest score?

**Do:** The third quartile is the sixth score, which is $9$.

(f) Find the interquartile range.

**Think:** The IQR is the difference between the third quartile and the first quartile.

**Do:** $9-\left(-2\right)=11$
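Both worked examples above can be cross-checked with Python's standard library. This sketch assumes Python 3.8 or later, where `statistics.quantiles` is available. Its `"exclusive"` method places the quartiles at positions based on $n+1$, which matches the position formulas in this lesson when those positions are whole numbers. The wrapper name `summarise` is made up for this example.

```python
from statistics import quantiles

first_set = [33, 38, 50, 12, 33, 48, 41]
second_set = [-2, 10, -1, 6, 9, 6, -6]


def summarise(scores):
    """Return (Q1, median, Q3, IQR); quantiles() sorts the data itself."""
    q1, med, q3 = quantiles(scores, n=4, method="exclusive")
    return q1, med, q3, q3 - q1


print(summarise(first_set))   # (33.0, 38.0, 48.0, 15.0)
print(summarise(second_set))  # (-2.0, 6.0, 9.0, 11.0)
```

For sets where the positions are not whole numbers, `quantiles` interpolates between neighbouring scores, so its results may differ from the halving approach shown earlier in the lesson.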

For the following set of scores in the histogram:

(a) Input the data in the following distribution table:

Score ($x$) | Frequency ($f$) | $f\times x$ | Cumulative Frequency (cf) |
---|---|---|---|
$30$ | $5$ | $150$ | $5$ |
$40$ | $5$ | $200$ | $10$ |
$50$ | $5$ | $250$ | $15$ |
$60$ | $1$ | $60$ | $16$ |
$70$ | $3$ | $210$ | $19$ |
Totals | $19$ | $870$ | |

(b) Find the median using the distribution table above.

**Think:** Which score represents the middle number?

**Do:**

$\text{Middle score}=\frac{n+1}{2}=\frac{19+1}{2}=10\text{th score}$

The tenth score is the median. The cumulative frequency column shows that the $6$th to $10$th scores are all $40$, so the median is $40$.

(c) Find the first quartile.

**Think:** We can use the frequency table to work out which score lies in the middle between the lowest score and the median.

**Do:** The first quartile is the $\frac{19+1}{4}=5$th score, which is $30$.

(d) Find the third quartile.

**Think:** Which score is the middle score between the median and the highest score?

**Do:** The third quartile is the $\frac{3\left(19+1\right)}{4}=15$th score, which is $50$.

(e) Find the interquartile range.

**Think:** We need to find the difference between the third quartile and the first quartile.

**Do:** $50-30=20$
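The frequency-table working above can be reproduced in Python. This is a minimal sketch: the helper name `score_at` is hypothetical, and the code assumes that $n+1$ is divisible by $4$ (true here, since $n=19$), so every quartile position is a whole number.

```python
from itertools import accumulate

scores = [30, 40, 50, 60, 70]
freqs = [5, 5, 5, 1, 3]

fx = [x * f for x, f in zip(scores, freqs)]  # the f × x column
cum_freq = list(accumulate(freqs))           # running total of frequencies
n = cum_freq[-1]                             # total number of scores


def score_at(position):
    """Return the score at a 1-based position in the sorted data."""
    for x, cf in zip(scores, cum_freq):
        if position <= cf:
            return x
    raise ValueError("position is beyond the last score")


# Assumption: n + 1 is divisible by 4, so these positions are whole numbers.
q1 = score_at((n + 1) // 4)
med = score_at((n + 1) // 2)
q3 = score_at(3 * (n + 1) // 4)

print("cumulative frequency:", cum_freq)
print("sum of fx:", sum(fx))
print("Q1 =", q1, "median =", med, "Q3 =", q3, "IQR =", q3 - q1)
```

The lookup returns the first score whose cumulative frequency reaches the required position, which is the same reading of the cf column used in parts (b) to (d).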

Answer the following, given this set of scores:

$33,38,50,12,33,48,41$

Sort the scores in ascending order.

Find the number of scores.

Find the median.

Find the first quartile of the set of scores.

Find the third quartile of the set of scores.

Find the interquartile range.

Answer the following using this set of scores:

$-3,-3,1,9,9,6,-9$

Sort the scores in ascending order.

Find the number of scores.

Find the median.

Find the first quartile of the set of scores.

Find the third quartile of the set of scores.

Find the interquartile range.

For the following set of scores in the bar chart to the right:

Input the data in the following distribution table:

Score $\left(x\right)$ | Freq $\left(f\right)$ | $fx$ | Cumulative Freq $\left(cf\right)$ |
---|---|---|---|
$30$ | | | |
$40$ | | | |
$50$ | | | |
$60$ | | | |
$70$ | | | |
**Totals** | | | |

Find the median score using the distribution table above.

Find the first quartile score.

Find the third quartile score.

Find the interquartile range.