Data Analysis

Data is collected all the time for many different reasons, and a huge number of studies and surveys are conducted every day. However, not all of the data collected is useful. That's why it's important to make sure that the information you present is meaningful, which depends on the kinds of questions you ask when you collect your data and on how you analyse it afterwards.

For example, if I wanted to know the most popular type of movie people watch, would it make sense to simply ask $50$ people, "What's your favourite movie?" If I get $50$ different answers, what conclusions can I really draw?

It makes more sense to group the data. Rather than have every movie as a possibility, I could ask, "What is your favourite movie genre?" This limits the number of possible answers significantly.

We may also want to group our data after we have collected it to make it more meaningful. For example, if I wanted to know what people of different ages thought about a certain issue, I could report what $17$ year olds thought, what $18$ year olds thought, what $19$ year olds thought and so on. However, it may make more sense to group the ages and report what $17-25$ year olds thought, what $26-35$ year olds thought and so on. This way we can make generalisations about the opinions of people in a certain age range.
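The grouping step described above can be sketched in a few lines of Python. The ages below are invented purely for illustration, and `interval_for` is a hypothetical helper name. Only the two class intervals, $17-25$ and $26-35$, come from the text.

```python
from collections import Counter

# Hypothetical survey ages, invented for illustration only.
ages = [17, 18, 19, 22, 25, 26, 30, 31, 35, 24]

# Class intervals taken from the text: 17-25 and 26-35 (both ends inclusive).
intervals = [(17, 25), (26, 35)]


def interval_for(age, intervals):
    """Return the (low, high) class interval containing age, or None."""
    for low, high in intervals:
        if low <= age <= high:
            return (low, high)
    return None


# Count how many responses fall into each class interval.
counts = Counter(interval_for(age, intervals) for age in ages)

for low, high in intervals:
    print(f"{low}-{high}: {counts[(low, high)]}")
```

With ten individual ages reduced to two groups, it becomes much easier to compare the groups than to compare each age separately.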

When we group data, we create class intervals, which tell us the range of scores in a particular group. For example, if our class interval was $1-5$, we know that the individual scores included in this class interval are $1,2,3,4$ and $5$.

To make it easier to work with our data, we usually find the class centre. The class centre is the middle value of a class interval, found by averaging the lowest and highest scores in the interval. So for the interval $1-5$, the class centre is $\frac{1+5}{2}=3$, which is the middle score in that interval.

A class centre is often used to stand in for every score in its class interval when analysing a data set, because it is a reasonable estimate of the values in that interval when the individual scores are not known.
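As a minimal sketch, the class centre can be computed as the midpoint of the interval's two end values. The function name `class_centre` is an assumption made for this illustration.

```python
def class_centre(low, high):
    """Midpoint of a class interval: the average of its two end values."""
    return (low + high) / 2


# The interval 1-5 from the text has class centre 3.
print(class_centre(1, 5))
```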

Let's look through some examples of these different ways we can work with grouped data.

Find the class centre for the class interval $27-40$.
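One way to work this out, assuming the class centre is taken as the midpoint of the lowest and highest scores in the interval:

$$\frac{27+40}{2}=\frac{67}{2}=33.5$$

The class centre does not have to be one of the scores in the interval. Here it falls halfway between $33$ and $34$.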

What would be the most appropriate way of representing data from:

A survey conducted of $1000$ people, asking them how many languages they speak?

A. Leaving the data ungrouped and constructing a frequency table

B. Grouping the responses and constructing a frequency table

A survey conducted of $1000$ people, asking them how many different countries they know the names of?

A. Grouping the responses and constructing a frequency table

B. Leaving the data ungrouped and constructing a frequency table

As part of a fuel watch initiative, the price of petrol at a service station was recorded each day for $21$ days. The frequency table shows the findings.

| Price (in cents per litre) | Class Centre | Frequency |
|---|---|---|
| $130.9$-$135.9$ | $133.4$ | $6$ |
| $135.9$-$140.9$ | $138.4$ | $5$ |
| $140.9$-$145.9$ | $143.4$ | $5$ |
| $145.9$-$150.9$ | $148.4$ | $5$ |

What was the highest price that could have been recorded?

On how many days was the price above $140.9$ cents per litre?
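Both questions can be read straight from the frequency table, and the short Python sketch below shows one way to do that. The rows are copied from the table, but the variable names are hypothetical. Because neighbouring classes share an end value, a price of exactly $140.9$ could sit in either class. The sketch assumes that "above $140.9$" means the two classes whose lower bound is at least $140.9$.

```python
# Rows copied from the frequency table: (lower bound, upper bound, frequency).
table = [
    (130.9, 135.9, 6),
    (135.9, 140.9, 5),
    (140.9, 145.9, 5),
    (145.9, 150.9, 5),
]

# Total number of days recorded; should match the 21 days stated in the text.
total_days = sum(freq for _, _, freq in table)

# The highest price that could have been recorded is the largest upper bound.
highest_possible = max(high for _, high, _ in table)

# Assumption: classes starting at 140.9 or higher count as "above 140.9".
days_above = sum(freq for low, _, freq in table if low >= 140.9)

print(total_days, highest_possible, days_above)
```

Grouped data only gives the class each price fell into, so `highest_possible` is an upper limit rather than a price that was definitely recorded.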