In the previous chapter we explored the concept of a discrete random variable (DRV) in detail. We will now consider continuous random variables (CRV).
Notice that the only difference between the two names is that the word discrete has been changed to continuous, and that's the only difference in the definitions too!
When determining whether we are looking at a DRV or a CRV, we firstly need to make sure that the experiment or situation is random and varies.
Then consider whether the outcomes consist of discrete or continuous values. A useful question to consider is whether the outcomes are counted or measured.
However, the issue becomes clouded when we realise that measuring instruments only measure to a limited precision so that, in practice, the number of possible values for a measurement is finite and the corresponding random variable, strictly speaking, must be discrete. Thus, the notion of a continuous random variable from measured data is an idealisation. In practice, a discrete random variable can be approximated for convenience by the continuous kind when the number of possible discrete values is large.
An $18$-pack carton of extra large eggs advertises that each egg ranges in weight from $67$ g to $72$ g.
(a) Can the weight of an egg selected from a randomly chosen carton be modelled by a probability distribution?
Think: We need to check firstly that the outcomes (the weight of the eggs) are random and vary. We then need to think about what sort of data we're dealing with.
Do: The weights certainly vary because we're told they range between $67$ g and $72$ g. The eggs are randomly assigned to each space in the egg carton and we would choose one at random.
We also know that we would need to measure the mass of the chosen egg, not count it.
Therefore, we can say that this situation can be modelled by a continuous random variable.
(b) Which of the following graphs best models the shape of this continuous probability distribution?
Think: Let's think carefully about what we expect to happen in this real-life situation and what each graph would represent.
Do: Let's take a look at what each graph is telling us.
When we think about the weight of the eggs in a carton that is advertised as containing extra large eggs, we should expect that the majority of the eggs will be the same size and mass, and that there's a lower chance that an egg will be at the lower or higher range of weights. So the mean weight should have the highest probability. We are therefore looking at the fourth graph as our answer.
When we collect real data, for example, if we were to measure and record the weights of eggs that were being packaged into cartons, we could analyse the distribution of the data by grouping the data appropriately and drawing a histogram. Since we're interested in moving towards modelling various situations and analysing their probabilities, a frequency histogram is more beneficial.
To begin to make sense of a set of measurements from an experiment or from an observational study, we partition the possible values of the relevant random variable into a convenient number of contiguous sub-range intervals. Then, we count the frequency of outcomes within each interval.
A histogram is formed from this grouping of the data. It has columns for each sub-range interval with the area of each column proportional to the number of observations in the particular interval. With intervals of equal width, the height of the columns will also be proportional to the number of observations.
We take the proportion of observations in a particular interval as an estimate of the probability that a future observation will fall within that interval.
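The grouping-and-counting procedure described above can be sketched in Python. This is only a minimal illustration: the sample values, the interval edges and the helper name `relative_frequencies` are all made up for the example and are not taken from the lesson's data.

```python
def relative_frequencies(data, edges):
    """Count the observations in each interval [edges[i], edges[i+1])
    and return the proportion of the sample falling in each one."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    n = len(data)
    return [c / n for c in counts]

# Hypothetical sample of 10 measurements (illustrative values only).
sample = [91, 87, 93, 95, 90, 99, 92, 88, 96, 94]
edges = [85, 90, 95, 100]  # intervals 85-90, 90-95, 95-100

props = relative_frequencies(sample, edges)
for lo, hi, p in zip(edges, edges[1:], props):
    # Each proportion is an estimate of the probability that a
    # future observation falls within that interval.
    print(f"{lo} <= x < {hi}: estimated probability {p:.2f}")
```

Each printed proportion plays the role of a column in a relative frequency histogram with equal-width intervals.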
For example, the histogram on the left below shows a random sample of weights of apples in an orchard. We could use the proportion of apples that lie in the interval $90$ to $94$ grams to estimate the probability of a randomly selected apple from the orchard lying in this range.
[Figures: frequency histogram (left); frequency histogram with probability density function (right)]
We could take a larger sample and use smaller class intervals as shown in the histogram on the right. We could then imagine if the number of observations could be increased indefinitely while the width of the sub-range intervals is made very narrow that, in this case, the continuous shape shown would be formed. This imagined curve corresponds to what is called a probability density function.
The area above a particular interval and below the probability density curve corresponds to the probability that a future observation will fall within that interval. This is a similar idea to the way a histogram works.
The advantage of modelling the spread of a data set with such a curve rather than with a histogram is that we can often describe the curve using a known mathematical function and this can make probability calculations easier.
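As a rough illustration of how a known function can simplify the calculation, the sketch below uses an assumed density $f(x)=2x$ on $0\le x\le1$. It was chosen only because the total area under it is $1$; it is not taken from the lesson's data. The function names `density` and `probability` are hypothetical, and the area is approximated with the trapezoidal rule.

```python
def density(x):
    """Assumed example density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def probability(a, b, steps=1000):
    """Approximate P(a <= X <= b) as the area under the density curve,
    using the trapezoidal rule with `steps` equal-width strips."""
    total = 0.0
    for i in range(steps):
        left = a + (b - a) * i / steps
        right = a + (b - a) * (i + 1) / steps
        total += 0.5 * (density(left) + density(right)) * (right - left)
    return total

total_area = probability(0, 1)    # whole range: should be close to 1
p_interval = probability(0.5, 1)  # exact area is 1 - 0.25 = 0.75
print(round(total_area, 4), round(p_interval, 4))
```

The same area could be found exactly by integration; the numerical version is shown only because it mirrors the idea of adding up very narrow histogram columns.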
The data given shows the heights of a group of $16$-year-olds to the nearest cm.
Heights (cm) |
---|
$148,161,154,160,150,153,155,158,156,168,147,157,153,165,148,162,164,163,154,154$ |
Complete the following relative frequency table:
Height | Frequency | Relative Frequency |
---|---|---|
$145\le h<150$ | $\editable{}$ | $\editable{}$ |
$150\le h<155$ | $\editable{}$ | $\editable{}$ |
$155\le h<160$ | $\editable{}$ | $\editable{}$ |
$160\le h<165$ | $\editable{}$ | $\editable{}$ |
$165\le h<170$ | $\editable{}$ | $\editable{}$ |
$170\le h<175$ | $\editable{}$ | $\editable{}$ |
Use the table from part (a) to make a relative frequency histogram.
Use your relative frequencies to calculate the probability of a student being between $155$ and $159$ cm tall, inclusive.
Use your relative frequencies to calculate the probability of a student being less than $155$ cm tall.
The cumulative frequency for a set of continuous data is given below.
Complete the cumulative relative frequency column in the table:
Hours | Cumulative Frequency | Cumulative Relative Frequency |
---|---|---|
$725\le t<775$ | $1$ | $\editable{}$ |
$775\le t<825$ | $1$ | $\editable{}$ |
$825\le t<875$ | $5$ | $\editable{}$ |
$875\le t<925$ | $14$ | $\editable{}$ |
$925\le t<975$ | $29$ | $\editable{}$ |
$975\le t<1025$ | $35$ | $\editable{}$ |
$1025\le t<1075$ | $40$ | $\editable{}$ |
$1075\le t<1125$ | $44$ | $\editable{}$ |
$1125\le t<1175$ | $47$ | $\editable{}$ |
$1175\le t<1225$ | $48$ | $\editable{}$ |
$1225\le t<1275$ | $49$ | $\editable{}$ |
$1275\le t<1325$ | $50$ | $\editable{}$ |
Use the data to calculate $P(X<975)$.
Use the data to calculate $P(X\ge825)$.
Use the data to calculate $P(X<1025\ |\ X\ge875)$.
Give your answer to two decimal places.
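A conditional probability of this kind can be estimated from cumulative counts using $P(X<b\ |\ X\ge a)=\frac{P(a\le X<b)}{P(X\ge a)}$. The sketch below shows the idea on a small hypothetical table (deliberately not the table above); the helper name `conditional_below` and all the numbers are assumptions made for the illustration.

```python
def conditional_below(cum_counts, upper_bounds, a, b):
    """Estimate P(X < b | X >= a) from cumulative counts.

    cum_counts[i] is the number of observations below upper_bounds[i].
    Both a and b must be class boundaries listed in upper_bounds.
    """
    total = cum_counts[-1]
    below = dict(zip(upper_bounds, cum_counts))
    between = below[b] - below[a]   # observations with a <= X < b
    at_least_a = total - below[a]   # observations with X >= a
    return between / at_least_a

# Hypothetical classes 0-10, 10-20, 20-30, 30-40 (illustrative only).
upper_bounds = [10, 20, 30, 40]
cum_counts = [4, 10, 18, 20]

estimate = conditional_below(cum_counts, upper_bounds, 10, 30)
print(estimate)
```

Restricting the count to observations with $X\ge a$ is what turns the ordinary relative frequency into a conditional one.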
The time in seconds between customers arriving at an express checkout is gathered.
Determine the relative frequencies for each class interval.
Class intervals | $0-5$ | $5-10$ | $10-15$ | $15-20$ | $20-25$ | $25-30$ | $30-35$ |
---|---|---|---|---|---|---|---|
Relative frequency | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ |
Cumulative relative frequency | $0.29$ | $0.52$ | $0.68$ | $0.83$ | $0.9$ | $0.95$ | $1$ |
If $200$ customers were monitored, how many waited less than $20$ seconds?
If $200$ customers were monitored, how many waited between $15$ and $20$ seconds?
Describe the skewness of the data.
Positive skew
Negative skew
Symmetrical
Estimate the probability that, of the next $4$ customers, the first $2$ wait more than $20$ seconds and the last two do not.
Round your answer to two decimal places.
Estimate the probability that, of the next $4$ customers, exactly $2$ wait more than $20$ seconds.
Round your answer to two decimal places.
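Questions about "the first $2$ of the next $4$" and "exactly $2$ of the next $4$" differ only in whether the order is fixed. The sketch below contrasts the two, assuming that customers wait independently and each has the same probability $p$ of a long wait. The value $p=0.25$ and the helper names are illustrative assumptions, not values read from the table.

```python
from math import comb

def ordered_probability(outcomes, p):
    """P(one specific sequence occurs), where outcomes is a list of
    booleans (True = success) and the trials are independent."""
    result = 1.0
    for success in outcomes:
        result *= p if success else (1 - p)
    return result

def binomial_probability(n, k, p):
    """P(exactly k successes in n independent trials, in any order),
    each trial having success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.25  # illustrative value only
first_two_only = ordered_probability([True, True, False, False], p)
exactly_two = binomial_probability(4, 2, p)
print(first_two_only, exactly_two)
```

The "exactly $2$" probability is larger by a factor of $\binom{4}{2}=6$, the number of different orders in which the two long waits could occur.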