In the merging of statistics and probability we begin to examine theoretical and experimental situations where we observe the outcomes of random phenomena.
A random variable is a variable whose possible values are numerical outcomes of a random phenomenon. These values are quantities that are either discrete or continuous. This chapter will focus on discrete random variables.
Let's recall here the difference between categorical, discrete and continuous data types, before we continue.
When thinking about what a discrete random variable (or DRV for short) actually is, the name itself tells you the three properties it has.
Let's say there's a cat who's about to give birth to three kittens. Before they are born we know each will either be male or female. The exact combination of male and female kittens is unknown before they are born. We can instead consider all possible outcomes of the birth. An easy way to do this is to represent the possibilities with the following tree diagram.
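The same possibilities can also be listed programmatically. The following is a minimal sketch in Python, where the labels `"M"` and `"F"` and the name `sample_space` are chosen here purely for illustration:

```python
from itertools import product

# Each kitten is either male ("M") or female ("F"), and three kittens are born.
# product(...) walks every branch of the tree, in birth order.
sample_space = ["".join(outcome) for outcome in product("MF", repeat=3)]

for outcome in sample_space:
    print(outcome)

# Total number of branches in the tree: 2 * 2 * 2
print(len(sample_space))
```

Each printed string corresponds to one path through the tree, read from the first kitten to the third.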
What we've done so far is create a simple sample space, something we have done many times before. What we need to do now is choose something to focus on. There are two obvious choices in this situation: the number of female kittens or the number of male kittens.
Let's focus on the number of female kittens.
How many female kittens will we see once all the kittens are born? Either $0$, $1$, $2$ or $3$. These are countable and able to be written in order.
So the number of female kittens will vary and will occur at random.
When a quantity varies, we can define a variable. In this case let's define $X$ as the number of female kittens born.
$X$ can take the values $0$, $1$, $2$ and $3$, and we write $x$ for any one of these possible values.
Each of these values of $X$ has a particular chance or probability of occurring. We can use our tree diagram and a table to summarise these probabilities.
$x$ | $0$ | $1$ | $2$ | $3$ |
---|---|---|---|---|
$P(X=x)$ | $\frac{1}{8}$ | $\frac{3}{8}$ | $\frac{3}{8}$ | $\frac{1}{8}$ |
What we've created here is a discrete probability distribution and represented it with an individual probability table.
$x$ represents an individual value that the random variable $X$ can take.
$P(X=x)$ represents the probability that outcome $x$ occurs for random variable $X$. Put more simply, it's the probability of each of the outcomes occurring.
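The individual probabilities can be checked by counting outcomes. The sketch below assumes, as the values of $\frac{1}{8}$ in the table imply, that each kitten is equally likely to be male or female, independently of the others. The names `outcomes`, `counts` and `pmf` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: all 8 birth orders are equally likely.
outcomes = list(product("MF", repeat=3))

# Count how many outcomes contain x female kittens, for each x.
counts = Counter(outcome.count("F") for outcome in outcomes)

# P(X = x) = (number of outcomes with x females) / (total number of outcomes)
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")

# The probabilities of a discrete distribution must sum to 1.
print(sum(pmf.values()))
```

Using `Fraction` keeps the probabilities exact, so they can be compared directly with the eighths in the table.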
We could also represent this information with a cumulative probability table. Here, $P\left(X\le x\right)$ represents the probability of the outcome being less than or equal to $x$.
$x$ | $0$ | $1$ | $2$ | $3$ |
---|---|---|---|---|
$P\left(X\le x\right)$ | $\frac{1}{8}$ | $\frac{4}{8}$ | $\frac{7}{8}$ | $\frac{8}{8}$ |
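Each cumulative probability is a running total of the individual probabilities. As a minimal sketch, with the values of $P(X=x)$ typed in from the individual probability table and the names `pmf` and `cdf` chosen for illustration:

```python
from fractions import Fraction
from itertools import accumulate

# Individual probabilities P(X = x) for x = 0, 1, 2, 3.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# P(X <= x) is the sum of P(X = k) over all k up to and including x.
cdf = dict(zip(pmf, accumulate(pmf.values())))

for x, p in cdf.items():
    print(f"P(X <= {x}) = {p}")
```

Note that `Fraction` prints values in lowest terms, so $\frac{4}{8}$ appears as `1/2` and $\frac{8}{8}$ as `1`.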
The weights of babies born in a local hospital in the last month have been recorded. One midwife is interested in how many of the next $5$ babies born will weigh more than $2.4$ kg.
Can this situation be modelled by a discrete random variable?
Yes
No
If $Y$ represents the number of babies in the next $5$ babies born that weigh more than $2.4$ kg, list all the possible outcomes.
Write all the outcomes on the same line, separated by commas.
A multiple choice test contains $10$ questions, each with subparts (a) and (b). The answer to each subpart is awarded a half mark if correct, and zero if incorrect. If a student randomly answers each question, can the number of marks gained on this test be modelled by a discrete random variable?
Yes
No
The quality control manager of the installation of a fibre-optic network is monitoring the faults found in the cable being used.
Can the distance, in metres, between successive faults in the fibre-optic cable being analysed be modelled by a discrete random variable?
Yes
No
Why can this not be represented by a discrete random variable?
The possible outcomes are continuous, and therefore not discrete.
The possible outcomes are categorical, and therefore it's not a random variable.
Can the number of faults found in a randomly chosen $100$ m length of the fibre-optic cable be modelled by a discrete random variable?
Yes
No