Features of Binomial Distributions

Lesson

When some proportion $\theta$ of a population is observed to have a particular characteristic, we tend to use our relative frequency understanding of probability to conclude that the probability of a randomly chosen individual in the population having the characteristic is $\theta$.

In an experiment or observational study that we believe it appropriate to model with a binomial probability distribution, we may already know from previous work that the probability of a success on each trial is $\theta$ and we may have planned to perform $n$ trials in the experiment. It is natural to interpret the probability $\theta$ in the relative frequency sense, as a proportion, and to conclude that the expected number of successes over the $n$ trials is $n\theta$.

This is indeed the expected value or mean of the distribution, which is also presented in the chapter on the binomial distribution and derived in a different way in the chapter on the Bernoulli distribution.

If we let $X$ be a random variable that represents the number of successes in an experiment in which there are $n$ independent trials or observations, then for the expected or average value of $X$ we write

$E\left(X\right)=n\theta$

where $\theta$ is the probability of success on each trial.
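
As a quick numerical check of this formula, the following sketch computes the mean directly from the definition, as the sum of each possible number of successes multiplied by its probability, and compares it with $n\theta$. The function names and the values $n=20$ and $\theta=0.3$ are illustrative assumptions, not part of the lesson.

```python
from math import comb


def binomial_pmf(k, n, theta):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def expected_value(n, theta):
    """Mean computed from the definition: the sum of k * P(X = k)."""
    return sum(k * binomial_pmf(k, n, theta) for k in range(n + 1))


# Illustrative values only (assumed, not taken from the lesson).
n, theta = 20, 0.3
print(expected_value(n, theta))  # mean from the definition
print(n * theta)                 # mean from the formula E(X) = n * theta
```

The two printed values should agree up to floating-point rounding.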

Caution is needed, however, in proceeding from a general observation of a relative frequency to a conclusion about the probabilities of the various possible numbers of successes in an experiment, as the following example illustrates. The binomial distribution is not always applicable.

Example

Suppose it has been observed that each year in the state of Victoria, $23\%$ of people in the $20$ to $55$ age range experience a mild to serious attack of the sniffles in the month of July.

Using the relative frequency idea of probability, we could argue that on average an individual in the population has a probability of $0.23$ of contracting the sniffles.

If we let $Y$ be the random variable whose values are the possible numbers of people who get the sniffles, we would be justified in saying that $E\left(Y\right)=0.23n$ where $n$ is the number of people in the population.

However, we would not be justified in using the proportion $0.23$ as the parameter in a binomial distribution, to calculate the probability that $E\left(Y\right)$ cases would actually occur or that any other particular number of cases would occur.

The binomial model is unreliable when the trials are not independent or when the probability of success varies from trial to trial. Here both conditions fail. Contagion in work and family environments means that one person's infection changes the chances of those around them, and pre-existing health conditions or differing levels of exposure put some members of the population at greater risk than others. That is, the observations are not independent and the probability is not the same for every observation.

Thus, in a subset of, say, $50$ people, it would be risky to predict $50\times0.23=11.5$ cases even though this is the number that would be expected as a long-term average. In the case of epidemics, a probability distribution with a different shape may be more appropriate.
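
To illustrate how dependence changes the distribution without changing the mean, the sketch below compares two models for $50$ people. The first treats each person as an independent trial with probability $0.23$. The second is a deliberately crude, hypothetical contagion model (an assumption made only for illustration): the $50$ people live in $10$ households of $5$, and each household is either entirely affected, with probability $0.23$, or entirely unaffected.

```python
from math import comb


def binomial_pmf(k, n, theta):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def mean_and_variance(dist):
    """dist maps each possible count to its probability."""
    mean = sum(k * p for k, p in dist.items())
    var = sum((k - mean) ** 2 * p for k, p in dist.items())
    return mean, var


theta = 0.23

# Model A: 50 independent people, each with probability 0.23.
independent = {k: binomial_pmf(k, 50, theta) for k in range(51)}

# Model B (hypothetical): 10 households of 5 people; a household is either
# entirely affected (probability 0.23) or entirely unaffected.
clustered = {5 * h: binomial_pmf(h, 10, theta) for h in range(11)}

mean_a, var_a = mean_and_variance(independent)
mean_b, var_b = mean_and_variance(clustered)
print(f"independent: mean={mean_a:.3f}, variance={var_a:.3f}")
print(f"clustered:   mean={mean_b:.3f}, variance={var_b:.3f}")
```

Both models have an expected value of $11.5$ cases, but the variance under the household model is five times that of the binomial model ($44.275$ compared with $8.855$), so probabilities calculated from the binomial formula would be badly misleading if the household model were closer to the truth.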

Shape

The shape of a binomial distribution depends, for a given number of trials, on the value of the probability parameter. In all cases, the most probable number of successes is at or close to the expected value, although the expected value itself need not be a whole number. The following diagrams show the probabilities of $0$ to $25$ successes in $25$ independent trials, with three different values of the probability parameter.

The first graph, with parameter $0.15$, is said to be skewed to the right; the second graph, with parameter $0.5$, is symmetrical about the mean; and the third graph, with the largest probability parameter, is skewed to the left.
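
The effect of the probability parameter on the shape can also be checked numerically. The sketch below computes, for $25$ trials, the mean, the most probable number of successes, and the skewness coefficient $\frac{1-2\theta}{\sqrt{n\theta\left(1-\theta\right)}}$, which is positive for a right-skewed distribution and negative for a left-skewed one. The parameter of the third graph is not stated, so the value $0.85$ is an assumption chosen for illustration, and the function names are hypothetical.

```python
from math import comb, sqrt


def binomial_pmf(k, n, theta):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def describe(n, theta):
    """Return the mean, the most probable count, and the skewness."""
    pmf = [binomial_pmf(k, n, theta) for k in range(n + 1)]
    mean = n * theta
    # If two counts are equally likely, the smaller one is reported.
    mode = max(range(n + 1), key=lambda k: pmf[k])
    skewness = (1 - 2 * theta) / sqrt(n * theta * (1 - theta))
    return mean, mode, skewness


# 0.15 and 0.5 come from the lesson; 0.85 is an assumed stand-in for the
# third graph, whose parameter is not stated.
for theta in (0.15, 0.5, 0.85):
    mean, mode, skewness = describe(25, theta)
    print(f"theta={theta}: mean={mean:.2f}, mode={mode}, skewness={skewness:+.3f}")
```

With these values the skewness is positive for $0.15$, zero for $0.5$ and negative for $0.85$, matching the right-skewed, symmetrical and left-skewed descriptions.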

Worked Examples

Question 1

A certain disease has a survival rate of $64\%$. Of the next $110$ people who contract the disease, how many would you expect to survive? Round your answer to the nearest whole number.
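
One possible approach, assuming each person's outcome can be treated as a trial with survival probability $\theta=0.64$ and that there are $n=110$ trials:

$E\left(X\right)=n\theta=110\times0.64=70.4$

Rounded to the nearest whole number, this gives about $70$ survivors.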

Question 2

A subject exam consists of $48$ multiple choice questions. Each question has $4$ options, of which $1$ is correct. Neville guessed the answers to all of the questions.

How many questions would he expect to get correct?
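
One possible approach, assuming each guess is an independent trial in which all $4$ options are equally likely to be chosen, so that $\theta=\frac{1}{4}$ with $n=48$ trials:

$E\left(X\right)=n\theta=48\times\frac{1}{4}=12$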

Outcomes

12D.B.1.4

Recognize conditions (e.g., independent trials) that give rise to a random variable that follows a binomial probability distribution, calculate the probability associated with each value of the random variable, represent the distribution numerically using a table and graphically using a probability histogram, and make connections to the algebraic representation $P\left(X=x\right)=\binom{n}{x}p^x\left(1-p\right)^{n-x}$