When an experiment can have either of two possible outcomes, usually called success and failure, it gives rise to a Bernoulli random variable. We assign the values $X=1$ and $X=0$ to a Bernoulli random variable $X$ according to whether a trial of the experiment results in a success or a failure. Also, we assign probabilities $p$ and $q$ to the two outcomes.
Thus, we write $P\left(X=1\right)=p$ and $P\left(X=0\right)=q=1-p$.
The expected value or mean of the Bernoulli random variable $X$ may be thought of informally as the average amount of 'success' per trial over a very large number of trials. This is just $p$, and we write $\mu_X=p$ or $E(X)=p$.
If we run the experiment and calculate the amount of 'success' per trial over just a few trials, we will quite likely obtain a value different from $p$. By doing this repeatedly, we obtain a spread of values centred around the mean, $p$. This spread of values is what is meant by the variance of the random variable $X$. Using the definition of variance, we write $Var(X)=E(X-\mu_X)^2$ and evaluate this from the definition as
$Var(X)=p(1-\mu_X)^2+q(0-\mu_X)^2=p(1-p)^2+q(0-p)^2=pq^2+qp^2=pq(q+p)=pq$
Thus, a Bernoulli random variable has mean $\mu_X=p$ and variance $Var\left(X\right)=p\left(1-p\right)$.
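As an informal check of these two results, the short simulation below (a sketch assuming Python with NumPy; the value $p=0.3$ is chosen arbitrarily) shows the sample mean and variance of a large number of Bernoulli trials settling near $p$ and $p(1-p)$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
p = 0.3  # arbitrary success probability for illustration

# One million Bernoulli trials: 1 for success, 0 for failure
trials = rng.binomial(n=1, p=p, size=1_000_000)

print(trials.mean())  # close to p = 0.3
print(trials.var())   # close to p * (1 - p) = 0.21
```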
We are often interested in strings of independent Bernoulli trials. The distinguishing feature of the binomial distribution is that it gives the probability of observing each possible number of successes in such a string of trials.
In an experiment involving $n$ trials, there could be anywhere from $0$ to $n$ successes. As $p$ is the long-run proportion of successes over many trials, if the $n$ trials were to be repeated many times, we would expect the number of successes, on average, to be $np$, and this number is the mean of the binomial distribution.
The actual number of observed successes varies about this mean, giving rise to a variance $np\left(1-p\right)$, which you should compare with the variance of the Bernoulli distribution.
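These formulas follow from the fact that a binomial count is a sum of independent Bernoulli variables. Writing the number of successes as $N=X_1+X_2+\dots+X_n$, where each $X_i$ is an independent Bernoulli variable with mean $p$ and variance $p(1-p)$, expectations add, and the variances of independent variables also add, giving
$E(N)=E(X_1)+E(X_2)+\dots+E(X_n)=np$ and $Var(N)=Var(X_1)+Var(X_2)+\dots+Var(X_n)=np(1-p)$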
Suppose $r$ successes are observed, and $n-r$ failures. We can count the number of ways this outcome can occur, namely $^nC_r$ or, in equivalent notation, $\binom{n}{r}$. From the theory of combinatorics, we know that this is evaluated by $^nC_r=\frac{n!}{r!\left(n-r\right)!}$.
The numbers $^nC_r$ are the same as the coefficients that arise in the expansion of the binomial expression $\left(a+b\right)^n$. Hence, the name binomial distribution.
We can now calculate the probabilities associated with the outcomes of a binomial experiment. The probability of a particular instance of $r$ successes and $n-r$ failures must be $p^r\left(1-p\right)^{n-r}$. But, because there are $^nC_r$ ways in which this outcome can occur, we conclude that
$P\left(N=r\right)=\binom{n}{r}p^r\left(1-p\right)^{n-r}$
where $N$ is called a binomial random variable. It takes integer values from $0$ to $n$.
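As a quick check of this formula, the sketch below (assuming Python's standard library; the values $n=10$, $r=3$ and $p=0.4$ are chosen arbitrarily, and the function name binomial_pmf is just illustrative) computes $P(N=r)$ directly from the combination formula.

```python
from math import comb

def binomial_pmf(n, r, p):
    """Probability of exactly r successes in n independent trials,
    each with success probability p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Example: 3 successes in 10 trials with p = 0.4
print(binomial_pmf(10, 3, 0.4))  # roughly 0.215

# The probabilities over r = 0, ..., n sum to 1, as they must
print(sum(binomial_pmf(10, r, 0.4) for r in range(11)))  # 1.0 up to rounding
```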
Although it may not be strictly true, we assume for the sake of this example that the occurrence of rain on a given day over a thirty-day period is independent of the weather on the preceding and following days. Suppose that, according to historical records, the probability of rain on any day in April is $0.2$.
The mean number of rainy days in April is $np=30\times0.2=6$. However, in the most recent month of April, there were $10$ rainy days. The variance is $np\left(1-p\right)=30\times0.2\times0.8=4.8$, and we might wonder how unlikely it is to get a number of rainy days this far or further away from the mean.
The probability of getting exactly the mean number of rainy days is $\binom{30}{6}\times0.2^6\times0.8^{24}=0.179$ to three decimal places.
The probability of getting exactly ten days of rain is $\binom{30}{10}\times0.2^{10}\times0.8^{20}=0.035$ to three decimal places.
We could calculate the probability of observing at least $10$ days of rain by first calculating the probabilities of exactly $0$, $1$, $2$, $3$, $4$, $5$, $6$, $7$, $8$, and $9$ days of rain. The sum of these is the probability of seeing fewer than $10$ days of rain, and the number we want is one minus this amount.
You should work through this calculation to check that the probability of observing $10$ or more rainy days is $0.061$ to three decimal places. So, the observed event is not easily explained as a random fluctuation.
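A minimal sketch of these calculations, assuming Python's standard library (the helper name pmf is illustrative):

```python
from math import comb

n, p = 30, 0.2  # 30 days in April, P(rain) = 0.2 on any given day

def pmf(r):
    """P(exactly r rainy days) for a binomial variable with parameters n and p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

print(round(pmf(6), 3))   # 0.179 - exactly the mean number of rainy days
print(round(pmf(10), 3))  # 0.035 - exactly ten rainy days

# P(10 or more rainy days) = 1 - P(fewer than 10)
p_at_least_10 = 1 - sum(pmf(r) for r in range(10))
print(round(p_at_least_10, 3))  # 0.061
```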
Find the value of $^5C_4\times\left(0.1\right)^4\times0.9+{}^5C_5\times\left(0.1\right)^5\times\left(0.9\right)^0$.
Census data show that $80\%$ of the population in a particular country have brown eyes.
A random sample of $900$ people is selected from the population.
What is the mean number of people in the sample who have brown eyes?
What is the standard deviation of the number of people in the sample who have brown eyes?