
6.06 Independent events and data

Lesson

In research we often want to discover whether one event has an impact on another, such as:

  • Does taking a medication reduce the risk of a certain cancer?
  • Has a marketing campaign affected sales?
  • Are creative people more likely to be left-handed?
  • Does studying Mathematics improve your chance of a higher income?
  • Does your weight impact the likelihood of heart disease?

What we are trying to determine is whether these events are independent.

Recall from our previous lesson that two events are independent if the occurrence of one does not affect the probability of the other. In terms of conditional probability, this means that $P\left(A|B\right)=P\left(A\right)$ and $P\left(B|A\right)=P\left(B\right)$. Using data, we can calculate probabilities and test for independence either by checking whether these conditional probability statements hold, or by checking whether the following statement, which is true for independent events, holds:

$P\left(A\cap B\right)=P\left(A\right)\times P\left(B\right)$
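The product rule follows from the definition of conditional probability, $P\left(A|B\right)=\frac{P\left(A\cap B\right)}{P\left(B\right)}$ for $P\left(B\right)>0$. Substituting the independence condition $P\left(A|B\right)=P\left(A\right)$ gives:

$\frac{P\left(A\cap B\right)}{P\left(B\right)}=P\left(A\right)$
  $P\left(A\cap B\right)=P\left(A\right)\times P\left(B\right)$

Since $A\cap B$ and $B\cap A$ are the same event, dividing by $P\left(A\right)$ (when $P\left(A\right)>0$) gives $P\left(B|A\right)=P\left(B\right)$ as well, which is why independence is symmetric.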

With data from a well-designed experiment, we can use the experimental probabilities as point estimates for the population probabilities $P\left(A\right)$, $P\left(B\right)$ and $P\left(A\cap B\right)$, and test whether the above statement holds.
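As a minimal sketch, the comparison can be automated in Python, assuming the counts are read from a two-way table. The function name `independence_estimates` is hypothetical; the standard-library `Fraction` type keeps the arithmetic exact, so probabilities that are truly equal compare as equal.

```python
from fractions import Fraction


def independence_estimates(n_a_and_b, n_a, n_b, n_total):
    """Estimate P(A ∩ B) and P(A) × P(B) from observed counts.

    Relative frequencies are used as point estimates of the
    population probabilities.
    """
    p_a = Fraction(n_a, n_total)              # estimate of P(A)
    p_b = Fraction(n_b, n_total)              # estimate of P(B)
    p_a_and_b = Fraction(n_a_and_b, n_total)  # estimate of P(A ∩ B)
    return p_a_and_b, p_a * p_b


# Two fair dice, each of the 36 equally likely outcomes listed once:
# A = first die is even (18 outcomes), B = second die shows 5 or 6
# (12 outcomes), and A ∩ B contains 6 outcomes.
joint, product = independence_estimates(6, 18, 12, 36)
print(joint, product, joint == product)  # 1/6 1/6 True
```

With survey or experimental data, the two returned values will rarely match exactly, so deciding whether they are "close enough" remains a judgement at this level.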

However, when we check for independence using real-world data, it is rare to get exactly equal probabilities. So in practice we test whether the probabilities are significantly different; if they are, we conclude that the events are not independent. How different the probabilities need to be to count as "significantly" different, and to what degree the two events are dependent, will be studied in depth in further statistics.

Worked examples

Example 1

A research laboratory wants to investigate the effectiveness of a flu vaccine. It follows 150 randomly selected people from a community and notes if they received a flu shot prior to the flu season and records their health outcome over the season.

| | Flu shot | No shot | Total |
| Sick | 12 | 25 | 37 |
| Stayed healthy | 63 | 50 | 113 |
| Total | 75 | 75 | 150 |

(a) What is the probability that a randomly selected person was sick and had received the flu shot?

$P\left(\text{Sick}\cap\text{Flu shot}\right)=\frac{12}{150}$
  $=\frac{2}{25}$
  $=0.08$

(b) Calculate $P\left(\text{Sick}\right)\times P\left(\text{Flu shot}\right)$.

$P\left(\text{Sick}\right)\times P\left(\text{Flu shot}\right)=\frac{37}{150}\times\frac{75}{150}$
  $=\frac{37}{300}$
  $=0.123$ (3 decimal places)
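The arithmetic in parts (a) and (b) can be checked with Python's standard `fractions` module, which reduces each fraction automatically. This is a quick verification sketch, not part of the method itself:

```python
from fractions import Fraction

n = 150  # people followed over the flu season

p_sick_and_shot = Fraction(12, n)  # sick AND had the flu shot
p_sick = Fraction(37, n)           # sick (row total)
p_shot = Fraction(75, n)           # had the flu shot (column total)
product = p_sick * p_shot

print(p_sick_and_shot, float(p_sick_and_shot))  # 2/25 0.08
print(product, round(float(product), 3))        # 37/300 0.123
```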

(c) Does it appear that the events are independent?

No. The two probabilities (0.08 and about 0.123) are clearly different, so the data suggest that these events are not independent. It appears that having the flu shot makes a person less likely to fall ill.
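The conclusion that the flu shot seems to lower the chance of illness can also be seen from the conditional-probability form of independence, $P\left(\text{Sick}|\text{Flu shot}\right)=P\left(\text{Sick}\right)$. A short sketch comparing the proportion who were sick within each column of the table:

```python
from fractions import Fraction

# Conditional probabilities use the column total as the denominator.
p_sick_given_shot = Fraction(12, 75)     # P(Sick | Flu shot)
p_sick_given_no_shot = Fraction(25, 75)  # P(Sick | No shot)
p_sick = Fraction(37, 150)               # P(Sick) over everyone

print(float(p_sick_given_shot))               # 0.16
print(round(float(p_sick_given_no_shot), 3))  # 0.333
print(round(float(p_sick), 3))                # 0.247
```

If the events were independent, all three values would be roughly equal. Instead, in this sample a person who had the flu shot was about half as likely to be sick as a person who did not.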

Example 2

A student wishes to know if being left-handed is independent of gender. The student conducts a survey of 200 randomly selected people and records the following results:

| | Male | Female | Total |
| Left-handed | 13 | 7 | 20 |
| Right-handed | 107 | 73 | 180 |
| Total | 120 | 80 | 200 |

(a) What is the probability that a randomly chosen person is male and left-handed?

$P\left(\text{Male}\cap\text{Left-handed}\right)=\frac{13}{200}$
  $=0.065$

(b) Calculate $P\left(\text{Male}\right)\times P\left(\text{Left-handed}\right)$.

$P\left(\text{Male}\right)\times P\left(\text{Left-handed}\right)=\frac{120}{200}\times\frac{20}{200}$
  $=\frac{3}{50}$
  $=0.06$
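As in Example 1, the two values can be checked exactly with Python's `fractions` module (a verification sketch only):

```python
from fractions import Fraction

n = 200  # people surveyed

p_male_and_left = Fraction(13, n)  # male AND left-handed
p_male = Fraction(120, n)          # column total for males
p_left = Fraction(20, n)           # row total for left-handed
product = p_male * p_left

print(p_male_and_left, float(p_male_and_left))  # 13/200 0.065
print(product, float(product))                  # 3/50 0.06
```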

(c) Does it appear that the events are independent?

The probabilities (0.065 and 0.06) are different but reasonably close. We would need more data or a different test to conclude whether these events are independent.
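Another way to look at the same data is to compare the proportion of left-handers within each gender, $P\left(\text{Left-handed}|\text{Male}\right)$ and $P\left(\text{Left-handed}|\text{Female}\right)$. A short sketch using the column totals from the table:

```python
from fractions import Fraction

p_left_given_male = Fraction(13, 120)  # 13 of the 120 males
p_left_given_female = Fraction(7, 80)  # 7 of the 80 females

print(round(float(p_left_given_male), 3))  # 0.108
print(float(p_left_given_female))          # 0.0875
```

These proportions rest on only 20 left-handed people in the sample, so moving one or two of them between columns would noticeably shift both values. This is one reason the example stops short of a firm conclusion.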

(Fun fact: left-handedness is not independent of gender. Roughly $11$% of men and $9$% of women are left-handed, so men are about $23$% more likely to be left-handed.)

 

Practice questions

Question 1

Two independent events $A$ and $B$ have probabilities $P\left(A\right)=0.5$ and $P\left(B\right)=0.8$.

Determine the probability of:

  1. Both $A$ and $B$ occurring.

  2. Neither $A$ nor $B$.

  3. $A$ or $B$ or both.

  4. $B$ but not $A$.

  5. $A$ given that $B$ occurs.
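Each part of this question uses one rule for independent events. A minimal sketch of those rules in Python; the function `independent_event_probs` is hypothetical, and the example uses illustrative values of $0.3$ and $0.4$ rather than the question's own numbers, so the answers are left for you to find.

```python
from fractions import Fraction


def independent_event_probs(p_a, p_b):
    """Probabilities for two independent events A and B."""
    both = p_a * p_b                 # P(A ∩ B) = P(A) × P(B)
    neither = (1 - p_a) * (1 - p_b)  # complements of independent events are independent
    either = p_a + p_b - both        # addition rule: P(A ∪ B)
    b_not_a = p_b * (1 - p_a)        # P(B ∩ A')
    a_given_b = both / p_b           # P(A | B), which equals P(A)
    return both, neither, either, b_not_a, a_given_b


# Illustrative values only, not the ones in the question.
result = independent_event_probs(Fraction(3, 10), Fraction(2, 5))
print([str(x) for x in result])  # ['3/25', '21/50', '29/50', '7/25', '3/10']
```

As a check, "A or B or both" and "neither" are complements, so their probabilities add to 1.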

Question 2

A sample of $20$ students from the same school were asked whether they had a haircut in the last month (H) and whether they had been sick in the last month (S).

The results are shown in the two tables below.


| Student | A | B | C | D | E | F | G | H | I | J |
| Haircut (H) | ✓ | ✓ | - | ✓ | - | ✓ | ✓ | - | - | ✓ |
| Sick (S) | - | ✓ | - | ✓ | - | - | ✓ | ✓ | - | - |

| Student | K | L | M | N | O | P | Q | R | S | T |
| Haircut (H) | - | - | - | ✓ | - | ✓ | - | ✓ | - | - |
| Sick (S) | ✓ | ✓ | ✓ | ✓ | - | - | ✓ | ✓ | - | ✓ |

 

  1. What is the probability a student had a haircut in the last month (H), based on the data?

  2. What is the probability a student had been sick in the last month (S), based on the data?

  3. What is the probability a student was sick and had a haircut in the last month, from the given data?

  4. What is the value of $P\left(H\right)\times P\left(S\right)$?

  5. Does the data suggest that getting haircuts and being sick in the last month are independent or dependent?

    A. Independent

    B. Dependent
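Counting ticks by hand is error-prone, so here is a minimal sketch of how the tallies in parts 1 to 4 could be automated. It assumes each row of ticks is entered as a list of `True`/`False` values in student order; the function `tally` is hypothetical, and the four-student input below is made up for illustration so that the question's answers are left for you.

```python
from fractions import Fraction


def tally(haircut, sick):
    """Estimate P(H), P(S), P(H ∩ S) and P(H) × P(S) from paired ticks.

    haircut[i] and sick[i] are True when student i has a tick in that row.
    Both lists must be the same length.
    """
    n = len(haircut)
    p_h = Fraction(sum(haircut), n)  # True counts as 1, False as 0
    p_s = Fraction(sum(sick), n)
    p_h_and_s = Fraction(sum(h and s for h, s in zip(haircut, sick)), n)
    return p_h, p_s, p_h_and_s, p_h * p_s


# Hypothetical four-student example, not the survey data above.
print(tally([True, True, False, False], [True, False, True, False]))
```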

Question 3

$126$ patients were tested for a gene, and then tested for how attractive they were to mosquitos. Patients were categorised as repellent (R), neutral (N) or attractive (A) to mosquitos, and had either the standard (S) or mutant (M) gene.

| | R | N | A |
| S | 22 | 30 | 38 |
| M | 8 | 12 | 16 |
| Total | 30 | 42 | 54 |

 

  1. Does the data suggest that having the standard gene (S) and having a neutral attraction to mosquitos (N) are independent?

    A. Yes

    B. No
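For a table like this one, the row and column totals you need can be computed rather than read off. A minimal sketch, assuming the table is stored as nested dictionaries (row label mapped to column counts); the function `cell_check` is hypothetical. It is demonstrated on the flu-shot table from Example 1, so the result can be compared with the worked answers there.

```python
from fractions import Fraction


def cell_check(table, row, col):
    """Compare P(row ∩ col) with P(row) × P(col) for a two-way table.

    table maps each row label to a dict of {column label: count}.
    """
    total = sum(sum(counts.values()) for counts in table.values())
    row_total = sum(table[row].values())
    col_total = sum(counts[col] for counts in table.values())
    joint = Fraction(table[row][col], total)
    product = Fraction(row_total, total) * Fraction(col_total, total)
    return joint, product


# Usage with the flu-shot table from Example 1.
flu = {
    "Sick": {"Flu shot": 12, "No shot": 25},
    "Stayed healthy": {"Flu shot": 63, "No shot": 50},
}
print(cell_check(flu, "Sick", "Flu shot"))  # (Fraction(2, 25), Fraction(37, 300))
```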

 

Outcomes

1.3.15

understand the notion of independence of an event $A$ from an event $B$, as defined by $P\left(A|B\right)=P\left(A\right)$

1.3.16

establish and use the formula $P\left(A\cap B\right)=P\left(A\right)P\left(B\right)$ for independent events $A$ and $B$, and recognise the symmetry of independence

1.3.17

use relative frequencies obtained from data as point estimates of conditional probabilities and as indications of possible independence of events
