From our probability work we know that if two events are independent, then one event has no influence on the other. For example, gender and IQ are generally considered to be independent, since intelligence is not thought to depend on whether a person is male or female.
In contrast, two events are called dependent when one event does have an influence on the other. For example, we might suspect that a person's favourite movie genre is somewhat dependent on their gender.
A $\chi^2$ test can be used to test whether two variables measured on the same sample can be said to be independent.
Remember that the null hypothesis is the 'no effect' hypothesis; in this case it means 'not dependent'. We write the hypotheses for tests of independence in words.
The null hypothesis for this type of test will always be $H_0$: The variables are independent.
The alternative hypothesis will therefore always be $H_1$: The variables are dependent.
Let's explore the hypothesis that movie genre preference is not dependent on gender, at a significance level of $1\%$, by considering the following table of data collected from a survey in which people were asked "What is your favourite movie genre?":
Observed frequency table:
| | Action | Comedy | Romance | Total |
|---|---|---|---|---|
| Male | $51$ | $50$ | $11$ | $112$ |
| Female | $34$ | $42$ | $54$ | $130$ |
| Total | $85$ | $92$ | $65$ | $242$ |
In practice the calculator will be used to calculate the test statistic $\chi^2$, the degrees of freedom $df$ and the $p$-value, which is compared to the level of significance in the same way as in our other hypothesis tests. Here we look at how these values are calculated manually in order to understand the process better.
We have been given the observed frequencies and, just as in a goodness of fit test, we wish to compare these with the expected frequencies ($np$). To complete the expected frequency table we use the rule of probability for independent events: if $A$ and $B$ are independent events, then $P(A\text{ and }B)=P(A)\times P(B)$.
To complete the first cell: Expected value $=n\times P(\text{Male}\cap\text{Action})=n\times P(\text{Male})\times P(\text{Action})=242\times\frac{112}{242}\times\frac{85}{242}\approx39.34$
Expected frequency table:
| | Action | Comedy | Romance | Total |
|---|---|---|---|---|
| Male | $39.34$ | $42.58$ | $30.08$ | $112$ |
| Female | $45.66$ | $49.42$ | $34.92$ | $130$ |
| Total | $85$ | $92$ | $65$ | $242$ |
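Because $n\times\frac{\text{row total}}{n}\times\frac{\text{column total}}{n}=\frac{\text{row total}\times\text{column total}}{n}$, every cell of the expected table can be found from the totals alone. The short Python sketch below (the helper name `expected_frequencies` is our own, not from any library) applies this rule to the observed counts, entered without the totals row and column, and should reproduce the expected frequency table.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed counts only (no totals): rows Male, Female; columns Action, Comedy, Romance
observed = [[51, 50, 11],
            [34, 42, 54]]

expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 2) for e in row])
# [39.34, 42.58, 30.08]
# [45.66, 49.42, 34.92]
```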
Using the rule from the previous lesson, $\chi^2=\sum\frac{\left(f_{\text{observed}}-f_{\text{expected}}\right)^2}{f_{\text{expected}}}$, summed over all six cells, we can show that the test statistic is $\chi^2\approx31.38$ (using the unrounded expected frequencies).
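The sum can be checked cell by cell. Below is a minimal Python sketch (the function name `chi_square_statistic` is hypothetical) that takes only the observed counts, with the totals row and column left out, and computes each expected count on the fly. Working from the unrounded expected frequencies it gives about $31.38$; adding up contributions from the rounded table values gives about $31.37$, so small differences in the second decimal place come from rounding.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell (totals excluded)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, f_obs in enumerate(row):
            f_exp = row_totals[i] * col_totals[j] / n  # expected count for this cell
            chi2 += (f_obs - f_exp) ** 2 / f_exp
    return chi2


observed = [[51, 50, 11],
            [34, 42, 54]]
print(round(chi_square_statistic(observed), 2))  # 31.38
```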
As for our previous tests, the degrees of freedom is also a required value, as it defines the shape of the $\chi^2$ curve.
$df=(\text{number of rows}-1)\times(\text{number of columns}-1)$
Important: do not include the totals row and column in your calculation!
In this problem we find $df=(2-1)\times(3-1)=2$.
Using our calculator we find that $p=1.53698\times10^{-7}$.
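With $df=2$ the $\chi^2$ distribution is an exponential distribution with mean $2$, so the upper-tail probability has the closed form $p=e^{-\chi^2/2}$. The sketch below uses this special case to reproduce the calculator's $p$-value without a statistics library; it is only valid for $df=2$, and other degrees of freedom need a different formula.

```python
import math

# Observed counts only (totals excluded): rows Male, Female; columns Action, Comedy, Romance
observed = [[51, 50, 11],
            [34, 42, 54]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
chi2 = sum((f - r * c / n) ** 2 / (r * c / n)
           for row, r in zip(observed, row_totals)
           for f, c in zip(row, col_totals))

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1) x (3 - 1) = 2

# Shortcut valid ONLY for df == 2: the chi-squared upper tail is exactly exp(-x / 2)
p_value = math.exp(-chi2 / 2)
print(df, f"{p_value:.4e}")
```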
This is a tiny $p$-value of approximately $0$, far less than the level of significance of $1\%$. Remember that this means it is highly unlikely that we would observe these values if the variables were independent, so the variables are probably NOT independent. As $p<\alpha$, we reject the null hypothesis, $H_0$, and state that movie genre preference does appear to be dependent on gender.
Note: we can also reject the null hypothesis if the calculated $\chi^2$ value is greater than or equal to the critical $\chi^2$ value.
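For $df=2$ the critical value also has a closed form: setting $e^{-c/2}=\alpha$ gives $c=-2\ln\alpha$, which is about $9.21$ at the $1\%$ level. The sketch below computes it and applies the critical-value rule to the movie data; the formula for $c$ is specific to $df=2$.

```python
import math

observed = [[51, 50, 11],
            [34, 42, 54]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
chi2 = sum((f - r * c / n) ** 2 / (r * c / n)
           for row, r in zip(observed, row_totals)
           for f, c in zip(row, col_totals))

alpha = 0.01
# With df = 2 only: solving exp(-c / 2) = alpha gives the critical value c = -2 ln(alpha)
critical = -2 * math.log(alpha)

print(round(critical, 2))  # 9.21
print("reject H0" if chi2 >= critical else "do not reject H0")  # reject H0
```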
In practice, you will not be required to calculate the expected values, test statistic or $p$-value manually. You will enter the observed values as a matrix in your calculator, and the $\chi^2$ test application will do the rest. Remember that a matrix with $3$ rows and $4$ columns has size $3\times4$.
| TI-nspire calculator instructions | Casio fx-CG 50 calculator instructions | TI-84 Plus CE calculator instructions |
|---|---|---|
| Press menu then 7 Matrix & Vector | Press menu then Run-Matrix | Press matrix (2nd $x^{-1}$) |
| Press 1 Create then 1 Matrix | Press F3 (MAT/VCT) | Move to EDIT and choose 1: [A] |
| Set rows and columns when prompted | Press F3 DIM to set Mat A to the correct dimensions | Set the matrix size, e.g. $2\times3$ |
| Enter the data into the rows and columns | Type in the data, pressing enter each time | Type in the data, pressing enter each time |
| Press the arrow to move outside the matrix, then ctrl then var then c (this stores the matrix under the name c) | | |
| | Press menu then select Statistics | Press stat |
| Press menu then select 6 Statistics | Press F3 for TEST then F3 for CHI | Select C: $\chi^2$-Test from the TESTS menu |
| Press 7 Stat Tests then 8 $\chi^2$ 2-way Test | Press F2 for 2WAY | Check Observed: is set to [A] |
| Set c as the Observed matrix | Make sure Observed is set to Mat A and Expected is set to Mat B | Highlight Calculate |
| Press OK to view the results | Scroll down to Execute and press EXE | Press enter to display the results |
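For readers who want to reproduce the calculator's 2-way test on a computer, here is a minimal, standard-library-only Python sketch. The names `chi2_sf` and `chi2_independence_test` are hypothetical helpers, not a library API. The test statistic and degrees of freedom follow the rules above; the $p$-value uses the standard series for the regularized incomplete gamma function, to which the $\chi^2$ upper tail reduces. It should be adequate for the examples in this lesson, but it is not a replacement for a statistics package. If SciPy is available, `scipy.stats.chi2_contingency` performs the same test; by default it applies Yates' continuity correction when $df=1$ (a $2\times2$ table), so its result can then differ from an uncorrected calculation.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with df degrees of freedom.

    Sums the series for the regularized lower incomplete gamma function P(df/2, x/2)
    and returns 1 - P. Simple, but loses relative accuracy for extremely small p-values.
    """
    if x <= 0:
        return 1.0
    a, t = df / 2, x / 2
    term = 1.0 / a  # n = 0 term of the series
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= t / (a + n)
        total += term
    lower = math.exp(a * math.log(t) - t - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi2_independence_test(observed):
    """Return (chi2, df, p) for an r x c table of observed counts (totals excluded)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for row, r in zip(observed, row_totals):
        for f_obs, c in zip(row, col_totals):
            f_exp = r * c / n  # expected count under independence
            chi2 += (f_obs - f_exp) ** 2 / f_exp
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, chi2_sf(chi2, df)


chi2, df, p = chi2_independence_test([[51, 50, 11], [34, 42, 54]])
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.5e}")
```

As with the calculator, the observed matrix holds only the counts, never the totals.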
At the last Commonwealth Games, $220$ spectators were asked which of four events they preferred to watch. The results are displayed in the following table:

| | Age under $40$ | Age $40$ or more |
|---|---|---|
| Hockey | $35$ | $31$ |
| Lawn Bowls | $6$ | $24$ |
| Swimming | $45$ | $49$ |
| Archery | $9$ | $21$ |
To see whether age had any influence on favourite sport, researchers conducted a $\chi^2$ test of independence at a $5\%$ significance level.
(a) State the hypotheses for the problem.
Think: With a test for independence we always have a null hypothesis of NO dependence.
Do: $H_0$: The preferred sport is independent of age. $H_1$: The preferred sport is not independent of age.
(b) State the number of degrees of freedom.
Do: Using $df=(\text{number of rows}-1)\times(\text{number of columns}-1)$, we get $df=(4-1)\times(2-1)=3$.
(c) Calculate the test statistic, $\chi^2$, correct to two decimal places.
Do: Carefully enter the observed data into your calculator as a $4\times2$ matrix to find $\chi^2=12.15$.
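To check the calculator's value by hand, the Python sketch below applies the same formula to the $4\times2$ table and prints each cell's contribution $\frac{(f_{\text{observed}}-f_{\text{expected}})^2}{f_{\text{expected}}}$ row by row before the total. The list names are our own; the counts are copied from the table above, without totals.

```python
events = ["Hockey", "Lawn Bowls", "Swimming", "Archery"]
# Observed counts only; columns are "Age under 40" and "Age 40 or more"
observed = [[35, 31], [6, 24], [45, 49], [9, 21]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)  # total number of spectators surveyed

chi2 = 0.0
for event, row, r in zip(events, observed, row_totals):
    # Each cell contributes (f_observed - f_expected)^2 / f_expected
    parts = [(f - r * c / n) ** 2 / (r * c / n) for f, c in zip(row, col_totals)]
    chi2 += sum(parts)
    print(f"{event:<11}", [round(x, 2) for x in parts])

print(n, round(chi2, 2))  # 220 12.15
```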
(d) Given that the critical value for a significance level of $5\%$ with $3$ degrees of freedom is $7.81$, determine whether to accept or reject $H_0$.
Think: Our calculated $\chi^2=12.15\ge7.81$.
Do: Since the calculated value of $\chi^2$ is greater than the critical value of $\chi^2$, we can reject the null hypothesis $H_0$.
(e) Find the $p$-value, correct to five decimal places.
Do: Read from the calculator that $p=0.00689$.
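For $3$ degrees of freedom the $\chi^2$ upper-tail probability has a closed form, $P(X\ge x)=\operatorname{erfc}\left(\sqrt{x/2}\right)+\sqrt{2x/\pi}\,e^{-x/2}$, so the critical value $7.81$ quoted in part (d) can be checked by bisection. The sketch below (helper names hypothetical) searches for the $c$ with $P(X\ge c)=0.05$ using Python's `math.erfc`; the formula is specific to $df=3$.

```python
import math


def chi2_sf_df3(x):
    """P(X >= x) for a chi-squared variable with 3 degrees of freedom (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def critical_value_df3(alpha, lo=0.0, hi=100.0):
    """Bisection for c with P(X >= c) = alpha; the upper tail decreases as x grows."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf_df3(mid) > alpha:
            lo = mid  # tail area still too large, so the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2


crit = critical_value_df3(0.05)
print(round(crit, 2))  # 7.81
print(12.15 >= crit)  # True
```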
(f) Comment on your findings.
Think: This is a low $p$-value, so there is a low chance of obtaining these observed values if the variables are independent. Therefore they are likely to be dependent.
Do: $p<\alpha$ ($0.00689<0.05$), which is further reason to reject the null hypothesis $H_0$. It appears that preferred sport is not independent of age.