Standard level

9.05 Chi squared test for independence

Lesson

Dependent or independent?

From our probability work we know that if two events are independent, then one event has no influence on the other. For example, gender and IQ are considered independent, as intelligence has nothing to do with whether a person is male or female.

In contrast, two events are called dependent when one event does have an influence on the other. For example, we might guess that a person's favourite movie genre is somewhat dependent on their gender.

A $\chi^2$ test can be used to test whether two variables from the same sample can be said to be independent.

Remember that the null hypothesis is the 'no effect' hypothesis; in this case, that means 'not dependent'. We write the hypotheses for tests of independence using words.

The null hypothesis for this type of test will always be $H_0$: The variables are independent.
The alternative hypothesis will therefore always be $H_1$: The variables are dependent.

Manual calculations

Let's explore the hypothesis that movie genre preference is not dependent on gender, at a significance level of $1\%$, by considering the following table of data collected from a survey where people were asked "What is your favourite movie genre?":

Observed frequency table:

| | Action | Comedy | Romance | Total |
|---|---|---|---|---|
| Male | $51$ | $50$ | $11$ | $112$ |
| Female | $34$ | $42$ | $54$ | $130$ |
| Total | $85$ | $92$ | $65$ | $242$ |

 

In practice, the calculator will be used to calculate the test statistic $\chi^2$, the degrees of freedom $df$ and the $p$-value, which will be compared to a level of significance in the same way as in our other hypothesis tests. Here we will take a look at how these values are calculated manually in order to understand the process better.

We have been given the observed frequencies and, just like in a goodness of fit test, we wish to compare these with the expected frequencies ($np$). In order to complete the expected frequency table we use the rule of probability for independent events. If $A$ and $B$ are independent events then $P(A\text{ and }B)=P(A)\times P(B)$.

In order to complete the first cell: Expected value $=n\times P(\text{Male}\cap\text{Action})=n\times P(\text{Male})\times P(\text{Action})=242\times\frac{112}{242}\times\frac{85}{242}\approx39.34$

Expected frequency table:

| | Action | Comedy | Romance | Total |
|---|---|---|---|---|
| Male | $39.34$ | $42.58$ | $30.08$ | $112$ |
| Female | $45.66$ | $49.42$ | $34.92$ | $130$ |
| Total | $85$ | $92$ | $65$ | $242$ |
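The expected frequency table can also be reproduced in a few lines of code. The following is a minimal Python sketch, not part of the calculator workflow; the variable names (`observed`, `row_totals`, `col_totals`, `expected`) are chosen here for illustration only. It applies the independence rule cell by cell, which simplifies to row total times column total divided by $n$.

```python
# Observed counts from the survey table (totals row and column excluded).
observed = [
    [51, 50, 11],  # Male:   Action, Comedy, Romance
    [34, 42, 54],  # Female: Action, Comedy, Romance
]

row_totals = [sum(row) for row in observed]        # one total per gender
col_totals = [sum(col) for col in zip(*observed)]  # one total per genre
n = sum(row_totals)                                # overall sample size

# Under independence: expected = n * P(row) * P(column)
#                              = row_total * col_total / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(["Male", "Female"], expected):
    print(label, [round(e, 2) for e in row])
```

Rounded to two decimal places, the printed rows should match the entries of the expected frequency table.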

 

Using the same rule as in the previous lesson, $\chi^2=\sum\frac{\left(f_{\text{observed}}-f_{\text{expected}}\right)^2}{f_{\text{expected}}}$, we can show that the test statistic is $\chi^2\approx31.38$.
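To see how the sum is built up, here is a short Python sketch that adds the contribution of each cell in turn. The function name `chi2_statistic` is a hypothetical helper introduced for this illustration; it uses unrounded expected frequencies, so the result can differ slightly from a hand calculation based on the rounded table.

```python
def chi2_statistic(observed):
    """Sum (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    total = 0.0
    for row, r in zip(observed, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / n              # expected frequency under independence
            total += (o - e) ** 2 / e  # this cell's contribution
    return total


# Survey data: rows are Male/Female, columns are Action/Comedy/Romance.
observed = [[51, 50, 11], [34, 42, 54]]
chi2 = chi2_statistic(observed)
print(f"{chi2:.3f}")
```

The statistic evaluates to roughly $31.377$. The two Romance cells contribute by far the most to the sum, which is where the observed and expected counts differ most.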

As for our previous tests, the number of degrees of freedom is also a required value, as it defines the shape of the $\chi^2$ curve.

Degrees of freedom for $\chi^2$ test for independence

$df=(\text{number of rows}-1)\times(\text{number of columns}-1)$

Important - do not include the totals row and column in your calculation! 

In this problem we find $df=(2-1)\times(3-1)=2$.

Using our calculator we find that $p=1.53698\times10^{-7}$.

This is a tiny $p$-value, approximately $0$ and far less than the $1\%$ level of significance. Remember that this means it is highly unlikely that we would get these observed values if the variables were independent, so the variables are probably NOT independent. As $p<\alpha$, we reject the null hypothesis, $H_0$, and state that movie genre preference does appear to be dependent on gender.

Note: we can also reject the null hypothesis if the calculated $\chi^2$ value is greater than or equal to the critical $\chi^2$ value.
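Both decision rules can be checked side by side. The Python sketch below relies on a special property that holds only when $df=2$: the upper-tail probability of the $\chi^2$ distribution is exactly $e^{-\chi^2/2}$, so the critical value at level $\alpha$ is $-2\ln\alpha$. For other degrees of freedom a calculator or statistics table is needed. All variable names are illustrative.

```python
import math

observed = [[51, 50, 11], [34, 42, 54]]  # totals row and column excluded
alpha = 0.01                             # 1% level of significance

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Test statistic: sum of (observed - expected)^2 / expected over all cells.
chi2 = sum(
    (o - r * c / n) ** 2 / (r * c / n)
    for row, r in zip(observed, row_totals)
    for o, c in zip(row, col_totals)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
if df != 2:
    raise ValueError("the closed forms below are valid only for df = 2")

p_value = math.exp(-chi2 / 2)    # P(X >= chi2) when df = 2
critical = -2 * math.log(alpha)  # solves exp(-x / 2) = alpha

print(f"chi2 = {chi2:.2f}, df = {df}")
print(f"p = {p_value:.3e}, reject H0 by p-value rule: {p_value < alpha}")
print(f"critical = {critical:.2f}, reject H0 by critical-value rule: {chi2 >= critical}")
```

The two rules always agree, because $p<\alpha$ exactly when the statistic lies beyond the critical value. Here the critical value is about $9.21$, well below the calculated statistic.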

Using the calculator

In practice, you will not be required to calculate the expected values, test statistic or $p$-value manually. You will enter the observed values as a matrix in your calculator, and the $\chi^2$ test application will do the rest. Remember that a matrix with $3$ rows and $4$ columns has size $3\times4$.

Instructions for $\chi^2$ test for independence:

TI-nspire calculator instructions
1. Press menu then 7 Matrix & Vector
2. Press 1 Create then 1 Matrix
3. Set rows and columns when prompted
4. Enter the data into the rows and columns
5. Press the arrow to move outside the matrix, then ctrl then var then c (this stores the name of the matrix as c)
6. Press menu then select 6 Statistics
7. Press 7 Stat Tests then 8 $\chi^2$ 2-way Test
8. Set c as the Observed matrix
9. Press OK to view the results

Casio fx-CG 50 calculator instructions
1. Press menu then Run-Matrix
2. Press F3 (MAT/VCT)
3. Press F3 DIM to set Mat A to correct dimensions
4. Type in the data, pressing enter each time
5. Press menu then select Statistics
6. Press F3 for TEST then F3 for CHI
7. Press F2 for 2WAY
8. Make sure Observed is set to Mat A and Expected is set to Mat B
9. Scroll down to Execute and press EXE

TI-84 Plus CE calculator instructions
1. Press matrix (2nd $x^{-1}$)
2. Move to EDIT and choose 1: [A]
3. Set the matrix size, e.g. 2 x 3
4. Type in the data, pressing enter each time
5. Press stat
6. Select C: $\chi^2$-Test from the TESTS menu
7. Check Observed: is set to [A]
8. Highlight Calculate
9. Press enter to display the results

Worked example

Example 1

At the last Commonwealth Games, $220$ spectators were asked which of four events they preferred to watch. The results are displayed in the following table:

| | Age under $40$ | Age $40$ or more |
|---|---|---|
| Hockey | $35$ | $31$ |
| Lawn Bowls | $6$ | $24$ |
| Swimming | $45$ | $49$ |
| Archery | $9$ | $21$ |

In order to see whether age had any influence over favourite sport, researchers conducted a $\chi^2$ test for independence at a $5\%$ significance level.

(a) State the hypotheses set for the problem.

Think: With a test for independence we always have a null hypothesis of NO dependence.

Do: $H_0$: The preferred sport is independent of age. $H_1$: The preferred sport is not independent of age.

(b) State the number of degrees of freedom.

Do: The formula is $df=(\text{number of rows}-1)\times(\text{number of columns}-1)=(4-1)\times(2-1)=3$

(c) Calculate the test statistic, $\chi^2$, correct to two decimal places.

Do: Carefully enter the observed data into your calculator using a $4\times2$ matrix to find $\chi^2=12.15$.

(d) Given that the critical value for a significance level of $5\%$ with $3$ degrees of freedom is $7.81$, determine whether to accept or reject $H_0$.

Think: Our calculated $\chi^2=12.15\ge7.81$.
Do: Since the calculated value of $\chi^2$ is greater than the critical value of $\chi^2$, we can reject the null hypothesis $H_0$.

(e) Find the $p$-value, correct to five decimal places.

Do: Read from the calculator that $p=0.00689$.

(f) Comment on your findings. 

Think: This is a low $p$-value, so there is a low chance of getting these observed values if the variables are independent. The variables are therefore likely to be dependent.

Do: $p<\alpha$, therefore this is further reason to reject the null hypothesis $H_0$. It appears that preferred sport is not independent of age.
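The calculator results in this example can be cross-checked with a short program. The following Python sketch is an illustration only, using hypothetical helper names (`chi2_sf`, `chi2_independence`) and only the standard library. The upper-tail probability is built from the known closed forms for the $\chi^2$ distribution with whole-number degrees of freedom, which is an implementation choice made here and not necessarily how a calculator computes it.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable, integer df >= 1."""
    z = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # exact tail for df = 1
    # Each step raises the degrees of freedom by 2 until df is reached.
    while a < df / 2:
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1
    return q


def chi2_independence(observed):
    """Return (test statistic, degrees of freedom, p-value) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, chi2_sf(chi2, df)


# Rows: Hockey, Lawn Bowls, Swimming, Archery. Columns: under 40, 40 or more.
observed = [[35, 31], [6, 24], [45, 49], [9, 21]]
alpha = 0.05

chi2, df, p = chi2_independence(observed)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.5f}")
print("Reject H0" if p < alpha else "Do not reject H0")
```

The printed values should agree with parts (b), (c) and (e), and the decision matches parts (d) and (f).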

 
