Chi-square test of independence

The chi-square test of independence determines whether two categorical fields are independent. If the fields are not independent, they are associated.

The following procedure describes how the chi-square value is calculated:

  1. Determine the expected frequency of each combination of categories, assuming that the fields are independent. The expected frequency is the joint probability of the combination multiplied by the total count. For independent fields, the joint probability of a combination is the product of the probabilities of its two categories.

    For example, consider two fields: gender and favorite color. The total count is 100. There are 40 males, and 20 people whose favorite color is gray. Assuming that gender and color preference are independent, the expected frequency of males whose favorite color is gray is (40/100)*(20/100)*100, which equals 8.

  2. For each combination, subtract the expected frequency from the actual (observed) frequency.
  3. Take the square of each of these results and divide each square by the expected frequency.
  4. Add up all the results. The sum is the chi-square value.
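
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the function name chi_square_statistic and the observed counts are hypothetical, and only the totals (100 people, 40 males, 20 who prefer gray) come from the example in step 1. The sketch applies no sparse-table adjustment.

```python
def chi_square_statistic(observed):
    """Return (chi_square, expected) for a table of observed counts.

    `observed` is a list of rows; each row is a list of counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    chi_square = 0.0
    expected = []
    for i, row in enumerate(observed):
        expected_row = []
        for j, obs in enumerate(row):
            # Step 1: joint probability under independence times the total count.
            exp = (row_totals[i] / total) * (col_totals[j] / total) * total
            # Steps 2 and 3: squared difference divided by the expected frequency.
            chi_square += (obs - exp) ** 2 / exp
            expected_row.append(exp)
        expected.append(expected_row)
    # Step 4: chi_square now holds the sum over all combinations.
    return chi_square, expected


# Hypothetical observed counts (rows: male, female; columns: gray, other).
# Row totals are 40 and 60; column totals are 20 and 80.
observed = [[12, 28],
            [8, 52]]
chi_square, expected = chi_square_statistic(observed)
print(round(expected[0][0], 6))  # expected males who prefer gray: 8.0
print(round(chi_square, 4))      # 4.1667
```

The sketch assumes that every row and every column has at least one observation; otherwise an expected frequency is zero and the division in step 3 fails.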

The chi-square value is compared to a theoretical chi-square distribution to determine the probability of obtaining a value at least this large by chance when the fields are independent.

  • This probability is the significance value.
  • If the significance value is less than the significance level, the observed frequencies differ significantly from the expected frequencies, which indicates that the fields are associated.
  • For sparse tables, IBM® Cognos Analytics adjusts the chi-square test to reduce the contribution of cells with a small expected value. Without the adjustment, these cells contribute disproportionately to the statistic.
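
As a rough illustration of the comparison described above, the following Python sketch computes the significance value from a chi-square value by using only the standard library. Several things here are assumptions rather than statements from this topic: the function name chi_square_p_value is hypothetical, the degrees of freedom follow the usual convention of (rows - 1) * (columns - 1), the significance level of 0.05 is just a common choice, and the chi-square value is made up. The sparse-table adjustment is not modeled.

```python
import math


def chi_square_p_value(chi_square, df):
    """Upper-tail probability of a chi-square distribution with `df` degrees of freedom.

    Uses the recurrence Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / gamma(a + 1)
    for the regularized upper incomplete gamma function, with x = chi_square / 2.
    `df` must be a positive integer.
    """
    half_x = chi_square / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-half_x)               # Q(1, x)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(half_x))    # Q(1/2, x)
    while a < df / 2.0:
        q += half_x ** a * math.exp(-half_x) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Hypothetical values for a table with 2 rows and 2 columns.
chi_square = 4.1667
df = (2 - 1) * (2 - 1)
significance_level = 0.05   # assumed; choose the level that suits your analysis

significance_value = chi_square_p_value(chi_square, df)
if significance_value < significance_level:
    print("The fields appear to be associated.")
else:
    print("No significant association was found.")
```

With these made-up inputs, the significance value falls between 0.04 and 0.05, so the sketch reports an association at the assumed 0.05 level.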

The effect size for this test is Cramér's V.
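
This topic does not give the formula for the effect size. Cramér's V is commonly defined as the square root of the chi-square value divided by the product of the total count and the smaller of (rows - 1) and (columns - 1). The Python sketch below follows that common definition; the function name cramers_v and the input values are hypothetical, and the product might apply corrections that are not shown here.

```python
import math


def cramers_v(chi_square, total_count, n_rows, n_cols):
    """Cramér's V by the common textbook definition (an assumption, see above)."""
    min_dim = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi_square / (total_count * min_dim))


# Hypothetical inputs: chi-square value of 4.1667 from a 2 x 2 table of 100 records.
v = cramers_v(4.1667, 100, 2, 2)
print(round(v, 3))  # 0.204
```

Under this definition, V ranges from 0 (no association) to 1 (complete association), which makes it comparable across tables of different sizes.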