Chi-square test of independence
The chi-square test of independence determines whether two categorical fields are independent. If the fields are not independent, they are associated.
Note: This statistical test is a fixed system operation and cannot be modified.
The following procedure describes how the chi-square value is calculated:
- Determine the expected frequency with the assumption that the fields are independent. The expected frequency for each combination of categories is the joint probability of that combination multiplied by the total count. If the two fields are independent, the joint probability is the product of the marginal probabilities of the two categories in the combination.
For example, consider two fields, gender and favorite color. The total count is 100, with 40 males and 20 people whose favorite color is gray. Assuming that gender and color preference are independent, the expected frequency of males whose favorite color is gray is (40/100)*(20/100)*100, which equals 8.
- For each combination, subtract the expected frequency from the actual (observed) frequency.
- Take the square of each of these results and divide each square by the expected frequency.
- Add up all the results.
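The steps above can be sketched in a few lines of Python. This is a minimal illustration of the standard Pearson chi-square statistic, not the Cognos Analytics implementation. The function name chi_square_statistic and the cell counts are hypothetical; only the totals (100 people, 40 males, 20 people who prefer gray) come from the example above.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected frequency under independence:
            # P(row category) * P(column category) * total count
            expected = (row_totals[i] / total) * (col_totals[j] / total) * total
            # Squared difference between observed and expected, scaled by expected
            statistic += (count - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows are male/female, columns are gray/other.
# Row totals are 40 and 60; column totals are 20 and 80.
observed = [[12, 28],
            [8, 52]]
print(round(chi_square_statistic(observed), 4))
```

With these counts, the expected frequency for males who prefer gray is 8, as in the example, and the observed count of 12 contributes (12 - 8)^2 / 8 = 2 to the total.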
The chi-square value is compared to a theoretical chi-square distribution to determine the probability of obtaining a value at least that large by chance if the fields are independent.
- This probability is the significance value.
- If the significance value is less than the significance level, the observed frequencies differ significantly from the expected frequencies, which indicates that the fields are associated.
- For sparse tables, IBM® Cognos Analytics adjusts the chi-square test to reduce the contribution of cells with a small expected value, which would otherwise have a disproportionately large contribution to the statistic.
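As a rough illustration of the comparison, the following sketch computes the significance value for the special case of a 2x2 table, where the chi-square distribution has 1 degree of freedom and the upper-tail probability can be written with the complementary error function. The function name p_value_df1, the statistic 4.1667, and the significance level 0.05 are assumptions chosen for illustration. The sketch does not include the sparse-table adjustment, whose details are not described here.

```python
import math


def p_value_df1(chi_square):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom.

    Valid only for 2x2 tables. Larger tables have more degrees of freedom
    and need the general chi-square distribution function.
    """
    return math.erfc(math.sqrt(chi_square / 2.0))


chi_square = 4.1667  # hypothetical statistic from a 2x2 table
alpha = 0.05         # assumed significance level

p = p_value_df1(chi_square)
if p < alpha:
    print("Observed and expected frequencies differ significantly.")
else:
    print("No significant difference was detected.")
```

For tables larger than 2x2, a statistics library is the more practical choice, because the degrees of freedom are (rows - 1) * (columns - 1) and the closed form above no longer applies.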
The effect size for this test is Cramér's V.
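For reference, the common textbook definition of Cramér's V divides the chi-square value by the total count times the smaller of (rows - 1) and (columns - 1), and then takes the square root. The sketch below uses that definition. Whether Cognos Analytics applies this exact formula or a corrected variant is not stated here, so treat it as an assumption. The function name cramers_v and the input values are hypothetical.

```python
import math


def cramers_v(chi_square, total_count, n_rows, n_cols):
    """Cramér's V (textbook form) for an n_rows x n_cols contingency table."""
    smaller_dimension = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi_square / (total_count * smaller_dimension))


# Hypothetical inputs: a 2x2 table with 100 records and a chi-square value of 4.1667
v = cramers_v(4.1667, 100, 2, 2)
print(round(v, 3))
```

The result ranges from 0 (no association) to 1 (perfect association), which makes it comparable across tables of different sizes.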