Analysis of variance (ANOVA)
Analysis of variance, or ANOVA, is a linear modeling method for evaluating the relationship among fields. For key drivers and for insights that are related to a number of charts, ANOVA tests whether the mean target value varies across categories of one input or combinations of categories of two inputs.
To test whether the means are different, an ANOVA test compares the explained variance (caused by the input fields) to the unexplained variance (caused by the error source). If the ratio of explained variance to unexplained variance is high enough, the difference between the means is statistically significant.
IBM® Cognos Analytics can calculate one-way ANOVA tests (with one input) and two-way ANOVA tests (with two inputs). If an input is continuous, the input is binned to create groups whose target means can be compared with the ANOVA test. A one-way ANOVA test is an extension of the t test, but an ANOVA test can compare any number of means. The t test can compare only two means.
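As an illustration of how a continuous input can be turned into groups, the following minimal sketch assigns values to equal-width bins. The binning rule that IBM Cognos Analytics applies is not described here, so the equal-width rule, the function name `equal_width_bins`, and the `n_bins` parameter are all assumptions made for this example.

```python
def equal_width_bins(values, n_bins=5):
    """Assign each value to one of n_bins equal-width bins (labels 0..n_bins-1).

    Equal-width binning is an assumption for illustration only; it is not
    necessarily the rule that the product uses.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # all values identical: a single group
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18, 22, 25, 31, 38, 44, 52, 60, 67, 75]
groups = equal_width_bins(ages, n_bins=3)
```

The resulting bin labels can then be treated as the categories of the input in the ANOVA test.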
Although an ANOVA test reveals a statistical difference between means, it does not indicate which means are different. The visualization insights feature in IBM Cognos Analytics reports the groups that cause the means to differ as meaningful differences.
One-way ANOVA
The one-way ANOVA test uses an F value. The following procedure describes how the F value is calculated:
- Calculate the overall mean for the continuous field.
- Calculate the mean square for the categorical field (the explained variance).
  - Calculate the sum of squares for the categorical field.
    - For each category, subtract the overall mean from the category’s mean.
    - Square each of these results, multiply it by the number of records in the category, and add the products together.
  - Divide the sum of squares for the categorical field by the appropriate degrees of freedom (the number of categories minus one).
- Calculate the mean square for the error source (the unexplained variance).
  - Calculate the sum of squares for the error source.
    - Within each category, subtract the category’s mean from each record value.
    - Square each difference and add the squares together.
  - Divide the sum of squares for the error source by the appropriate degrees of freedom (the number of records minus the number of categories).
- Divide the mean square for the categorical field by the mean square for the error source. In other words, calculate the ratio of explained variance to unexplained variance. This is the F value.
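The procedure above can be sketched in a few lines of Python. This is a textbook one-way ANOVA, not the product's implementation; the function name `one_way_f` and the sample data are hypothetical. The sketch weights each category's squared deviation by the number of records in the category, which is the standard definition of the between-category sum of squares.

```python
from collections import defaultdict


def one_way_f(categories, values):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)

    n = len(values)
    k = len(groups)
    overall_mean = sum(values) / n

    ss_between = 0.0  # explained: category means around the overall mean
    ss_within = 0.0   # unexplained: records around their own category mean
    for vals in groups.values():
        mean = sum(vals) / len(vals)
        ss_between += len(vals) * (mean - overall_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in vals)

    df_between = k - 1
    df_within = n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Hypothetical data: three categories with three records each.
cats = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
vals = [1, 2, 3, 2, 3, 4, 5, 6, 7]
f_value, df_b, df_w = one_way_f(cats, vals)
```

The sketch assumes at least two categories, more records than categories, and some variation within the categories; otherwise a division by zero occurs.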
The F value is compared to a theoretical F distribution to determine the probability of obtaining an F value at least this large by chance.
- This probability is the significance value.
- If the significance value is less than the significance level, the means are significantly different.
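To make the comparison with the F distribution concrete, the sketch below computes the upper-tail probability of the F distribution from the regularized incomplete beta function, using only the Python standard library. The function names are hypothetical, and the continued-fraction evaluation is one common numerical approach, not necessarily the one that the product uses.

```python
import math


def _beta_cf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_significance(f_value, df_between, df_within):
    """Probability of an F value at least as large as f_value by chance."""
    if f_value <= 0.0:
        return 1.0
    x = df_within / (df_within + df_between * f_value)
    return reg_inc_beta(df_within / 2.0, df_between / 2.0, x)


significance = f_significance(13.0, 2, 6)
means_differ = significance < 0.05  # 5% significance level
```

Where SciPy is available, `scipy.stats.f.sf(f_value, df_between, df_within)` returns the same upper-tail probability.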
Adjusted R² is used to estimate the predictive strength of the model. The significance level is set to 5%, and the predictive strength must be greater than 10% to indicate a reliable predictive relationship between the target and the input field.
Predictive strength is reported for one-way key drivers and for insights on charts that display the average of a numeric measure across the categories of a categorical field.
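As a rough sketch of this check, the following code derives adjusted R² from the two sums of squares of a one-way ANOVA and applies the 5% and 10% thresholds. It uses the standard adjusted R² formula with one parameter per category; the function names and parameters are hypothetical, and the product's exact computation is not shown here.

```python
def adjusted_r2(ss_between, ss_within, n_records, n_groups):
    """Adjusted R-squared of a one-way ANOVA model.

    The model has one intercept plus (n_groups - 1) category effects, so the
    residual degrees of freedom are n_records - n_groups.
    """
    r2 = ss_between / (ss_between + ss_within)
    return 1.0 - (1.0 - r2) * (n_records - 1) / (n_records - n_groups)


def is_reliable_driver(significance, predictive_strength,
                       significance_level=0.05, min_strength=0.10):
    """Apply both thresholds: significant at 5% and strength above 10%."""
    return significance < significance_level and predictive_strength > min_strength


# Hypothetical sums of squares: 26 explained, 6 unexplained, 9 records, 3 categories.
strength = adjusted_r2(26.0, 6.0, n_records=9, n_groups=3)
reliable = is_reliable_driver(significance=0.0066, predictive_strength=strength)
```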
Two-way ANOVA
Like the one-way ANOVA, the two-way ANOVA test calculates an F value. It is used to test whether means in the full two-way model are significantly different. The procedure is similar to the one-way ANOVA, except that two categorical fields are used as inputs instead of a single categorical field. Means and sum of squares statistics are computed for each combination of categories from the categorical fields.
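A minimal sketch of the full two-way test follows. It assumes that the full model includes the interaction between the two inputs, so that each combination of categories (a cell) is fitted by its own mean; under that assumption, the test has the same form as a one-way test over the cells. The function name `two_way_full_model_f` and the sample data are hypothetical.

```python
from collections import defaultdict


def two_way_full_model_f(factor_a, factor_b, values):
    """Return (F, df_model, df_error) for the full two-way model.

    Assumption: the full model includes the interaction, so the fitted value
    for a record is the mean of its (factor_a, factor_b) cell.
    """
    cells = defaultdict(list)
    for a, b, v in zip(factor_a, factor_b, values):
        cells[(a, b)].append(v)

    n = len(values)
    n_cells = len(cells)  # only combinations that occur in the data
    overall_mean = sum(values) / n

    ss_model = 0.0  # explained: cell means around the overall mean
    ss_error = 0.0  # unexplained: records around their own cell mean
    for vals in cells.values():
        mean = sum(vals) / len(vals)
        ss_model += len(vals) * (mean - overall_mean) ** 2
        ss_error += sum((v - mean) ** 2 for v in vals)

    df_model = n_cells - 1
    df_error = n - n_cells
    return (ss_model / df_model) / (ss_error / df_error), df_model, df_error


# Hypothetical 2 x 2 layout with two records per cell.
region = ["x", "x", "x", "x", "y", "y", "y", "y"]
product = ["p", "p", "q", "q", "p", "p", "q", "q"]
sales = [1, 3, 4, 6, 5, 7, 10, 12]
f_value, df_model, df_error = two_way_full_model_f(region, product, sales)
```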
Adjusted R² is also used to estimate the predictive strength of the model. The significance level is set to 5%, and the predictive strength must be greater than 10% for the model to be considered. In addition, the two-way model must show at least a 10% relative improvement over the predictive strengths of the nested one-way models to indicate a reliable predictive relationship between the target and the two input fields. Relative improvement is the gain in predictive strength over a nested one-way model, expressed as a percentage of the difference between 100% and the predictive strength of that one-way model.
Predictive strength is reported for two-way key drivers and for insights on charts that display the average of a numeric measure across the categories of two categorical fields.
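The relative improvement check can be sketched as follows. The sketch reads the definition as the gain over a nested one-way model divided by the remaining headroom (100% minus the one-way strength), and it assumes that the two-way model must clear the 10% bar against each nested one-way model. Both readings, and all function names, are assumptions made for this example.

```python
def relative_improvement(two_way_strength, one_way_strength):
    """Gain over the one-way model as a fraction of its remaining headroom."""
    return (two_way_strength - one_way_strength) / (1.0 - one_way_strength)


def two_way_is_reliable(significance, two_way_strength, one_way_strengths,
                        significance_level=0.05, min_strength=0.10,
                        min_improvement=0.10):
    """Apply the significance, strength, and relative improvement thresholds."""
    if significance >= significance_level or two_way_strength <= min_strength:
        return False
    # Assumption: the improvement must hold against every nested one-way model.
    return all(
        relative_improvement(two_way_strength, s) >= min_improvement
        for s in one_way_strengths
    )


# Hypothetical strengths: two-way 58%, nested one-way models 40% and 50%.
gain_a = relative_improvement(0.58, 0.40)  # 0.18 gain over 0.60 headroom
gain_b = relative_improvement(0.58, 0.50)  # 0.08 gain over 0.50 headroom
accepted = two_way_is_reliable(0.01, 0.58, [0.40, 0.50])
```

A two-way strength of 52% over a one-way strength of 50% gives a relative improvement of only 4%, so that model would not pass the check in this sketch.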