Insights in visualizations for counts
Insights for counts are available whenever count is displayed for each category of a single categorical field.
They are also available when count is displayed for each combination of categories of a pair of categorical fields in the visualization. In the latter case the pair may be two explanatory fields, as in the rows and columns of a heat map, or one explanatory and one repeat field, as in the bars and repeat (column) of a bar chart.
Insights for counts of combined categories of three categorical fields are supported for one explanatory and the combination of two repeat fields, as in the segments, repeat (column), and repeat (row) of a pie visualization.
Overview
Use these visualizations to compare the number of items in different categories or in combinations of categories.
Algorithms
IBM® Cognos Analytics with Watson reports the average count across all categories of the specified response field and applies statistical tests to detect the categories whose counts differ most significantly from that average.
Visualizations with two or three categorical fields and counts for each combination of categories are treated differently. Cognos Analytics not only compares the counts across categories but also detects any relationship between the categorical fields. It treats one field as the response field and the other field, or the combination of the other two fields, as the explanatory field.
Cognos Analytics reports the most frequent category in visualizations with one categorical explanatory field, one or two categorical repeat fields, and a count response field.
Details
- Single categorical field
-
The first test applied is the chi-square test of equal frequencies, which establishes whether any counts differ significantly from the average. If the test result is significant, Cognos Analytics applies the influence chi-square test to each category separately. It then computes the effect size for the categories where the influence test is statistically significant and reports the categories with the largest effect size under meaningful differences.
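The two-stage procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: the text does not define the influence chi-square test or the effect size, so the sketch substitutes a one-degree-of-freedom test of each category against all other categories combined, and uses Cohen's w as the effect size. The names `chi2_sf`, `count_insights`, and `alpha` are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Start from a closed form, then step up using
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def count_insights(counts, alpha=0.05):
    """counts: dict mapping category -> count. Returns (category, count, effect)."""
    n = sum(counts.values())
    k = len(counts)
    expected = n / k  # the average count across categories
    # Stage 1: chi-square test of equal frequencies.
    overall = sum((c - expected) ** 2 / expected for c in counts.values())
    if chi2_sf(overall, k - 1) >= alpha:
        return []
    # Stage 2 (assumed stand-in): each category versus all others combined.
    findings = []
    for category, c in counts.items():
        stat = (c - expected) ** 2 / expected + (c - expected) ** 2 / (n - expected)
        if chi2_sf(stat, 1) < alpha:
            effect = math.sqrt(stat / n)  # Cohen's w, an assumed effect size
            findings.append((category, c, effect))
    return sorted(findings, key=lambda f: f[2], reverse=True)

for category, count, effect in count_insights({"A": 50, "B": 48, "C": 52, "D": 110}):
    print(category, count, round(effect, 3))
```

With these example counts, category D is far above the average of 65 and is listed first, while category C is too close to the average to be flagged.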
- Restrictions
-
The following list describes the conditions that determine whether insights are suggested for this algorithm.
- Two categorical fields
-
Cognos Analytics treats one categorical field as the response field and the other as the explanatory field. The original count field is used as input to the algorithms.
The chi-square test of independence, with the adjustment for sparse data, is used to establish whether a relationship exists between the response field and the explanatory field. If the test result is significant, Cognos Analytics computes the predictive strength for this model as the adjusted count R-squared, with low-frequency categories filtered. The relationship is declared reliable, and the predictive strength is reported, if the predictive strength is greater than 10%.
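As a rough illustration, the test and the predictive strength might be computed as in the Python sketch below. Several points are assumptions: the adjustment for sparse data and the filtering of low-frequency categories are not described in the text and are omitted, the adjusted count R-squared is taken in its common textbook form (correct predictions from each explanatory category's most frequent response category, corrected for always predicting the overall most frequent response category), and every row and column is assumed to have a nonzero total. The names `chi2_sf`, `predictive_strength`, `alpha`, and `threshold` are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def predictive_strength(table, alpha=0.05, threshold=0.10):
    """table[i][j]: count for explanatory category i and response category j.

    Returns (p_value, strength, reliable); strength is None when the
    test of independence is not significant.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    p_value = chi2_sf(stat, df)
    if p_value >= alpha:
        return p_value, None, False
    # Adjusted count R-squared (assumed textbook definition).
    correct = sum(max(row) for row in table)  # predict the modal response per row
    baseline = max(col_tot)                   # always predict the overall mode
    strength = (correct - baseline) / (n - baseline)
    return p_value, strength, strength > threshold

p_value, strength, reliable = predictive_strength([[40, 10], [10, 40]])
print(strength, reliable)
```

In the example table, knowing the explanatory category raises the number of correct predictions from 50 to 80 out of 100, which gives a strength of 0.6.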
If the test of independence is significant, every combination of explanatory and response categories is analyzed further by applying the influence chi-square test to it. Combinations where the influence test is significant are considered influential. The effect size is computed for each influential combination, and the combinations with the largest effect size are reported under meaningful differences.
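The per-combination analysis can be sketched as follows. The text does not define the influence chi-square test, so this sketch substitutes adjusted standardized residuals, whose square equals the Pearson chi-square statistic (one degree of freedom) of the 2x2 table that compares one cell with the rest of the table. The effect size used here, the absolute residual divided by the square root of the total count, is likewise an assumption, as are the names `influential_cells`, `alpha`, and `top`. The table is assumed to have at least two rows and two columns with nonzero totals.

```python
import math

def influential_cells(table, alpha=0.05, top=3):
    """table[i][j]: count for explanatory category i and response category j.

    Returns up to `top` tuples (i, j, count, effect), largest effect first.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    cells = []
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            variance = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            z = (observed - expected) / math.sqrt(variance)
            # Two-sided normal p-value, equal to the chi-square (df = 1)
            # tail probability of z squared.
            p_value = math.erfc(abs(z) / math.sqrt(2))
            if p_value < alpha:
                cells.append((i, j, observed, abs(z) / math.sqrt(n)))
    cells.sort(key=lambda cell: cell[3], reverse=True)
    return cells[:top]

table = [[30, 10, 10],
         [10, 30, 10]]
for i, j, count, effect in influential_cells(table, top=10):
    print(i, j, count, round(effect, 3))
```

In this example the third response category has the same count in both rows, so its cells match their expected counts exactly and are not reported.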
If the roles of the two categorical fields are explanatory and repeat, the most frequent algorithm is applied. The counts are summed over each distinct category of the explanatory field. The largest sum is reported, together with the number of categories having that sum. Note that the repeat field is not used by this algorithm; it only triggers the application of the algorithm.
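A minimal Python sketch of the most frequent algorithm as described above. The input layout (one tuple per bar, holding the explanatory category, the repeat category, and the count) and the name `most_frequent` are assumptions made for the example.

```python
from collections import defaultdict

def most_frequent(rows):
    """rows: iterable of (explanatory_category, repeat_category, count).

    Returns (largest_sum, number_of_categories_with_that_sum, categories).
    """
    totals = defaultdict(int)
    for explanatory, _repeat, count in rows:
        totals[explanatory] += count  # the repeat category is ignored
    largest = max(totals.values())
    winners = [category for category, total in totals.items() if total == largest]
    return largest, len(winners), winners

rows = [
    ("Bikes", "East", 5), ("Bikes", "West", 7),
    ("Tents", "East", 9), ("Tents", "West", 3),
    ("Stoves", "East", 4),
]
print(most_frequent(rows))  # (12, 2, ['Bikes', 'Tents'])
```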
- Restrictions
-
The following lists describe the conditions that determine whether insights are suggested for this algorithm.
- Three categorical fields
-
These algorithms are applied only when there is one explanatory field and two repeat fields. The combination of the two repeat fields is treated as if it were a single categorical field, where the categories are the pairs of categories from the two repeat fields.
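The pairing of the two repeat fields can be sketched in Python as shown below. The record layout and the name `pair_repeat_fields` are hypothetical, and the sketch keeps only the pairs that actually occur in the data, which is an assumption; the text does not say how pairs with no items are handled.

```python
from collections import Counter

def pair_repeat_fields(records):
    """records: iterable of (explanatory, repeat_column, repeat_row), one per item.

    Returns (pairs, counts): the sorted distinct (repeat_column, repeat_row)
    pairs, and a Counter keyed by (explanatory, pair).
    """
    counts = Counter()
    for explanatory, repeat_column, repeat_row in records:
        pair = (repeat_column, repeat_row)  # one combined category
        counts[(explanatory, pair)] += 1
    pairs = sorted({pair for _explanatory, pair in counts})
    return pairs, counts

records = [
    ("Bikes", "East", "2023"), ("Bikes", "East", "2023"),
    ("Bikes", "West", "2024"), ("Tents", "East", "2023"),
]
pairs, counts = pair_repeat_fields(records)
print(pairs)
print(counts[("Bikes", ("East", "2023"))])
```

The resulting counts have the same shape as a two-field table, with the combined pair standing in for the second field.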
The predictive strength is calculated exactly as in the two categorical fields case, using the paired repeat fields as the predictor of the explanatory field. The chi-square test of independence with the adjustment for sparse data is used to test significance of the relationship, and the adjusted count R-squared with low-frequency categories filtered is used to find the predictive strength.
The meaningful differences are calculated exactly as in the two categorical fields case, identifying combinations of the explanatory field and the paired repeat fields for which the count is unusual. The influence chi-square test is used to test significance for each combination, and the combinations with the largest effect size are reported.
The most frequent algorithm is applied exactly as in the two categorical fields case, summing the counts over each distinct category of the explanatory field. The largest sum is reported, together with the number of categories having that sum. Note that the repeat fields are not used by this algorithm; they only trigger the application of the algorithm.
- Response
- Exactly 1
Summarization level = Count
Differences between Cognos Analytics version 11.1 R2 and R3
For visualizations with two categorical fields, when the response field had a category representing missing data (the "(no value)" category):
- In Cognos Analytics version 11.1 R2, the adjusted count R-squared calculation for predictive strength omitted data values for the missing-data category. In Cognos Analytics version 11.1 R3, the values are included. This may affect the predictive strength reported for heat maps where the heat data slot has count-aggregated data and the rows and columns are two categorical fields.
- In Cognos Analytics version 11.1 R2, the meaningful differences did not report any unusual cells associated with the missing-data category. In Cognos Analytics version 11.1 R3, the cells are reported. This may affect the meaningful differences shown for heat maps where the heat data slot has count-aggregated data and the rows and columns are two categorical fields.
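To show why the predictive strength can change between the two releases, the Python sketch below computes an adjusted count R-squared with and without the "(no value)" response category. It is an illustration only: the adjusted count R-squared is taken in its common textbook form, low-frequency filtering is omitted, the data is invented for the example, and the names `adjusted_count_r2` and `drop_response_category` are hypothetical.

```python
def adjusted_count_r2(table):
    """table: dict explanatory category -> dict response category -> count."""
    response_totals = {}
    for row in table.values():
        for response, count in row.items():
            response_totals[response] = response_totals.get(response, 0) + count
    n = sum(response_totals.values())
    baseline = max(response_totals.values())  # always predict the overall mode
    correct = sum(max(row.values()) for row in table.values())
    return (correct - baseline) / (n - baseline)

def drop_response_category(table, label="(no value)"):
    """Return a copy of the table without the given response category."""
    return {
        explanatory: {r: c for r, c in row.items() if r != label}
        for explanatory, row in table.items()
    }

table = {
    "North": {"Yes": 30, "No": 10, "(no value)": 5},
    "South": {"Yes": 10, "No": 30, "(no value)": 25},
}
print(adjusted_count_r2(table))                          # missing-data values included
print(adjusted_count_r2(drop_response_category(table)))  # missing-data values omitted
```

In this invented example, including the missing-data category lowers the value from 0.5 to about 0.29, because the extra items enlarge the total without adding correct predictions.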