The overall distribution is the distribution of records in the target field before the records are grouped by the input field or fields.
The overall distribution is essentially the expected distribution. If the inputs have no effect on the target, the distribution of the target within each input category or level (bin) would be the same as the overall distribution. The distributions within the categories or levels are called conditional distributions. Comparing the conditional distributions to the overall distribution can reveal the effect of the input categories or levels: where a conditional distribution differs from the overall distribution, the target is distributed differently for that category or level.
For example, if the input field is gender and the target field is favorite color, the overall distribution is the number of people for each color. This distribution is compared to the conditional distributions, which are the number of men for each color and the number of women for each color.
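The gender and favorite color example can be sketched in a few lines of Python. This is only an illustration of the concept, not how IBM Cognos Analytics computes its results. The records, the `proportions` helper, and all variable names are hypothetical and invented for this sketch.

```python
from collections import Counter, defaultdict

# Hypothetical records of (gender, favorite color); invented for illustration.
records = [
    ("man", "blue"), ("man", "blue"), ("man", "green"), ("man", "red"),
    ("woman", "blue"), ("woman", "red"), ("woman", "red"), ("woman", "green"),
]


def proportions(counts):
    """Convert a mapping of category -> count into category -> share of total."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Overall distribution: count of people for each color, ignoring gender.
overall_counts = Counter(color for _, color in records)

# Conditional distributions: count of people for each color within each gender.
conditional_counts = defaultdict(Counter)
for gender, color in records:
    conditional_counts[gender][color] += 1

overall = proportions(overall_counts)
conditional = {g: proportions(c) for g, c in conditional_counts.items()}

# Show each conditional share next to the overall share for the same color.
for gender, dist in conditional.items():
    for color in sorted(overall):
        share = dist.get(color, 0.0)
        print(f"{gender:5s} {color:5s} conditional={share:.3f} overall={overall[color]:.3f}")
```

In this made-up data, the share of each color among men and among women differs from the overall share, which is the kind of difference the comparison is meant to surface. If gender had no effect on favorite color, the conditional shares would be close to the overall shares.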
IBM® Cognos Analytics compares conditional distributions to the overall distribution in the following analyses:
- Key drivers for categorical target. The overall distribution is the marginal distribution of the target field. For one input, the comparison shows how the distribution changes for each category or level (bin) in the input field. For two inputs, the comparison shows how the distribution changes for each combination of categories or levels in the input fields.
- Decision trees. The overall distribution is the distribution of the root node, which is also the marginal distribution of the target field. The comparison shows how the distribution changes in each node.