Two-way key drivers

Two-way drivers are based on modeling and ranking categorical predictors two at a time.

Overview

Given a target field, IBM® Cognos Analytics with Watson fits a statistical model to a pair of other data fields and estimates the strength of that pair in predicting the target values. The search over predictor pairs is usually not exhaustive, and some high-ranking pairs can be filtered out of the final results. The goal is to present a varied overview of predictor pairs that improve on the predictive strength of the single-predictor models that are displayed as one-way drivers. Two-way drivers therefore extend the insights obtained from one-way drivers with relevant information about pairs of fields in the data.

Both one-way and two-way driver analysis results are available in the driver analysis and spiral charts. They can be viewed separately by selecting the corresponding chart viewing option. Each displayed one-way or two-way driver can be expanded into a new visualization directly from the Driver analysis visualization in Explore.

Algorithms

The analysis for each two-way driver is based on a statistical model that includes the target and a pair of categorical predictors. The model is applied after data preparation and after all the one-way drivers are built. The first predictor in the pair is selected from the top 50 one-way drivers and the second from the top 25 one-way drivers. This search strategy ensures that most of the top-ranking predictor pairs are considered for modeling. Two-way ANOVA (analysis of variance) is applied for numeric targets. For categorical targets, the chi-square test of independence is applied, with a chi-square adjustment for sparse data.
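The pair search described above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the function name `candidate_pairs`, the placeholder field names, and the choice to treat a pair as unordered (so that A with B and B with A count once) are all assumptions made for the example.

```python
from itertools import product


def candidate_pairs(ranked_fields, first_pool=50, second_pool=25):
    """Build candidate predictor pairs from fields ranked by one-way strength.

    ranked_fields is assumed to be sorted from strongest to weakest.
    The pool sizes follow the text; treating pairs as unordered is an
    assumption of this sketch.
    """
    first = ranked_fields[:first_pool]
    second = ranked_fields[:second_pool]
    seen = set()
    pairs = []
    for a, b in product(first, second):
        if a == b:
            continue  # a field cannot be paired with itself
        key = frozenset((a, b))
        if key in seen:
            continue  # skip the mirrored duplicate of a pair already kept
        seen.add(key)
        pairs.append((a, b))
    return pairs


# Hypothetical ranking of 60 fields, strongest first.
ranked = [f"field_{i}" for i in range(1, 61)]
pairs = candidate_pairs(ranked)
print(len(pairs))  # 925 candidate pairs, far fewer than all 1770 possible pairs
```

With 60 ranked fields, the sketch yields 925 candidates instead of the 1770 pairs that an exhaustive search would examine, which shows why the search is described as usually not exhaustive.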

For each considered pair of fields, a hypothesis test is performed to determine whether the pair has a significant impact on the target. Only pairs that pass the test and have sufficiently high predictive strength are selected as possible two-way drivers.
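For a categorical target, the idea of testing a pair of fields can be illustrated with a plain Pearson chi-square statistic. The sketch below assumes that the two predictors are combined into one composite category per row, which is a simplification chosen for the example. It does not include the sparse-data adjustment, whose details are not given here, and the function name `chi_square_for_pair` and the sample data are hypothetical.

```python
from collections import Counter


def chi_square_for_pair(rows, field_a, field_b, target):
    """Pearson chi-square statistic and degrees of freedom for a field pair.

    Assumption of this sketch: each (field_a, field_b) combination is
    treated as one category of a composite predictor.
    """
    observed = Counter(((r[field_a], r[field_b]), r[target]) for r in rows)
    combo_totals = Counter()
    target_totals = Counter()
    for (combo, t), n in observed.items():
        combo_totals[combo] += n
        target_totals[t] += n
    total = sum(combo_totals.values())

    stat = 0.0
    for combo, combo_n in combo_totals.items():
        for t, target_n in target_totals.items():
            expected = combo_n * target_n / total
            # Counter returns 0 for combinations that never occur.
            stat += (observed[(combo, t)] - expected) ** 2 / expected
    dof = (len(combo_totals) - 1) * (len(target_totals) - 1)
    return stat, dof


# Hypothetical data: neither field predicts the target alone, but the pair does.
rows = []
for region, plan, churn in [("x", "p", "yes"), ("x", "q", "no"),
                            ("y", "p", "no"), ("y", "q", "yes")]:
    rows += [{"region": region, "plan": plan, "churn": churn}] * 10

stat, dof = chi_square_for_pair(rows, "region", "plan", "churn")
print(stat, dof)  # 40.0 3
```

A statistic of 40.0 with 3 degrees of freedom is well above the 5% critical value of 7.815, so this pair would pass a test at that level. The significance level that the product uses is not stated in this topic.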

Details

The restrictions on the selection of data fields and data rows for one-way drivers apply to two-way drivers as well. This is expected, because the potential predictor fields for two-way drivers are selected from the top one-way drivers based on their respective predictive strength. However, a one-way driver does not need to be statistically significant or reach the minimum predictive strength to enter a two-way model. A resulting two-way driver must have a predictive strength higher than 10% and must provide more than 10% relative improvement over the predictive strength of each of the one-way drivers that it contains. Relative improvement is computed as the gain in predictive strength over the nested one-way driver, expressed as a percentage of the difference between 100% and the predictive strength of that one-way driver. Two-way drivers that satisfy these criteria are ranked by their predictive strength, and the top 20 are made available for display.
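The selection rules can be made concrete with a short Python sketch. It reads the relative-improvement rule as (two-way strength minus one-way strength) divided by (100% minus one-way strength), which is an interpretation of the wording above and not a confirmed formula. The function names, the dictionary layout, and the sample strengths are hypothetical.

```python
def relative_improvement(two_way, one_way):
    """Gain over a one-way driver as a percentage of its remaining headroom.

    Strengths are percentages in the range 0 to 100. The formula is this
    sketch's reading of the rule, not a confirmed definition.
    """
    if one_way >= 100.0:
        return 0.0  # no headroom left to improve on
    return 100.0 * (two_way - one_way) / (100.0 - one_way)


def select_two_way_drivers(candidates, min_strength=10.0,
                           min_improvement=10.0, top_n=20):
    """Keep candidates that meet both thresholds, ranked by strength."""
    kept = []
    for c in candidates:
        strength = c["strength"]
        if strength <= min_strength:
            continue  # predictive strength must be higher than 10%
        if all(relative_improvement(strength, s) > min_improvement
               for s in c["one_way_strengths"]):
            kept.append(c)
    kept.sort(key=lambda c: c["strength"], reverse=True)
    return kept[:top_n]


# Hypothetical candidates with the strengths of their two one-way drivers.
candidates = [
    {"name": "A", "strength": 40.0, "one_way_strengths": (20.0, 30.0)},
    {"name": "B", "strength": 35.0, "one_way_strengths": (32.0, 5.0)},
    {"name": "C", "strength": 9.0, "one_way_strengths": (1.0, 2.0)},
    {"name": "D", "strength": 60.0, "one_way_strengths": (50.0, 10.0)},
]
selected = select_two_way_drivers(candidates)
print([c["name"] for c in selected])  # ['D', 'A']
```

In this example, B is rejected because it improves on its stronger one-way driver by only about 4.4% of the remaining headroom, and C is rejected because its predictive strength does not exceed 10%.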