One-way key drivers
One-way drivers are a model-based exploratory tool.
Overview
Given a target field, the tool uses a statistical model to analyze every other available data field and estimate its strength in predicting the target values. Such data fields are called target predictors or drivers. Each potentially relevant data field is analyzed, and only the top drivers, ranked by predictive strength, are displayed. The results show which drivers are available and how they rank by predictive strength for the specified target in the data. One-way driver analysis results are available in both the driver analysis and spiral visualizations. Visual drill-down into each separate driver is enabled for the driver analysis visualization in Explore only.
Algorithms
The analysis for each one-way driver is based on a statistical model that includes the target and a single categorical predictor. The model is applied after the data preparation step for the target field and all potential predictor fields. For example, all numeric predictor fields are binned during the data preparation step and treated as categorical in the analysis. One-way ANOVA is applied for numeric targets. The chi-square test of independence, with a chi-square adjustment for sparse data, is applied for categorical targets.
For each field in the list of potential drivers, a hypothesis test is performed to determine whether the field has a significant impact on the target. Only fields that pass the test and have sufficiently high predictive strength are selected as possible one-way key drivers.
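As a rough illustration of the two tests named above, the following sketch computes the one-way ANOVA F statistic and the Pearson chi-square statistic for a single categorical predictor. It is a minimal sketch under stated assumptions, not the product's implementation. The function names are hypothetical, equal-width binning is an assumed stand-in for the unspecified data preparation step, and the adjustment for sparse data and the conversion of each statistic into a significance decision are omitted.

```python
from collections import defaultdict


def equal_width_bins(values, n_bins=5):
    """Map numeric values to bin indexes 0..n_bins-1.

    Equal-width binning is an assumption; the binning method used by
    the product is not specified in this document.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def anova_f(categories, target):
    """One-way ANOVA F statistic: numeric target grouped by a categorical predictor.

    Requires at least two groups and some within-group variation.
    """
    groups = defaultdict(list)
    for c, y in zip(categories, target):
        groups[c].append(y)
    n, k = len(target), len(groups)
    grand_mean = sum(target) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand_mean) ** 2 for c, g in groups.items())
    ss_within = sum((y - means[c]) ** 2 for c, g in groups.items() for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def chi_square(categories, target):
    """Pearson chi-square statistic: categorical predictor vs. categorical target.

    No sparse-data adjustment is applied in this sketch.
    """
    observed = defaultdict(int)
    row_totals, col_totals = defaultdict(int), defaultdict(int)
    for c, t in zip(categories, target):
        observed[(c, t)] += 1
        row_totals[c] += 1
        col_totals[t] += 1
    n = len(target)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (observed.get((r, c), 0) - expected) ** 2 / expected
    return stat
```

A numeric predictor would first be passed through `equal_width_bins`, and the resulting bin indexes used as the `categories` argument of either function.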
Details
A preliminary analysis based on smarts capabilities reduces the number of potential drivers in some cases. Its goal is to remove irrelevant or redundant fields. The list of drivers used is available in the UI, and you can add any initially excluded drivers to the analysis. The top 20 resulting drivers with predictive strength higher than 10% are available for display.
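The display rule above (at most 20 drivers, each with predictive strength above 10%) can be sketched as a filter followed by a sort. The function name, the dictionary input, and the representation of strength as a fraction between 0 and 1 are assumptions made for illustration; how the product computes predictive strength is not described here.

```python
def top_drivers(strengths, min_strength=0.10, max_drivers=20):
    """Return (field, strength) pairs above the threshold, strongest first.

    strengths: hypothetical mapping of field name to predictive strength,
    expressed as a fraction (0.10 means 10%).
    """
    # "Higher than 10%" is read as a strict inequality.
    kept = [(field, s) for field, s in strengths.items() if s > min_strength]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_drivers]
```

For example, with strengths of 0.42, 0.35, 0.10, and 0.02, only the first two fields are kept, because 0.10 is not higher than the threshold.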
Some restrictions are enforced on the size of the data to improve performance and speed. If the data contains more than 250 fields, the least relevant fields are excluded before driver analysis. You can add the excluded fields back into the analysis through the UI as described above. If the specified data contains more than 10,000 rows, it might be sampled down to approximately 10,000 rows for the purpose of driver analysis. In such instances, the following warning is displayed: "To improve performance, due to the number of rows in the data source, the analysis is based on a representative sample of the entire data. The results are expected to closely approximate results that would be obtained by using all the rows in the original data."
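The row limit can be pictured with the following sketch, which leaves small data untouched and otherwise draws a sample. Simple random sampling without replacement is an assumption; the document says only that the sample is representative and approximately 10,000 rows, whereas this sketch returns exactly `max_rows` rows. The function name and the returned flag are hypothetical.

```python
import random

MAX_ROWS = 10_000  # row limit stated in the text above


def sample_rows(rows, max_rows=MAX_ROWS, seed=0):
    """Return (rows, was_sampled).

    Data at or below the limit is returned unchanged. Larger data is
    reduced by simple random sampling without replacement, which is an
    assumed stand-in for the product's sampling method.
    """
    if len(rows) <= max_rows:
        return rows, False
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return rng.sample(rows, max_rows), True
```

The `was_sampled` flag corresponds to the situation in which the warning would be shown.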