Construction and feature selection
To improve the predictive power of your data, you can transform the input fields, or construct new ones based on the existing fields.
Transform, construct and select input fields to improve predictive power. Toggles all settings on the panel either on or off.
Merge sparse categories to maximize association with target. Select this to make a more parsimonious model by reducing the number of variables to be processed in association with the target. If required, change the probability value from the default of 0.05.
Note that if all categories are merged into one, the original and derived versions of the field are excluded because they have no value as a predictor.
When there is no target, merge sparse categories based on counts. If your data has no target, you can merge sparse categories of ordinal (ordered set) features, nominal (set) features, or both. Specify the minimum percentage of cases, or records, that a category must contain; categories that fall below this percentage are merged. The default is 10.
Categories are merged using the following rules:
- Merging is not performed on binary fields.
- If only two categories remain during merging, merging stops.
- If no category, whether original or created during merging, has fewer than the specified minimum percentage of cases, merging stops.
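The three rules above can be illustrated with a minimal Python sketch. It is not the product's implementation: the function name merge_sparse_categories is hypothetical, the field is treated as nominal, and the choice to merge the two smallest categories at each step is an assumption, because the rules above say only when merging stops and not which categories are combined.

```python
from collections import Counter


def merge_sparse_categories(values, min_percent=10.0):
    """Map each original category to its (possibly merged) category label.

    Hypothetical helper for illustration only; treats the field as nominal.
    """
    counts = Counter(values)
    total = sum(counts.values())

    # Rule 1: merging is not performed on binary fields.
    if len(counts) <= 2:
        return {cat: cat for cat in counts}

    # Each group holds the original categories it contains and its case count.
    groups = [([cat], n) for cat, n in counts.items()]

    while True:
        # Rule 2: stop once only two categories remain.
        if len(groups) <= 2:
            break
        # Rule 3: stop when no category is below the minimum percentage.
        if all(100.0 * n / total >= min_percent for _, n in groups):
            break
        # ASSUMPTION: combine the two smallest groups at each step.
        groups.sort(key=lambda g: g[1])
        (cats_a, n_a), (cats_b, n_b) = groups[0], groups[1]
        groups = [(cats_a + cats_b, n_a + n_b)] + groups[2:]

    mapping = {}
    for cats, _ in groups:
        if len(cats) == 1:
            label = cats[0]
        else:
            label = "+".join(str(c) for c in sorted(cats, key=str))
        for cat in cats:
            mapping[cat] = label
    return mapping


field = ["a"] * 50 + ["b"] * 40 + ["c"] * 6 + ["d"] * 4
merged = merge_sparse_categories(field, min_percent=10.0)
print(merged)
```

In this example, categories c (6%) and d (4%) fall below the 10% minimum and are combined into a single category holding 10% of the cases, at which point the third rule stops the merging. An ordinal field would presumably need to merge only adjacent categories, which this sketch does not attempt.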
Bin continuous fields while preserving predictive power. Where you have data that includes a categorical target, you can bin continuous inputs with strong associations to improve processing performance. If required, change the probability value for the homogeneous subsets from the default of 0.05.
If the binning operation results in a single bin for a particular field, the original and binned versions of the field are excluded because they have no value as a predictor.
Perform feature selection. Select this option to remove features with a low correlation coefficient. If required, change the probability value from the default of 0.05.
This option only applies to continuous input features where the target is continuous, and to categorical input features.
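For the case of a continuous input and a continuous target, the idea of filtering on a probability value can be sketched as follows. This is a rough illustration, not the product's algorithm: the function names are hypothetical, Pearson correlation is assumed as the measure, and the p-value is approximated with the Fisher z-transform rather than computed by whatever test the product actually uses.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant field carries no linear association
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r = 0 using the Fisher z approximation.

    ASSUMPTION: a normal approximation, valid only for n > 3.
    """
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def select_features(features, target, alpha=0.05):
    """Keep features whose correlation with the target is significant."""
    n = len(target)
    kept = []
    for name, values in features.items():
        p = approx_p_value(pearson_r(values, target), n)
        if p <= alpha:
            kept.append(name)
    return kept


target = list(range(20))
features = {
    "strong": [2 * t + 1 for t in target],                # linear in the target
    "weak": [1 if t % 2 == 0 else -1 for t in target],    # alternating sign
}
selected = select_features(features, target, alpha=0.05)
print(selected)
```

Here the alternating field has a correlation close to zero with the target, so its approximate p-value is well above 0.05 and it is dropped, while the linearly related field is kept. Lowering alpha makes the filter stricter.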
Perform feature construction. Select this option to derive new features from a combination of several existing features (which are then discarded from modeling).
This option only applies to continuous input features where the target is continuous, or where there is no target.