Automatic Data Preparation (linear models)

This view shows which fields were excluded and how transformed fields were derived in the automatic data preparation (ADP) step. For each field that was transformed or excluded, the table lists the field name, its role in the analysis, and the action taken by the ADP step. Fields are sorted alphabetically by field name, in ascending order. The possible actions for each field are:

  • Derive duration: months. Computes the elapsed time in months from the values in a field containing dates to the current system date.
  • Derive duration: hours. Computes the elapsed time in hours from the values in a field containing times to the current system time.
  • Change measurement level from continuous to ordinal. Recasts continuous fields with fewer than 5 unique values as ordinal fields.
  • Change measurement level from ordinal to continuous. Recasts ordinal fields with more than 10 unique values as continuous fields.
  • Trim outliers. Sets values of continuous predictors that lie beyond a cutoff value (3 standard deviations from the mean) to the cutoff value.
  • Replace missing values. Replaces missing values of nominal fields with the mode, of ordinal fields with the median, and of continuous fields with the mean.
  • Merge categories to maximize association with target. Identifies "similar" predictor categories based on the relationship between the input and the target. Categories that are not significantly different (that is, having a p-value greater than 0.05) are merged.
  • Exclude constant predictor / after outlier handling / after merging of categories. Removes predictors that have a single value, possibly after other ADP actions have been taken.
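
To make the rules above concrete, here is a minimal Python sketch of several of the listed actions. It is an illustration under stated assumptions, not the product's implementation. All function names are hypothetical. The sketch also assumes that durations are counted in whole calendar months, that the sample standard deviation is used for the outlier cutoff, and that the ordinal median is taken as the lower middle value so the fill is always an observed category. The category-merging action is not covered.

```python
from datetime import date
from statistics import mean, median_low, mode, stdev


def duration_months(start, today=None):
    """Whole calendar months from `start` to `today` (assumed rounding rule)."""
    today = today or date.today()  # default: current system date
    months = (today.year - start.year) * 12 + (today.month - start.month)
    if today.day < start.day:  # the last month is not yet complete
        months -= 1
    return months


def trim_outliers(values, n_sd=3.0):
    """Set values beyond mean +/- n_sd standard deviations to the cutoff."""
    m = mean(values)
    sd = stdev(values)  # assumption: sample standard deviation
    lo, hi = m - n_sd * sd, m + n_sd * sd
    return [min(max(v, lo), hi) for v in values]


def replace_missing(values, level):
    """Fill None with the mode, median, or mean, depending on measurement level."""
    present = [v for v in values if v is not None]
    if level == "nominal":
        fill = mode(present)
    elif level == "ordinal":
        fill = median_low(present)  # assumption: lower middle value
    elif level == "continuous":
        fill = mean(present)
    else:
        raise ValueError(f"unknown measurement level: {level}")
    return [fill if v is None else v for v in values]


def recast_level(values, level, min_continuous=5, max_ordinal=10):
    """Apply the two measurement-level rules using the thresholds listed above."""
    n_unique = len(set(values))
    if level == "continuous" and n_unique < min_continuous:
        return "ordinal"
    if level == "ordinal" and n_unique > max_ordinal:
        return "continuous"
    return level


def is_constant(values):
    """True when a predictor has a single value and would be excluded."""
    return len(set(values)) <= 1
```

Because the exclusion of constant predictors can follow other actions, a check such as is_constant would be run again on the output of trim_outliers, since trimming can collapse a field to a single value.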