TRANSFORM Subcommand (ADP command)

The TRANSFORM subcommand is used to merge similar categories of categorical inputs, bin values of continuous inputs, and construct and select new input fields from continuous inputs using principal components analysis.

MERGESUPERVISED Keyword

The MERGESUPERVISED keyword specifies how to merge similar categories of a nominal or ordinal input in the presence of a target.

  • If there are no categorical inputs, MERGESUPERVISED is ignored.
  • If there is no target specified on the FIELDS subcommand, MERGESUPERVISED is ignored.

YES(PVALUE=value). Supervised merge. Similar categories are identified based upon the relationship between the input and the target. Categories that are not significantly different (that is, those with a p-value greater than the value of PVALUE) are merged. Specify a value greater than 0 and less than or equal to 1. The default is 0.05. YES is the default.

NO. Do not merge categories.
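
As an illustration only, a supervised merge with a non-default threshold might be requested as sketched below. The field names (response, region, jobcat) and the output path are hypothetical, and the sketch assumes that the ADP command also needs an OUTFILE subcommand with PREPXML; consult the full ADP syntax chart before relying on it.

```spss
* Hypothetical fields: merge categories whose p-value exceeds 0.01.
ADP
  /FIELDS TARGET=response INPUT=region jobcat
  /TRANSFORM MERGESUPERVISED=YES(PVALUE=0.01)
  /OUTFILE PREPXML='/tmp/adp_prep.xml'.
```

Because categories are merged when their p-value is greater than PVALUE, a value below the default of 0.05 allows more categories to be merged.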

MERGEUNSUPERVISED Keyword

The MERGEUNSUPERVISED keyword specifies how to merge similar categories of a nominal or ordinal input when there is no target.

  • If there are no categorical inputs, MERGEUNSUPERVISED is ignored.
  • If there is a target specified on the FIELDS subcommand, MERGEUNSUPERVISED is ignored.

YES(ORDINAL|NOMINAL|MINPCT=value). Unsupervised merge. The equal frequency method is used to merge categories with less than MINPCT percent of the total number of records. Specify a value greater than or equal to 0 and less than or equal to 100. The default is 10 if MINPCT is not specified. If YES is specified without ORDINAL or NOMINAL, then no merging is performed.

NO. Do not merge categories. NO is the default.
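
A possible unsupervised merge request is sketched below. It assumes that the options inside the parentheses can be combined and are separated by spaces, which the description above does not state explicitly. The field names and output path are hypothetical, and no TARGET is listed, so that MERGEUNSUPERVISED is not ignored.

```spss
* Hypothetical fields: no TARGET, so the unsupervised merge applies.
* Assumes ORDINAL, NOMINAL, and MINPCT may be listed together.
ADP
  /FIELDS INPUT=region jobcat
  /TRANSFORM MERGEUNSUPERVISED=YES(ORDINAL NOMINAL MINPCT=5)
  /OUTFILE PREPXML='/tmp/adp_prep.xml'.
```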

BINNING Keyword

The BINNING keyword specifies how to discretize continuous inputs in the presence of a categorical target.

SUPERVISED(PVALUE=value). Supervised binning. Bins are created based upon the properties of "homogeneous subsets", which are identified by the Scheffe method using PVALUE as the alpha for the critical value for determining homogeneous subsets. Specify a value greater than 0 and less than or equal to 1. The default is 0.05. SUPERVISED is the default.

If there is no target specified on the FIELDS subcommand, or the target is not categorical, or there are no continuous inputs, then SUPERVISED is ignored.

NONE. Do not bin values of continuous inputs.
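
The sketch below shows how supervised binning with a non-default alpha might be written. The fields are hypothetical: churn stands in for a categorical target, and age and income for continuous inputs. The OUTFILE path is likewise only a placeholder.

```spss
* Hypothetical fields: churn is assumed categorical, age and income continuous.
ADP
  /FIELDS TARGET=churn INPUT=age income
  /TRANSFORM BINNING=SUPERVISED(PVALUE=0.1)
  /OUTFILE PREPXML='/tmp/adp_prep.xml'.
```

Replacing SUPERVISED(PVALUE=0.1) with NONE in the same sketch would leave the continuous inputs unbinned.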

SELECTION Keyword

The SELECTION keyword specifies how to perform feature selection for continuous inputs in the presence of a continuous target.

YES(PVALUE=value). Perform feature selection. A continuous input is removed from the analysis if the p-value for its correlation with the target is greater than PVALUE. YES is the default.

If there is no target specified on the FIELDS subcommand, or the target is not continuous, or there are no continuous inputs, then YES is ignored.

NO. Do not perform feature selection.
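
For instance, feature selection with an explicit threshold could look like the following sketch. The field names (sales as a continuous target, price and horsepower as continuous inputs) and the output path are hypothetical, and the PVALUE shown is only an illustrative value.

```spss
* Hypothetical fields: drop inputs whose correlation p-value exceeds 0.05.
ADP
  /FIELDS TARGET=sales INPUT=price horsepower
  /TRANSFORM SELECTION=YES(PVALUE=0.05)
  /OUTFILE PREPXML='/tmp/adp_prep.xml'.
```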

CONSTRUCTION Keyword

The CONSTRUCTION keyword specifies how to perform feature construction for continuous inputs in the presence of a continuous target.

YES(ROOT=rootname). Perform feature construction. New predictors are constructed from groups of "similar" predictors using principal component analysis. Optionally, specify the rootname for constructed predictors using ROOT in parentheses; the rootname is given without quotes. The default is feature.

If there is no target specified on the FIELDS subcommand, or the target is not continuous, or there are no continuous inputs, then YES is ignored.

NO. Do not perform feature construction. NO is the default.
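
Since NO is the default, feature construction has to be requested explicitly. A minimal sketch, with hypothetical field names, a hypothetical rootname (pc), and a placeholder output path:

```spss
* Hypothetical fields: constructed predictors take the unquoted rootname pc.
ADP
  /FIELDS TARGET=sales INPUT=price horsepower engine_size
  /TRANSFORM CONSTRUCTION=YES(ROOT=pc)
  /OUTFILE PREPXML='/tmp/adp_prep.xml'.
```

How the rootname is combined with a suffix to name each constructed predictor is not described here, so the sketch makes no assumption about the resulting field names.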