TRANSFORM Subcommand (ADP command)
The TRANSFORM subcommand is used to merge similar
categories of categorical inputs, bin values of continuous inputs,
and construct and select new input fields from continuous inputs using
principal components analysis.
MERGESUPERVISED Keyword
The MERGESUPERVISED keyword specifies how to
merge similar categories of a nominal or ordinal input in the presence
of a target.
- If there are no categorical inputs, MERGESUPERVISED is ignored.
- If there is no target specified on the FIELDS subcommand,
  MERGESUPERVISED is ignored.
YES(PVALUE=value). Supervised merge. Similar categories
are identified based upon the relationship between the input and the
target. Categories that are not significantly different, that is,
those having a p-value greater than the value of PVALUE, are merged.
Specify a value greater than 0 and less than or equal
to 1. The default value of PVALUE is 0.05. YES is the default.
NO. Do not merge categories.
MERGEUNSUPERVISED Keyword
The MERGEUNSUPERVISED keyword specifies how to
merge similar categories of a nominal or ordinal input when there
is no target.
- If there are no categorical inputs, MERGEUNSUPERVISED is ignored.
- If there is a target specified on the FIELDS subcommand,
  MERGEUNSUPERVISED is ignored.
YES(ORDINAL|NOMINAL|MINPCT=value). Unsupervised merge. The
equal frequency method is used to merge categories that each contain
less than MINPCT percent of the total number of records. Specify a
value greater than or equal to 0 and less than or equal to 100. The
default is 10 if MINPCT is not specified. If YES is specified without
ORDINAL or NOMINAL, then no merging is performed.
NO. Do not merge categories. NO is the default.
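The effect of the MINPCT threshold can be illustrated with a minimal Python sketch. This is a simplified illustration, not the equal frequency method that ADP applies: it only pools every category whose share of records falls below the threshold into one combined category. The function name merge_sparse_categories and the label "MERGED" are hypothetical and do not come from the ADP command.

```python
from collections import Counter


def merge_sparse_categories(values, min_pct=10.0, merged_label="MERGED"):
    """Pool every category holding less than min_pct percent of the records.

    Simplified illustration only; it does not reproduce the equal
    frequency method used by ADP. Names here are hypothetical.
    """
    if not 0 <= min_pct <= 100:
        raise ValueError("min_pct must be between 0 and 100")
    counts = Counter(values)
    total = len(values)
    # A category is "sparse" when its percentage of records is below min_pct.
    sparse = {cat for cat, n in counts.items() if 100.0 * n / total < min_pct}
    # Relabel records in sparse categories; leave all others unchanged.
    return [merged_label if v in sparse else v for v in values]


records = ["a"] * 50 + ["b"] * 42 + ["c"] * 5 + ["d"] * 3
merged = merge_sparse_categories(records, min_pct=10.0)
merged_counts = Counter(merged)
```

In this sample, categories c (5%) and d (3%) fall below the 10 percent threshold and are pooled, while a and b are left as they are.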
BINNING Keyword
The BINNING keyword specifies how to discretize
continuous inputs in the presence of a categorical target.
SUPERVISED(PVALUE=value). Supervised binning. Bins
are created based upon the properties of "homogeneous subsets", which
are identified by the Scheffe method using PVALUE as
the alpha for the critical value for determining homogeneous subsets.
Specify a value greater than 0 and less than or equal
to 1. The default value of PVALUE is 0.05. SUPERVISED is the default.
If there is no target specified on the FIELDS subcommand,
or the target is not categorical, or there are no continuous inputs,
then SUPERVISED is ignored.
NONE. Do not bin values of continuous inputs.
SELECTION Keyword
The SELECTION keyword specifies how to perform
feature selection for continuous inputs in the presence of a continuous
target.
YES(PVALUE=value). Perform feature selection. A
continuous input is removed from the analysis if the p-value
for its correlation with the target is greater than PVALUE.
YES is the default.
If there is no target specified on the FIELDS subcommand,
or the target is not continuous, or there are no continuous inputs,
then YES is ignored.
NO. Do not perform feature selection.
CONSTRUCTION Keyword
The CONSTRUCTION keyword specifies how to perform
feature construction for continuous inputs in the presence of a continuous
target.
YES(ROOT=rootname). Perform feature construction. New
predictors are constructed from groups of "similar" predictors using
principal component analysis. Optionally specify the rootname for
constructed predictors using ROOT in parentheses.
Specify a rootname (no quotes). The default rootname is feature.
If there is no target specified on the FIELDS subcommand,
or the target is not continuous, or there are no continuous inputs,
then YES is ignored.
NO. Do not perform feature construction. NO is the default.