DEPCATEGORIES Subcommand (TREE command)
For categorical dependent variables, the DEPCATEGORIES subcommand controls which categories of the dependent variable are included in the model, specifies target categories, or both.
- By default, all valid categories are used in the analysis and user-missing values are excluded.
- There is no default target category.
- DEPCATEGORIES applies only to categorical dependent variables. If the dependent variable is scale, the subcommand is ignored and a warning is issued.
Example
TREE risk [n] BY income [o] age [s] creditscore [s] employment [n]
/DEPCATEGORIES USEVALUES=[VALID 99] TARGET=[1].
USEVALUES Keyword
USEVALUES controls the categories of the dependent variable included in the model.
- The keyword is followed by an equals sign (=) and a list of values enclosed in square brackets, as in: USEVALUES=[1 2 3].
- At least two values must be specified.
- Any cases with dependent variable values not included in the list are excluded from the analysis.
- Values can be string or numeric but must be consistent with the data type of the dependent variable. String and date values must be quoted. Date values must be consistent with the variable’s print format.
- If the dependent variable is nominal, the list can include user-missing values. If the dependent variable is ordinal, any user-missing values in the list are excluded and a warning is issued.
- The keywords VALID and MISSING can be used to specify all valid values and all user-missing values, respectively. For example, USEVALUES=[VALID MISSING] will include all cases with valid or user-missing values for the dependent variable.
- A warning is issued if a specified category does not exist in the data or, if split-sample validation is in effect, in the training sample. See the topic VALIDATION Subcommand (TREE command) for more information.
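The rules above can be illustrated with a few variations on the earlier example. This is a sketch only: the category codes, the string variable rating, and its values are hypothetical and do not come from any particular data file.

```spss
* Keep only cases whose dependent variable value is 1, 2, or 3.
TREE risk [n] BY income [o] age [s]
  /DEPCATEGORIES USEVALUES=[1 2 3].

* Keep all valid categories and all user-missing categories.
TREE risk [n] BY income [o] age [s]
  /DEPCATEGORIES USEVALUES=[VALID MISSING].

* Hypothetical string dependent variable, so each value is quoted.
TREE rating [n] BY income [o] age [s]
  /DEPCATEGORIES USEVALUES=['high' 'medium' 'low'].
```

In the second command the dependent variable is declared nominal ([n]), which is what allows user-missing values to be included. If it were declared ordinal, the user-missing values would be excluded with a warning.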
TARGET Keyword
The TARGET keyword specifies categories of the dependent variable that are of primary interest in the analysis. For example, if you are trying to predict mortality, "death" would be defined as the target category. If you are trying to identify individuals with high or medium credit risk, you would define both "high credit risk" and "medium credit risk" as target categories.
- The keyword is followed by an equals sign (=) and a list of values enclosed in square brackets, as in: TARGET=[1].
- There is no default target category. If no target category is specified, some classification rule options and gains-related output are not available.
- Values can be string or numeric but must be consistent with the data type of the dependent variable. String and date values must be quoted. Date values must be consistent with the variable’s print format.
- If USEVALUES is also specified, the target categories must also be included (either implicitly or explicitly) in the USEVALUES list.
- If a target category is user-missing and the dependent variable is not nominal, the target category is ignored and a warning is issued.
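The interaction between TARGET and USEVALUES can be sketched as follows. The coding used here (1 = high credit risk, 2 = medium credit risk, 3 = low credit risk) is an assumption made for illustration only.

```spss
* Two target categories, both listed explicitly in USEVALUES.
TREE risk [n] BY income [o] age [s] creditscore [s] employment [n]
  /DEPCATEGORIES USEVALUES=[1 2 3] TARGET=[1 2].

* Target category included implicitly through the VALID keyword.
* This assumes 1 is a valid (not user-missing) value of risk.
TREE risk [n] BY income [o] age [s] creditscore [s] employment [n]
  /DEPCATEGORIES USEVALUES=[VALID] TARGET=[1].
```

In both commands every target value also appears, explicitly or through VALID, in the USEVALUES list, as the rule above requires.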