RULES Subcommand (TREE command)
The RULES subcommand generates syntax that can be used to select or classify cases based on values of independent (predictor) variables.
- You can generate rules for all nodes, all terminal nodes, the top n terminal nodes, terminal nodes that correspond to the top n percent of cases, or nodes with index values that meet or exceed a cutoff value.
- Rules are available in three different forms: internal command syntax format, SQL, and generic (plain English pseudocode).
- You can specify an external destination file for the rules.
- Each keyword is followed by an equals sign (=) and the value for that keyword.
Example
TREE risk [o] BY income age creditscore
  /RULES NODES=TERMINAL SYNTAX=INTERNAL TYPE=SCORING
  OUTFILE='/jobfiles/treescores.sps'.
NODES Keyword
The NODES keyword specifies the scope of generated rules. Specify one of the following alternatives:
TERMINAL. Generates rules for each terminal node. This is the default.
ALL. Generates rules for all nodes. Rules are shown for all parent and terminal nodes.
For categorical dependent variables with defined target categories, the following additional alternatives are available:
TOPN(value). Generates rules for the top n terminal nodes based on index values. The number must be a positive integer, enclosed in parentheses. If the number exceeds the number of nodes in the tree, a warning is issued and rules are generated for all terminal nodes.
TOPPCT(value). Generates rules for terminal nodes for the top n percent of cases based on index values. The percent value must be greater than zero and less than 100, enclosed in parentheses.
MININDEX(value). Generates rules for all terminal nodes with an index value greater than or equal to the specified value. The value must be a positive number, enclosed in parentheses.
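As an illustration, the index-based alternatives might be used as shown here. This is a sketch only: it assumes that category 1 of risk has been declared a target category on the DEPCATEGORIES subcommand, and both the category value and the node count are hypothetical.

```
* Hypothetical sketch: target category 1 and the value 3 are assumptions.
TREE risk [o] BY income age creditscore
  /DEPCATEGORIES TARGET=[1]
  /RULES NODES=TOPN(3) SYNTAX=GENERIC.
```

Replacing NODES=TOPN(3) with, for example, NODES=TOPPCT(20) or NODES=MININDEX(150) would instead limit the rules by percentage of cases or by a minimum index value; those cutoffs are likewise illustrative.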
SYNTAX Keyword
The SYNTAX keyword specifies the syntax of generated rules. It determines the form of the rules both in output displayed in the Viewer and in rules saved to an external file.
INTERNAL. Command syntax language. Rules are expressed as a set of commands that define a filter condition that can be used to select subsets of cases (this is the default) or as COMPUTE statements that can be used to score cases (with TYPE=SCORING).
SQL. SQL. Standard SQL rules are generated to select/extract records from a database or assign values to those records. The generated SQL rules do not include any table names or other data source information.
GENERIC. Plain English pseudocode. Rules are expressed as a set of logical "if...then" statements that describe the model’s classifications or predictions for each node.
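For instance, SQL scoring rules could be requested and written to a file as sketched here. The output path is hypothetical and not taken from this document.

```
* Hypothetical sketch: the output path is an assumption.
TREE risk [o] BY income age creditscore
  /RULES NODES=TERMINAL SYNTAX=SQL TYPE=SCORING
  OUTFILE='/jobfiles/treescores.sql'.
```

Because the generated SQL contains no table names or other data source information, that information has to be supplied before the rules can be run against a database.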
TYPE Keyword
The TYPE keyword specifies the type of SQL or internal command syntax rules to generate. It is ignored if generic rules are requested.
SCORING. Scoring of cases. The rules can be used to assign the model’s predictions to cases that meet node membership criteria. A separate rule is generated for each node within the scope specified on the NODES keyword. This is the default.
SELECTION. Selection of cases. The rules can be used to select cases that meet node membership criteria. For internal command syntax and SQL rules, a single rule is generated to select all cases within the scope specified on the NODES keyword.
Note: For internal command syntax or SQL rules with NODES=TERMINAL or NODES=ALL, TYPE=SELECTION will produce a rule that effectively selects every case included in the analysis.
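Given the note above, selection rules are usually more useful with a narrower node scope. The following sketch assumes, hypothetically, that category 1 of risk is declared as the target category on the DEPCATEGORIES subcommand, so that TOPN is available.

```
* Hypothetical sketch: target category 1 and the value 2 are assumptions.
TREE risk [o] BY income age creditscore
  /DEPCATEGORIES TARGET=[1]
  /RULES NODES=TOPN(2) SYNTAX=INTERNAL TYPE=SELECTION.
```

Under these assumptions, a single rule is generated that selects the cases falling in the two terminal nodes with the highest index values, rather than every case in the analysis.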
SURROGATES Keyword
For CRT and QUEST, the SURROGATES keyword controls whether rules use surrogate predictors to classify cases that have missing predictor values. The keyword is ignored if the method is not CRT or QUEST or if generic rules are requested.
Rules that include surrogates can be quite complex. In general, if you just want to derive conceptual information about your tree, exclude surrogates. If some cases have incomplete predictor data and you want rules that mimic your tree, include surrogates.
INCLUDE. Include surrogates. Surrogate predictors are used in generated rules. This is the default.
EXCLUDE. Exclude surrogates. Rules exclude surrogate predictors.
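A minimal sketch of requesting simpler rules without surrogates follows. It assumes the tree is grown with CRT, specified here on the METHOD subcommand, because SURROGATES is ignored for other growing methods.

```
* Sketch: assumes CRT is the growing method so that SURROGATES applies.
TREE risk [o] BY income age creditscore
  /METHOD TYPE=CRT
  /RULES NODES=TERMINAL SYNTAX=INTERNAL TYPE=SCORING SURROGATES=EXCLUDE.
```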
LABELS Keyword
The LABELS keyword specifies whether value and variable labels are used in generic decision rules.
- By default, any defined value and variable labels are used. When labels aren’t available, values and variable names are used.
- LABELS is ignored for SQL and internal command syntax rules.
YES. Any defined value and variable labels are used in generic rules. This is the default.
NO. Values and variable names are used instead of labels.
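As a brief illustration, generic rules that show raw values and variable names instead of labels might be requested like this:

```
* Sketch: LABELS has an effect only because SYNTAX=GENERIC is requested.
TREE risk [o] BY income age creditscore
  /RULES NODES=TERMINAL SYNTAX=GENERIC LABELS=NO.
```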
OUTFILE Keyword
OUTFILE writes the rules to an external text file.
- The keyword is followed by an equals sign (=) and a file specification enclosed in quotes.
- If the file specification includes a path, an error will result if the specified directory/folder location doesn’t exist.
- For command syntax, the file can be used as a command syntax file in both interactive and batch modes.
- For SQL syntax, the generated SQL does not include any table names or other data source information.
- OUTFILE is ignored if NODES=NONE, and a warning is issued.
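Because rules saved in internal command syntax form a command syntax file, they can be run against other data. The following sketch reuses the file path from the Example at the top of this section and assumes that the dataset to be scored is already active and contains the predictors income, age, and creditscore.

```
* Sketch: assumes the active dataset contains income, age, and creditscore.
INSERT FILE='/jobfiles/treescores.sps'.
```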