RULES Subcommand (TREE command)

The RULES subcommand generates syntax that can be used to select or classify cases based on values of independent (predictor) variables.

  • You can generate rules for all nodes, all terminal nodes, the top n terminal nodes, terminal nodes that correspond to the top n percent of cases, or nodes with index values that meet or exceed a cutoff value.
  • Rules are available in three different forms: internal command syntax format, SQL, and generic (plain English pseudocode).
  • You can specify an external destination file for the rules.
  • Each keyword is followed by an equals sign (=) and the value for that keyword.

Example

TREE risk [o] BY income age creditscore
 /RULES NODES=TERMINAL SYNTAX=INTERNAL TYPE=SCORING
  OUTFILE='/jobfiles/treescores.sps'.

NODES Keyword

The NODES keyword specifies the scope of generated rules. Specify one of the following alternatives:

TERMINAL. Generates rules for each terminal node. This is the default.

ALL. Generates rules for all nodes. Rules are shown for all parent and terminal nodes.

For categorical dependent variables with defined target categories, the following additional alternatives are available:

TOPN(value). Generates rules for the top n terminal nodes based on index values. The number must be a positive integer, enclosed in parentheses. If the number exceeds the number of nodes in the tree, a warning is issued and rules are generated for all terminal nodes.

TOPPCT(value). Generates rules for terminal nodes for the top n percent of cases based on index values. The percent value must be greater than 0 and less than 100, enclosed in parentheses.

MININDEX(value). Generates rules for all terminal nodes with an index value greater than or equal to the specified value. The value must be a positive number, enclosed in parentheses.
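
Example

The following sketch restricts rules to the three terminal nodes with the highest index values. It reuses the variables from the example above. The target category value 1 and the exact form of the DEPCATEGORIES specification are assumptions made for illustration; substitute the target categories defined for your own dependent variable.

* Assumption: category 1 of risk is the target category.
TREE risk [o] BY income age creditscore
 /DEPCATEGORIES USEVALUES=[VALID] TARGET=[1]
 /RULES NODES=TOPN(3) SYNTAX=INTERNAL TYPE=SCORING.

Replacing TOPN(3) with TOPPCT(20) or MININDEX(120) would change only how the terminal nodes are chosen; the values 3, 20, and 120 are arbitrary.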

SYNTAX Keyword

The SYNTAX keyword specifies the syntax of generated rules. It determines the form of the selection rules in both output displayed in the Viewer and selection rules saved to an external file.

INTERNAL. Command syntax language. Rules are expressed as a set of commands that define a filter condition that can be used to select subsets of cases (this is the default) or as COMPUTE statements that can be used to score cases (with TYPE=SCORING).

SQL. SQL statements. Standard SQL rules are generated to select or extract records from a database or to assign values to those records. The generated SQL rules do not include any table names or other data source information.

GENERIC. Plain English pseudocode. Rules are expressed as a set of logical "if...then" statements that describe the model’s classifications or predictions for each node.
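
Example

As a minimal sketch, the command below requests SQL scoring rules instead of command syntax. The output path and the .sql extension are hypothetical. Because the generated SQL contains no table names, it would need to be edited to reference your own data source before use.

* Hypothetical output path.
TREE risk [o] BY income age creditscore
 /RULES NODES=TERMINAL SYNTAX=SQL TYPE=SCORING
  OUTFILE='/jobfiles/treescores.sql'.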

TYPE Keyword

The TYPE keyword specifies the type of SQL or internal command syntax rules to generate. It is ignored if generic rules are requested.

SCORING. Scoring of cases. The rules can be used to assign the model’s predictions to cases that meet node membership criteria. A separate rule is generated for each node within the scope specified on the NODES keyword. This is the default.

SELECTION. Selection of cases. The rules can be used to select cases that meet node membership criteria. For internal command syntax and SQL rules, a single rule is generated to select all cases within the scope specified on the NODES keyword.

Note: For internal command syntax or SQL rules with NODES=TERMINAL or NODES=ALL, TYPE=SELECTION produces a rule that effectively selects every case included in the analysis, because every case falls into a terminal node.
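
Example

A selection rule is most useful when the node scope is narrower than the whole tree. The sketch below asks for a single rule that selects cases in the terminal nodes covering the top 20 percent of cases by index value. The target category value 1, the DEPCATEGORIES specification, and the 20 percent cutoff are assumptions chosen for illustration.

* Assumption: category 1 of risk is the target category.
TREE risk [o] BY income age creditscore
 /DEPCATEGORIES USEVALUES=[VALID] TARGET=[1]
 /RULES NODES=TOPPCT(20) SYNTAX=INTERNAL TYPE=SELECTION.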

SURROGATES Keyword

For CRT and QUEST, the SURROGATES keyword controls whether rules use surrogate predictors to classify cases that have missing predictor values. The keyword is ignored if the method is not CRT or QUEST or if generic rules are requested.

Rules that include surrogates can be quite complex. In general, if you just want to derive conceptual information about your tree, exclude surrogates. If some cases have incomplete predictor data and you want rules that mimic your tree, include surrogates.

INCLUDE. Include surrogates. Surrogate predictors are used in generated rules. This is the default.

EXCLUDE. Exclude surrogates. Rules exclude surrogate predictors.
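
Example

The sketch below grows a CRT tree and generates scoring rules without surrogate predictors, which should yield simpler rules at the cost of not mimicking the tree for cases with missing predictor values. It reuses the variables from the example at the start of this section.

* SURROGATES applies here because the method is CRT.
TREE risk [o] BY income age creditscore
 /METHOD TYPE=CRT
 /RULES NODES=TERMINAL SYNTAX=INTERNAL TYPE=SCORING SURROGATES=EXCLUDE.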

LABELS Keyword

The LABELS keyword specifies whether value and variable labels are used in generic decision rules.

  • By default, any defined value and variable labels are used. When labels aren’t available, values and variable names are used.
  • LABELS is ignored for SQL and internal command syntax rules.

YES. Any defined value and variable labels are used in generic rules. This is the default.

NO. Values and variable names are used instead of labels.
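
Example

As an illustration, the following command requests generic rules for all nodes and suppresses labels, so the rules refer to variable names and data values. TYPE is omitted because it is ignored for generic rules.

* Generic rules that use names and values instead of labels.
TREE risk [o] BY income age creditscore
 /RULES NODES=ALL SYNTAX=GENERIC LABELS=NO.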

OUTFILE Keyword

OUTFILE writes the rules to an external text file.

  • The keyword is followed by an equals sign (=) and a file specification enclosed in quotes.
  • If the file specification includes a path, an error results if the specified directory does not exist.
  • For command syntax, the file can be used as a command syntax file in both interactive and batch modes.
  • For SQL syntax, the generated SQL does not include any table names or other data source information.
  • OUTFILE is ignored if NODES=NONE, and a warning is issued.
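
Example

One way to apply saved command syntax rules is to run the file against another dataset with the INSERT command. This sketch assumes that the file was written by the example at the start of this section (SYNTAX=INTERNAL with TYPE=SCORING) and that the active dataset contains the predictors income, age, and creditscore under the same names.

* Assumes the active dataset contains the predictors used in the tree.
INSERT FILE='/jobfiles/treescores.sps'.
EXECUTE.

EXECUTE is included so that any pending transformations in the inserted file are carried out immediately.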