Random Trees model nugget output

After you create a Random Trees model, the following information is available in the Output viewer:

Model information table

The Model information table provides key information about the model. The table always includes the following high-level model settings:

The name of the target field that is selected in either the Type node or the Random Trees node Fields tab.
The model building method - Random Trees.
The number of predictors input into the model.

The additional details that are shown in the table depend on whether you build a classification or regression model, and on whether the model is built to handle imbalanced data:

Classification model (default settings)
- Model accuracy
- Misclassification rule
Classification model (Handle imbalanced data selected)
- Gmean
- True positive rate, shown separately for each target class
Regression model
- Root mean squared error
- Relative error
- Variance explained
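To make these statistics concrete, the following minimal sketch computes them from lists of actual and predicted values. It assumes common textbook definitions: accuracy is the share of correct predictions, the true positive rate for a class is the share of that class's records that are predicted correctly, Gmean is the geometric mean of the per-class true positive rates, relative error is the residual sum of squares divided by the total sum of squares around the mean, and variance explained is 1 minus relative error. These definitions are assumptions for illustration; the exact formulas that Modeler uses may differ. The function names are hypothetical.

```python
import math
from collections import Counter


def classification_metrics(actual, predicted):
    """Accuracy, per-class true positive rate, and Gmean (assumed definitions)."""
    n = len(actual)
    correct = sum(a == p for a, p in zip(actual, predicted))
    accuracy = correct / n
    # True positive rate per class: correctly predicted records of the class
    # divided by the number of records that actually belong to the class.
    totals = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    tpr = {c: hits[c] / totals[c] for c in totals}
    # Gmean: geometric mean of the per-class true positive rates.
    gmean = math.prod(tpr.values()) ** (1 / len(tpr))
    return {"accuracy": accuracy, "misclassification_rate": 1 - accuracy,
            "tpr": tpr, "gmean": gmean}


def regression_metrics(actual, predicted):
    """RMSE, relative error, and variance explained (assumed definitions)."""
    n = len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    mean = sum(actual) / n
    sst = sum((a - mean) ** 2 for a in actual)
    relative_error = sse / sst
    return {"rmse": math.sqrt(sse / n), "relative_error": relative_error,
            "variance_explained": 1 - relative_error}


c = classification_metrics(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"])
print(c["accuracy"])            # 0.75
print(c["tpr"])                 # {'yes': 0.5, 'no': 1.0}
print(round(c["gmean"], 4))     # 0.7071

r = regression_metrics([1, 2, 3, 4], [1.5, 2, 2.5, 4])
print(round(r["rmse"], 4))      # 0.3536
print(r["relative_error"])      # 0.1
```

Gmean is useful for imbalanced data because a model that ignores a rare class gets a true positive rate of 0 for that class, which pulls the geometric mean to 0 even when overall accuracy is high.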

Records Summary

The summary shows how many records were used to fit the model and how many were excluded. Both the number of records and their percentage of the total are shown. If the model was built with a frequency weight, the unweighted numbers of included and excluded records are also shown.
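As an illustration of how weighted and unweighted counts can differ, the following sketch builds a records summary from hypothetical (frequency weight, included) pairs. It assumes that, when a frequency weight is used, the main counts are sums of the weights and the unweighted counts are plain row counts; the reasons a record is excluded (for example, missing values) are outside the scope of this sketch. The function name and input shape are hypothetical.

```python
def records_summary(rows):
    """rows: list of (frequency_weight, included) pairs (hypothetical format)."""
    # Weighted counts: each record counts as many times as its frequency weight.
    weighted_in = sum(w for w, inc in rows if inc)
    weighted_out = sum(w for w, inc in rows if not inc)
    total = weighted_in + weighted_out
    return {
        "included": weighted_in,
        "excluded": weighted_out,
        "included_pct": 100 * weighted_in / total,
        "excluded_pct": 100 * weighted_out / total,
        # Unweighted counts: each row counts once, regardless of its weight.
        "unweighted_included": sum(1 for _, inc in rows if inc),
        "unweighted_excluded": sum(1 for _, inc in rows if not inc),
    }


summary = records_summary([(2, True), (3, True), (1, False), (4, True)])
print(summary["included"], summary["included_pct"])   # 9 90.0
print(summary["unweighted_included"])                 # 3
```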

Predictor Importance

The Predictor Importance graph shows the importance of the top 10 inputs (predictors) in the model as a bar chart.

If the model has more than 10 predictor fields, you can change which predictors are included in the chart by using the slider beneath the chart. The indicator marks on the slider have a fixed width, and each mark represents 10 fields. Move the indicator marks along the slider to display the next or previous 10 fields, ordered by predictor importance.
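The slider behaves like a fixed-width window over the predictors ranked by importance. The following sketch models that idea; the function name, the dictionary input, and the made-up importance values are hypothetical and only illustrate the paging, not how Modeler computes importance.

```python
def importance_window(importance, start, width=10):
    """Return `width` predictors starting at rank `start` (0-based),
    ordered from most to least important."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[start:start + width]


# Twelve hypothetical predictors; x12 is the most important.
importance = {f"x{i}": i / 100 for i in range(1, 13)}
first = importance_window(importance, 0)    # x12 down to x3
second = importance_window(importance, 10)  # the remaining x2 and x1
print([name for name, _ in second])         # ['x2', 'x1']
```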

You can double-click the chart to open a separate dialog box in which you can edit the graph size. When you close this separate editing dialog box, the changes are applied to the chart that is displayed in the Output tab.

Top Decision Rules table

By default, this interactive table displays the statistics of the top rules, which are sorted by interestingness.

You can double-click the table to open a separate dialog box in which you can edit the rule information that is shown in the table. The information that is displayed, and the options that are available in the dialog box, depend on whether the target is categorical or continuous.

The following rule information is shown in the table:

Rule details, which show how the rule is made up and applied
Whether the results are in the most frequent category
Rule accuracy
Trees accuracy
Interestingness index

The interestingness index is calculated by using the following formula:

In this formula:

P(A(t)) is the trees accuracy
P(B(t)) is the rule accuracy
P(B(t)|A(t)) represents correct predictions by both the trees and the node
The remaining piece of the formula represents incorrect predictions by both the trees and the node.

You can alter the rule table layout by using the following Table contents options:

Top decision rules: The top five decision rules, sorted by the interestingness index.
All rules: The table contains all of the rules that the model produces, but shows only 20 rules per page. When you select this layout, you can use the additional Find rule by ID and Page options to search for a rule.
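The two layouts can be pictured as a sort-and-truncate step and a fixed-size paging step. The following sketch assumes each rule is a dictionary with a hypothetical `id` and `interestingness` value, and that the All rules layout keeps the rules in the order in which they are listed; the function names are illustrative, not part of Modeler.

```python
RULES_PER_PAGE = 20  # page size of the All rules layout


def top_rules(rules, n=5):
    """Return the n rules with the highest interestingness index."""
    return sorted(rules, key=lambda r: r["interestingness"], reverse=True)[:n]


def rules_on_page(rules, page, per_page=RULES_PER_PAGE):
    """Return the rules shown on a 1-based page of the All rules layout."""
    start = (page - 1) * per_page
    return rules[start:start + per_page]


def page_for_rule_id(rules, rule_id, per_page=RULES_PER_PAGE):
    """Return the 1-based page that holds rule_id, or None if it is absent."""
    for index, rule in enumerate(rules):
        if rule["id"] == rule_id:
            return index // per_page + 1
    return None


# 45 hypothetical rules with made-up interestingness values.
rules = [{"id": i, "interestingness": i / 100} for i in range(1, 46)]
print([r["id"] for r in top_rules(rules)])  # [45, 44, 43, 42, 41]
print(page_for_rule_id(rules, 41))          # 3
print(len(rules_on_page(rules, 3)))         # 5
```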

In addition, for a categorical target, you can alter the rule table layout by using the Top rules by category option. This option shows the top five decision rules, sorted by the percentage of total records, for a Target category that you select.
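One way to read the Top rules by category layout is as a filter followed by the same top-five sort. The following sketch assumes that each rule records the target category it predicts and the percentage of total records it covers; both field names, the filtering step, and the data are hypothetical interpretations rather than Modeler's documented behavior.

```python
def top_rules_by_category(rules, category, n=5):
    """Return the n rules for `category` that cover the largest
    percentage of total records (hypothetical rule format)."""
    matching = [r for r in rules if r["category"] == category]
    return sorted(matching, key=lambda r: r["pct_records"], reverse=True)[:n]


rules = [
    {"id": 1, "category": "yes", "pct_records": 12.0},
    {"id": 2, "category": "no", "pct_records": 30.0},
    {"id": 3, "category": "yes", "pct_records": 4.5},
    {"id": 4, "category": "yes", "pct_records": 20.0},
    {"id": 5, "category": "no", "pct_records": 8.0},
    {"id": 6, "category": "yes", "pct_records": 9.0},
    {"id": 7, "category": "yes", "pct_records": 1.0},
    {"id": 8, "category": "yes", "pct_records": 15.0},
]
print([r["id"] for r in top_rules_by_category(rules, "yes")])  # [4, 8, 1, 6, 3]
```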

Note: For categorical targets, the table is only available when Handle imbalanced data is not selected in the Basics tab of the Build Options.

If you change the layout of the rules table, you can copy the modified rules table back to the Output viewer by clicking the Copy to Viewer button at the upper left of the dialog box.

Confusion Matrix

For classification models, the confusion matrix shows the number of predicted results versus the actual observed results, including the proportion of correct predictions.

Note: The confusion matrix is not available for regression models, nor when Handle imbalanced data is selected in the Basics tab of the Build Options.
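For readers who want to check the confusion matrix by hand, the following sketch counts each (actual, predicted) pair and reports the proportion of correct predictions. It places actual categories in rows and predicted categories in columns; that orientation, the function name, and the sample data are assumptions for illustration, and the layout in the Output viewer may differ.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return matrix[actual][predicted] counts and the proportion correct."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = {a: {p: counts[(a, p)] for p in labels} for a in labels}
    # Correct predictions lie on the diagonal, where actual == predicted.
    correct = sum(counts[(c, c)] for c in labels)
    return matrix, correct / len(actual)


matrix, proportion_correct = confusion_matrix(
    ["yes", "yes", "no", "no", "no"],
    ["yes", "no", "no", "no", "yes"],
)
print(matrix)              # {'no': {'no': 2, 'yes': 1}, 'yes': {'no': 1, 'yes': 1}}
print(proportion_correct)  # 0.6
```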