Analysis Output Browser
The analysis output browser displays the results of executing the Analysis node. The usual saving, exporting, and printing options are available from the File menu. See the topic Viewing output for more information.
When you first browse Analysis output, the results are expanded. To hide results after viewing them, use the expander control to the left of an item to collapse it, or click the Collapse All button to collapse all results. To show results again after collapsing them, use the expander control to expand the item, or click the Expand All button to show all results.
Results for output field. The Analysis output contains a section for each output field for which there is a corresponding prediction field created by a generated model.
Comparing. Within the output field section is a subsection for each prediction field associated with that output field. For categorical output fields, the top level of this section contains a table showing the number and percentage of correct and incorrect predictions and the total number of records in the stream. For numeric output fields, this section shows the following information:
- Minimum Error. Shows the minimum error (difference between observed and predicted values).
- Maximum Error. Shows the maximum error.
- Mean Error. Shows the average (mean) of errors across all records. This indicates whether there is a systematic bias (a stronger tendency to overestimate than to underestimate, or vice versa) in the model.
- Mean Absolute Error. Shows the average of the absolute values of the errors across all records. Indicates the average magnitude of error, independent of the direction.
- Standard Deviation. Shows the standard deviation of the errors.
- Linear Correlation. Shows the linear correlation between the predicted and actual values. This statistic varies between –1.0 and 1.0. Values close to +1.0 indicate a strong positive association, so that high predicted values are associated with high actual values and low predicted values are associated with low actual values. Values close to –1.0 indicate a strong negative association, so that high predicted values are associated with low actual values, and vice versa. Values close to 0.0 indicate a weak association, so that predicted values are more or less independent of actual values. Note: A blank entry here indicates that linear correlation cannot be computed in this case, because either the actual or predicted values are constant.
- Occurrences. Shows the number of records used in the analysis.
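The error statistics listed above can be sketched as follows. This is an illustration only, not the product's implementation; the observed and predicted values are invented for the example.

```python
import statistics

actual    = [10.0, 12.5, 8.0, 15.0, 11.0]   # observed values (invented)
predicted = [9.5, 13.0, 8.5, 14.0, 11.5]    # model predictions (invented)

# Error = observed value minus predicted value
errors = [a - p for a, p in zip(actual, predicted)]

min_error = min(errors)                       # Minimum Error
max_error = max(errors)                       # Maximum Error
mean_error = statistics.mean(errors)          # Mean Error (systematic bias)
mean_abs_error = statistics.mean(abs(e) for e in errors)  # Mean Absolute Error
std_dev = statistics.stdev(errors)            # Standard Deviation of the errors

# Linear Correlation between predicted and actual values (Pearson's r)
mean_a = statistics.mean(actual)
mean_p = statistics.mean(predicted)
cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
var_a = sum((a - mean_a) ** 2 for a in actual)
var_p = sum((p - mean_p) ** 2 for p in predicted)
linear_corr = cov / (var_a * var_p) ** 0.5

occurrences = len(errors)                     # Occurrences
```

A mean error near zero with a large mean absolute error indicates that over- and underestimates roughly cancel out, which is why both statistics are reported.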
Coincidence Matrix. For categorical output fields, if you requested a coincidence matrix in the analysis options, a subsection appears here containing the matrix. The rows represent actual observed values, and the columns represent predicted values. Each cell in the table indicates the number of records with that combination of predicted and actual values.
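Tallying a coincidence matrix amounts to counting each (actual, predicted) pair. A minimal sketch, with invented category labels:

```python
from collections import Counter

actual    = ["yes", "yes", "no", "no", "yes", "no"]   # observed values (invented)
predicted = ["yes", "no",  "no", "yes", "yes", "no"]  # model predictions (invented)

# Count each (actual, predicted) combination
matrix = Counter(zip(actual, predicted))
categories = sorted(set(actual) | set(predicted))

# Display with actual values down the side and predicted values across the top
print("actual \\ predicted  " + "  ".join(f"{c:>4}" for c in categories))
for a in categories:
    row = "  ".join(f"{matrix[(a, p)]:>4}" for p in categories)
    print(f"{a:>18}  {row}")
```

Correct predictions fall on the main diagonal; off-diagonal cells show which categories the model confuses.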
Performance Evaluation. For categorical output fields, if you requested performance evaluation statistics in the analysis options, the performance evaluation results appear here. Each output category is listed with its performance evaluation statistic.
Confidence Values Report. For categorical output fields, if you requested confidence values in the analysis options, the values appear here. The following statistics are reported for model confidence values:
- Range. Shows the range (smallest and largest values) of confidence values for records in the stream data.
- Mean Correct. Shows the average confidence for records that are classified correctly.
- Mean Incorrect. Shows the average confidence for records that are classified incorrectly.
- Always Correct Above. Shows the confidence threshold above which predictions are always correct and shows the percentage of cases meeting this criterion.
- Always Incorrect Below. Shows the confidence threshold below which predictions are always incorrect and shows the percentage of cases meeting this criterion.
- X% Accuracy Above. Shows the confidence level at which accuracy is X%. X is approximately the accuracy value specified for the Threshold for option in the Analysis options. For some models and datasets, it is not possible to choose a confidence value that gives the exact threshold specified in the options (usually due to clusters of similar cases with the same confidence value near the threshold). The threshold reported is the closest value to the specified accuracy criterion that can be obtained with a single confidence value threshold.
- X Fold Correct Above. Shows the confidence value at which accuracy is X times better than it is for the overall dataset. X is the value specified for Improve accuracy in the Analysis options.
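Several of these statistics can be sketched as below. This is a simplified illustration under assumed semantics (for example, the Always Correct Above threshold is taken to be the highest confidence of any incorrectly classified record); the confidence values are invented.

```python
# Each record pairs a model confidence with whether the prediction was correct
records = [  # (confidence, correct?) - invented data
    (0.95, True), (0.90, True), (0.85, False),
    (0.80, True), (0.60, False), (0.40, False),
]

confidences = [c for c, _ in records]
conf_range = (min(confidences), max(confidences))          # Range

correct   = [c for c, ok in records if ok]
incorrect = [c for c, ok in records if not ok]
mean_correct   = sum(correct) / len(correct)               # Mean Correct
mean_incorrect = sum(incorrect) / len(incorrect)           # Mean Incorrect

# Always Correct Above: every record with confidence above this threshold is
# correct; assumed here to be the highest confidence of any incorrect record.
always_correct_above = max(incorrect)
pct_above = 100 * sum(c > always_correct_above for c in confidences) / len(records)

# Always Incorrect Below: every record with confidence below this threshold is
# incorrect; assumed here to be the lowest confidence of any correct record.
always_incorrect_below = min(correct)
pct_below = 100 * sum(c < always_incorrect_below for c in confidences) / len(records)
```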
Agreement between. If two or more generated models that predict the same output field are included in the stream, you will also see statistics on the agreement between predictions generated by the models. This includes the number and percentage of records for which the predictions agree (for categorical output fields) or error summary statistics (for continuous output fields). For categorical fields, it includes an analysis of predictions compared to actual values for the subset of records on which the models agree (generate the same predicted value).
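For categorical output fields, the agreement statistics described above can be sketched as follows. The two model predictions and the actual values are invented for illustration.

```python
actual  = ["a", "b", "a", "b", "a"]   # observed values (invented)
model_1 = ["a", "b", "b", "b", "a"]   # predictions from first model (invented)
model_2 = ["a", "b", "b", "a", "a"]   # predictions from second model (invented)

# Records on which the two models generate the same predicted value
agree = [i for i in range(len(actual)) if model_1[i] == model_2[i]]
pct_agree = 100 * len(agree) / len(actual)

# On the agreed subset, compare the shared prediction with the actual value
agree_correct = sum(model_1[i] == actual[i] for i in agree)
pct_agree_correct = 100 * agree_correct / len(agree)
```

This makes concrete why the agreed subset is analyzed separately: models can agree with each other and still be wrong, as in the third record above.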
Evaluation Metrics. For binary classifiers, if you requested evaluation metrics in the analysis options, the values of the AUC and Gini coefficient evaluation metrics are shown in a table in this section. The table has one row for each binary classifier model. The evaluation metrics table is shown for each output field, rather than for each model.
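The relationship between the two metrics can be illustrated with a short sketch. AUC is computed here via the Mann-Whitney formulation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case), and the Gini coefficient follows as 2 × AUC − 1; the labels and scores are invented.

```python
labels = [1, 1, 0, 1, 0, 0]              # 1 = positive class (invented)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # model scores (invented)

pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]

# Fraction of (positive, negative) pairs where the positive case scores
# higher; ties count as half a win
wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
auc = wins / (len(pos) * len(neg))
gini = 2 * auc - 1
```

An AUC of 0.5 (Gini of 0) corresponds to random ranking; an AUC of 1.0 (Gini of 1) corresponds to a classifier that ranks every positive case above every negative case.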