Analyze
The Analyze panel is activated only if the objective is to predict a target. You can use it to specify whether the model includes additional variables that contain:
- probabilities for each possible target field value
- distances between a case and its nearest neighbors
- raw and adjusted propensity scores (for flag targets only)
Append all probabilities. If this option is checked, probabilities for each possible value of a nominal or flag target field are displayed for each record processed by the node. If this option is unchecked, only the predicted value and its probability are displayed for nominal or flag target fields.
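The effect of this option can be sketched in a few lines of Python. This is only an illustration, not the node's implementation: the function name scored_fields and the output field names are hypothetical, and it assumes that the predicted value and its probability are still reported when the option is checked.

```python
def scored_fields(probabilities, append_all):
    """Build the output fields for one scored record.

    probabilities maps each possible target value to its probability.
    All field names used here are hypothetical placeholders.
    """
    # The predicted value is the one with the highest probability.
    predicted = max(probabilities, key=probabilities.get)
    fields = {"predicted": predicted, "probability": probabilities[predicted]}
    if append_all:
        # Option checked: add one probability field per possible target value.
        for value, p in probabilities.items():
            fields[f"probability_{value}"] = p
    return fields


probs = {"yes": 0.7, "no": 0.3}
only_predicted = scored_fields(probs, append_all=False)
all_values = scored_fields(probs, append_all=True)
```

With the option unchecked, only_predicted holds just the predicted value and its probability; with it checked, all_values also holds one probability per target value.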
Save distances between cases and k nearest neighbors. For each focal record, a separate variable is created for each of the focal record’s k nearest neighbors (from the training sample) and the corresponding k nearest distances.
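As a rough sketch of what gets saved, the following Python finds the k nearest training records for one focal record and stores one variable per neighbor and one per distance. It assumes Euclidean distance, and the variable names (neighbor_1, distance_1, and so on) are hypothetical rather than the names the node actually produces.

```python
import math


def k_nearest(focal, training, k):
    """Return (index, distance) pairs for the k training records closest to focal."""
    # Assumption: Euclidean distance between numeric feature tuples.
    distances = [(i, math.dist(focal, row)) for i, row in enumerate(training)]
    distances.sort(key=lambda pair: pair[1])
    return distances[:k]


training = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (6.0, 8.0)]
focal = (0.0, 0.0)

# One variable per neighbor and one per corresponding distance.
record = {}
for rank, (index, distance) in enumerate(k_nearest(focal, training, k=2), start=1):
    record[f"neighbor_{rank}"] = index
    record[f"distance_{rank}"] = distance
```

For k = 2 this yields four variables for the focal record: two neighbor identifiers and their two distances, ordered from nearest to farthest.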
Propensity Scores
Propensity scores can be enabled in the modeling node, and on the Settings tab in the model nugget. This functionality is available only when the selected target is a flag field. See the topic Propensity Scores for more information.
Calculate raw propensity scores. Raw propensity scores are derived from the model based on the training data only. If the model predicts the true value (will respond), then the propensity is the same as P, where P is the probability of the prediction. If the model predicts the false value, then the propensity is calculated as (1 – P).
- If you choose this option when building the model, raw propensity scores will be enabled in the model nugget by default. However, you can always choose to enable raw propensity scores in the model nugget whether or not you select them in the modeling node.
- When scoring the model, raw propensity scores will be added in a field with the letters RP appended to the standard prefix. For example, if the predictions are in a field named $R-churn, the name of the propensity score field will be $RRP-churn.
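The raw propensity rule and the field naming convention described above can be sketched as follows. The function names are hypothetical, and the naming helper simply assumes that the prediction field name has the form prefix, hyphen, target name, as in the $R-churn example.

```python
def raw_propensity(predicted_true, probability):
    """Raw propensity for a flag target.

    predicted_true: True if the model predicts the true value.
    probability: P, the probability of the prediction.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # True prediction: propensity is P. False prediction: propensity is 1 - P.
    return probability if predicted_true else 1.0 - probability


def propensity_field_name(prediction_field, letters="RP"):
    """Append letters to the prefix of a name such as '$R-churn'."""
    prefix, hyphen, target = prediction_field.partition("-")
    if not hyphen:
        raise ValueError("expected a field name of the form PREFIX-target")
    return f"{prefix}{letters}{hyphen}{target}"


score_true = raw_propensity(True, 0.8)    # model predicts true with P = 0.8
score_false = raw_propensity(False, 0.8)  # model predicts false with P = 0.8
field = propensity_field_name("$R-churn")
```

A record predicted false with probability 0.8 therefore gets a raw propensity of about 0.2, and the score is written to a field named $RRP-churn.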
Calculate adjusted propensity scores. Raw propensities are based purely on estimates given by the model, which may be overfitted, leading to over-optimistic estimates of propensity. Adjusted propensities attempt to compensate by looking at how the model performs on the test or validation partitions and adjusting the propensities to give a better estimate accordingly.
- This setting requires that a valid partition field is present in the stream.
- Unlike raw propensity scores, adjusted propensity scores must be calculated when building the model; otherwise, they will not be available when scoring the model nugget.
- When scoring the model, adjusted propensity scores will be added in a field with the letters AP appended to the standard prefix. For example, if the predictions are in a field named $R-churn, the name of the propensity score field will be $RAP-churn. Adjusted propensity scores are not available for logistic regression models.
- When calculating the adjusted propensity scores, the test or validation partition used for the calculation must not have been balanced. To avoid balancing these partitions, be sure the Only balance training data option is selected in any upstream Balance nodes. In addition, a complex sample taken upstream will invalidate the adjusted propensity scores.
- Adjusted propensity scores are not available for "boosted" tree and rule set models. See the topic Boosted C5.0 Models for more information.