Auto Classifier Node Model Options
The Model tab of the Auto Classifier node enables you to specify the number of models to be created, along with the criteria used to compare models.
Model name. You can generate the model name automatically based on the target or ID field (or on the model type when no such field is specified), or you can specify a custom name.
Use partitioned data. If a partition field is defined, this option ensures that data from only the training partition is used to build the model.
Create split models. Builds a separate model for each possible value of input fields that are specified as split fields. See the topic Building Split Models for more information.
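As a rough illustration of what "a separate model for each value" means, the sketch below groups records by the values of their split fields and trains one model per group. It is not how the Auto Classifier node is implemented. The record layout, the function names `build_split_models` and `majority_class`, and the field names `region` and `target` are all hypothetical. The majority-class "learner" only stands in for a real algorithm. With several split fields, the sketch builds one model per distinct combination of values observed in the data.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

Record = Dict[str, Hashable]


def build_split_models(
    records: List[Record],
    split_fields: List[str],
    train: Callable[[List[Record]], object],
) -> Dict[Tuple[Hashable, ...], object]:
    """Group records by their split-field values and train one model per group."""
    groups: Dict[Tuple[Hashable, ...], List[Record]] = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in split_fields)].append(rec)
    return {key: train(rows) for key, rows in groups.items()}


def majority_class(rows: List[Record]) -> Hashable:
    """Hypothetical stand-in for a real learner: predicts the most common target."""
    return Counter(r["target"] for r in rows).most_common(1)[0][0]


data = [
    {"region": "north", "target": "yes"},
    {"region": "north", "target": "yes"},
    {"region": "north", "target": "no"},
    {"region": "south", "target": "no"},
]
print(build_split_models(data, ["region"], majority_class))
# {('north',): 'yes', ('south',): 'no'}
```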
Rank models by. Specifies the criteria used to compare and rank models. Options include overall accuracy, area under the ROC curve, profit, lift, and number of fields. Note that all of these measures will be available in the summary report regardless of which is selected here.
Note: For a nominal (set) target, ranking is restricted to either Overall Accuracy or Number of Fields.
When calculating profits, lift, and related statistics, the True value defined for the target field is assumed to represent a hit.
- Overall accuracy. The percentage of records that are correctly predicted by the model, relative to the total number of records.
- Area under the ROC curve. The ROC curve provides an index of a model's performance. The further the curve lies above the reference line, the more accurate the model. An area of 0.5 corresponds to the diagonal reference line (no better than chance), and an area of 1.0 corresponds to a perfect model.
- Profit (Cumulative). The sum of profits across cumulative percentiles (sorted by confidence of the prediction), computed from the specified cost, revenue, and weight criteria. Typically, the profit starts near 0 for the top percentile, increases steadily, and then decreases. For a good model, profits show a well-defined peak, which is reported along with the percentile at which it occurs. For a model that provides no information, the profit curve is relatively straight and may be increasing, decreasing, or level, depending on the applicable cost/revenue structure.
- Lift (Cumulative). The ratio of the hit rate in cumulative quantiles to the hit rate in the overall sample (where quantiles are sorted by confidence of the prediction). For example, a lift value of 3 for the top quantile indicates a hit rate three times as high as for the sample overall. For a good model, lift should start well above 1.0 for the top quantiles and then drop off sharply toward 1.0 for the lower quantiles. For a model that provides no information, lift hovers around 1.0.
- Number of fields. Ranks models based on the number of input fields used.
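The first two ranking measures can be sketched in a few lines. The sketch below is a hypothetical illustration, not the node's internal code. Following the convention above, a record is a hit when its actual value is the target's True value. The sketch also assumes a per-record score in which higher values mean "more likely to be a hit". The AUC is computed through its well-known rank interpretation: the probability that a randomly chosen hit scores higher than a randomly chosen non-hit. This pairwise loop is quadratic, so it is suitable only for small illustrative inputs.

```python
from typing import Sequence


def overall_accuracy(actual: Sequence[object], predicted: Sequence[object]) -> float:
    """Percentage of records whose predicted value equals the actual value."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * correct / len(actual)


def roc_auc(is_hit: Sequence[bool], confidence: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation: the probability
    that a randomly chosen hit gets a higher score than a randomly chosen
    non-hit, with ties counted as one half. Checks every hit/non-hit pair."""
    hits = [c for h, c in zip(is_hit, confidence) if h]
    misses = [c for h, c in zip(is_hit, confidence) if not h]
    if not hits or not misses:
        raise ValueError("need at least one hit and one non-hit")
    score = 0.0
    for h in hits:
        for m in misses:
            if h > m:
                score += 1.0
            elif h == m:
                score += 0.5
    return score / (len(hits) * len(misses))


actual = ["yes", "no", "yes", "no"]
predicted = ["yes", "no", "no", "no"]
print(overall_accuracy(actual, predicted))  # 75.0
print(roc_auc([a == "yes" for a in actual], [0.9, 0.2, 0.4, 0.6]))  # 0.75
```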
Rank models using. If a partition is in use, you can specify whether ranks are based on the training partition or the testing partition. With large datasets, using a partition for preliminary screening of models may greatly improve performance.
Number of models to use. Specifies the maximum number of models to be listed in the model nugget produced by the node. The top-ranking models are listed according to the specified ranking criterion. Note that increasing this limit may slow performance. The maximum allowable value is 100.
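Conceptually, ranking amounts to scoring each candidate on one criterion and keeping the top N. The sketch below shows this under stated assumptions. `top_models` and the model names are hypothetical. The scores are invented inputs that stand in for whichever partition (training or testing) is selected under Rank models using. Because for some criteria a smaller value may be preferable (for example, one might prefer models with fewer input fields), the sort direction is a parameter here. The sketch does not assert which direction the node uses.

```python
from typing import Dict, List

MAX_MODELS = 100  # maximum allowed value of "Number of models to use"


def top_models(scores: Dict[str, float], n: int, higher_is_better: bool = True) -> List[str]:
    """Return the names of the n best-scoring models under one ranking criterion."""
    if not 1 <= n <= MAX_MODELS:
        raise ValueError(f"n must be between 1 and {MAX_MODELS}")
    return sorted(scores, key=lambda name: scores[name], reverse=higher_is_better)[:n]


# Hypothetical testing-partition accuracies for three candidate models.
accuracy = {"model_a": 91.2, "model_b": 88.5, "model_c": 90.1}
print(top_models(accuracy, 2))  # ['model_a', 'model_c']

# Hypothetical field counts; here fewer input fields is treated as better.
field_counts = {"model_a": 12, "model_b": 5, "model_c": 8}
print(top_models(field_counts, 2, higher_is_better=False))  # ['model_b', 'model_c']
```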
Calculate predictor importance. For models that produce an appropriate measure of importance, you can display a chart that indicates the relative importance of each predictor in estimating the model. Typically you will want to focus your modeling efforts on the predictors that matter most, and consider dropping or ignoring those that matter least. Note that predictor importance may extend the time needed to calculate some models, and is not recommended if you simply want a broad comparison across many different models. It is more useful once you have narrowed your analysis to a handful of models that you want to explore in greater detail. See the topic Predictor Importance for more information.
Profit Criteria. Available only for flag targets. Profit equals the revenue for each record minus the cost for the record. Profits for a quantile are simply the sum of profits for all records in the quantile. Revenue is assumed to apply only to hits, but costs apply to all records.
- Costs. Specify the cost associated with each record. You can select Fixed or Variable costs. For fixed costs, specify the cost value. For variable costs, click the Field Chooser button to select a field as the cost field. (Costs is not available for ROC charts.)
- Revenue. Specify the revenue associated with each record that represents a hit. You can select Fixed or Variable revenue. For fixed revenue, specify the revenue value. For variable revenue, click the Field Chooser button to select a field as the revenue field. (Revenue is not available for ROC charts.)
- Weight. If the records in your data represent more than one unit, you can use frequency weights to adjust the results. Specify the weight associated with each record, using Fixed or Variable weights. For fixed weights, specify the weight value (the number of units per record). For variable weights, click the Field Chooser button to select a field as the weight field. (Weight is not available for ROC charts.)
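A hypothetical sketch of the cumulative profit calculation, assuming fixed cost, revenue, and weight. Variable criteria would replace these constants with per-record field values. Records are sorted by descending score and divided into `n_bins` equal quantiles. Each record contributes `weight * (revenue if hit else 0 - cost)`, so revenue is earned only on hits while the cost applies to every record. The function names, the integer binning scheme, and the sample data are assumptions for illustration. The sketch does not reproduce the node's exact quantile boundaries.

```python
from typing import List, Sequence, Tuple


def cumulative_profit(
    is_hit: Sequence[bool],
    confidence: Sequence[float],
    cost: float,
    revenue: float,
    weight: float = 1.0,
    n_bins: int = 100,
) -> List[float]:
    """Cumulative profit after each of `n_bins` quantiles (records sorted by
    descending confidence). Revenue is earned only on hits; the cost is charged
    on every record; each record counts as `weight` units."""
    order = sorted(range(len(is_hit)), key=lambda i: confidence[i], reverse=True)
    n = len(order)
    curve: List[float] = []
    running = 0.0
    start = 0
    for b in range(1, n_bins + 1):
        end = (b * n + n_bins - 1) // n_bins  # integer ceil(b * n / n_bins)
        for i in order[start:end]:
            record_profit = (revenue if is_hit[i] else 0.0) - cost
            running += weight * record_profit
        curve.append(running)
        start = end
    return curve


def profit_peak(curve: Sequence[float]) -> Tuple[float, float]:
    """Return (peak profit, percentile at which the peak first occurs)."""
    best = max(range(len(curve)), key=lambda k: curve[k])
    return curve[best], 100.0 * (best + 1) / len(curve)


hits = [True, True, False, True, False, False, False, False]
conf = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
curve = cumulative_profit(hits, conf, cost=1.0, revenue=3.0, n_bins=4)
print(curve)               # [4.0, 5.0, 3.0, 1.0]
print(profit_peak(curve))  # (5.0, 50.0)
```

The curve rises and then falls, which matches the shape the Profit (Cumulative) description above expects from a useful model.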
Lift Criteria. Available only for flag targets. Specifies the percentile to use for lift calculations. You can also change this value when comparing the results. See the topic Automated Model Nuggets for more information.
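Cumulative lift at a chosen percentile can be sketched as the hit rate among the top-scoring records divided by the hit rate in the whole sample. In the sketch below, `cumulative_lift`, the rounding of the quantile size up to a whole record, and the sample data are hypothetical choices. The node's own quantile boundaries may differ.

```python
import math
from typing import Sequence


def cumulative_lift(is_hit: Sequence[bool], confidence: Sequence[float], percentile: float) -> float:
    """Hit rate in the top `percentile` percent of records (sorted by descending
    confidence) divided by the hit rate in the whole sample."""
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    if not is_hit:
        raise ValueError("the sample is empty")
    n = len(is_hit)
    overall_rate = sum(is_hit) / n
    if overall_rate == 0:
        raise ValueError("the sample contains no hits")
    order = sorted(range(n), key=lambda i: confidence[i], reverse=True)
    k = max(1, math.ceil(n * percentile / 100))  # records in the top quantile
    top_rate = sum(is_hit[i] for i in order[:k]) / k
    return top_rate / overall_rate


hits = [True, True, False, True, False, False, False, False]
conf = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(cumulative_lift(hits, conf, 50))   # 2.0
print(cumulative_lift(hits, conf, 100))  # 1.0
```

At 100 percent, the "top quantile" is the whole sample, so lift is always 1.0. This is why lift for an uninformative model hovers around that value.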