Logistic Node Model Options
Model name. You can generate the model name automatically based on the target or ID field (or model type in cases where no such field is specified) or specify a custom name.
Use partitioned data. If a partition field is defined, this option ensures that data from only the training partition is used to build the model.
Create split models. Builds a separate model for each possible value of input fields that are specified as split fields. See Building Split Models for more information.
Procedure. Specifies whether a binomial or multinomial model is created. The options available in the dialog box vary depending on which type of modeling procedure is selected.
- Binomial. Used when the target field is a flag or nominal field with two discrete values (dichotomous), such as yes/no, on/off, male/female.
- Multinomial. Used when the target field is a nominal field with more than two values. For the model type, you can specify Main Effects, Full Factorial, or Custom.
Include constant in equation. This option determines whether the resulting equations will include a constant term. In most situations, you should leave this option selected.
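As a rough illustration of what the constant term does, the sketch below evaluates a binomial logistic equation with and without an intercept. The function name `predict_probability` and the coefficient values are hypothetical, chosen only for illustration; they are not output from the Logistic node.

```python
import math

def predict_probability(coefficients, values, constant=0.0):
    """Return P(target = 1) for one record under a binomial logistic equation.

    coefficients and values are parallel lists; constant is the intercept
    (pass 0.0 to mimic a model built without a constant term).
    """
    linear = constant + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients for two inputs (illustration only).
coefs = [0.8, -0.5]
record = [0.0, 0.0]

# Without a constant, a record whose inputs are all zero is forced to 0.5.
print(predict_probability(coefs, record))
# With a constant, the baseline probability can move away from 0.5.
print(predict_probability(coefs, record, constant=-2.0))
```

Without the constant, the equation is forced through a probability of 0.5 whenever every input is zero, which is one reason the option is usually left selected.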
Binomial Models
For binomial models, the following methods and options are available:
Method. Specify the method to be used in building the logistic regression model.
- Enter. This is the default method, which enters all of the terms into the equation directly. No field selection is performed in building the model.
- Forwards Stepwise. The Forwards Stepwise method of field selection builds the equation in steps, as the name implies. The initial model is the simplest model possible, with no model terms (except the constant) in the equation. At each step, terms that have not yet been added to the model are evaluated, and if the best of those terms adds significantly to the predictive power of the model, it is added. In addition, terms that are currently in the model are reevaluated to determine if any of them can be removed without significantly detracting from the model. If so, they are removed. The process repeats, and other terms are added and/or removed. When no more terms can be added to improve the model, and no more terms can be removed without detracting from the model, the final model is generated.
- Backwards Stepwise. The Backwards Stepwise method is essentially the opposite of the Forwards Stepwise method. With this method, the initial model contains all of the terms as predictors. At each step, terms in the model are evaluated, and any terms that can be removed without significantly detracting from the model are removed. In addition, previously removed terms are reevaluated to determine if the best of those terms adds significantly to the predictive power of the model. If so, it is added back into the model. When no more terms can be removed without significantly detracting from the model, and no more terms can be added to improve the model, the final model is generated.
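The control flow of the Forwards Stepwise method described above can be sketched as follows. This is a minimal sketch, not the node's actual algorithm: the `score` callable is a hypothetical stand-in for whatever fit statistic or significance test the procedure uses, and the `enter_gain`, `remove_loss`, and `max_steps` parameters are assumptions introduced for illustration.

```python
def forwards_stepwise(candidates, score, enter_gain=0.01, remove_loss=0.005,
                      max_steps=100):
    """Select terms by alternating entry and removal steps.

    score(terms) -> float, higher is better (hypothetical fit measure).
    """
    selected = []  # the initial model contains no terms
    for _ in range(max_steps):  # guard against add/remove cycles
        changed = False

        # Entry step: evaluate terms not yet in the model and add the best
        # one if it improves the score by more than enter_gain.
        remaining = [t for t in candidates if t not in selected]
        if remaining:
            best = max(remaining, key=lambda t: score(selected + [t]))
            if score(selected + [best]) - score(selected) > enter_gain:
                selected.append(best)
                changed = True

        # Removal step: drop any term whose removal costs less than remove_loss.
        for term in list(selected):
            reduced = [t for t in selected if t != term]
            if score(selected) - score(reduced) < remove_loss:
                selected = reduced
                changed = True

        if not changed:  # nothing added and nothing removed: final model
            break
    return selected

# Toy additive score with made-up contributions per field (illustration only).
weights = {"age": 0.5, "income": 0.3, "noise": 0.001}

def toy_score(terms):
    return sum(weights[t] for t in terms)

print(forwards_stepwise(list(weights), toy_score))  # ['age', 'income']
```

With an additive toy score the removal step never fires; in a real model, a term can become redundant after correlated terms enter, which is when removal matters.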
Categorical inputs. Lists the fields that are identified as categorical, that is, those with a measurement level of flag, nominal, or ordinal. You can specify the contrast and base category for each categorical field.
- Field Name. This column contains the field names of the categorical inputs. To add continuous or numerical inputs into this column, click the Add Fields icon to the right of the list and select the required inputs.
- Contrast. The interpretation of the regression coefficients for a categorical field depends on the contrasts that are used. The contrast determines how hypothesis tests are set up to compare the estimated means. For example, if you know that a categorical field has implicit order, such as a pattern or grouping, you can use the contrast to model that order. The available contrasts are:
Indicator. Contrasts indicate the presence or absence of category membership. This is the default method.
Simple. Each category of the predictor field, except the reference category, is compared to the reference category.
Difference. Each category of the predictor field, except the first category, is compared to the average effect of previous categories. Also known as reverse Helmert contrasts.
Helmert. Each category of the predictor field, except the last category, is compared to the average effect of subsequent categories.
Repeated. Each category of the predictor field, except the first category, is compared to the category that precedes it.
Polynomial. Orthogonal polynomial contrasts. Categories are assumed to be equally spaced. Polynomial contrasts are available for numeric fields only.
Deviation. Each category of the predictor field, except the reference category, is compared to the overall effect.
- Base Category. Specifies how the reference category is determined for the selected contrast type. Select First to use the first category for the input field (sorted alphabetically), or select Last to use the last category. The default base category applies to variables that are listed in the Categorical inputs area.
Note: This field is unavailable if the contrast setting is Difference, Helmert, Repeated, or Polynomial.
The estimate of each field’s effect on the overall response is computed as an increase or decrease in the likelihood of each of the other categories relative to the reference category. This can help you identify the fields and values that are more likely to give a specific response.
The base category is shown in the output as 0.0. This is because comparing it to itself produces an empty result. All other categories are shown as equations relative to the base category. See the topic Logistic Nugget Model Details for more information.
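To make the interaction between contrast type and base category concrete, the sketch below builds coding columns for the Indicator and Deviation contrasts described above, using either the first or last alphabetically sorted category as the reference. The function `contrast_codes` is hypothetical, and the Deviation coding follows the common effect-coding convention (reference row of -1); the exact matrices the node uses internally may differ.

```python
def contrast_codes(categories, contrast="indicator", base="last"):
    """Return {category: [codes]} with one column per non-reference category."""
    if contrast not in ("indicator", "deviation"):
        raise ValueError("this sketch covers only indicator and deviation")
    if base not in ("first", "last"):
        raise ValueError("base must be 'first' or 'last'")

    ordered = sorted(set(categories))  # categories sorted alphabetically
    reference = ordered[0] if base == "first" else ordered[-1]
    others = [c for c in ordered if c != reference]

    codes = {}
    for category in ordered:
        if category == reference:
            # Indicator: reference row is all zeros. Deviation: all -1.
            fill = 0 if contrast == "indicator" else -1
            codes[category] = [fill] * len(others)
        else:
            # A 1 in the column belonging to this category, 0 elsewhere.
            codes[category] = [1 if category == other else 0 for other in others]
    return codes

for name, row in contrast_codes(["low", "medium", "high"]).items():
    print(name, row)
```

With Indicator coding the reference category carries all zeros, which lines up with the base category appearing as 0.0 in the output.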
Multinomial Models
For multinomial models, the following methods and options are available:
Method. Specify the method to be used in building the logistic regression model.
- Enter. This is the default method, which enters all of the terms into the equation directly. No field selection is performed in building the model.
- Stepwise. The Stepwise method of field selection builds the equation in steps, as the name implies. The initial model is the simplest model possible, with no model terms (except the constant) in the equation. At each step, terms that have not yet been added to the model are evaluated, and if the best of those terms adds significantly to the predictive power of the model, it is added. In addition, terms that are currently in the model are reevaluated to determine if any of them can be removed without significantly detracting from the model. If so, they are removed. The process repeats, and other terms are added and/or removed. When no more terms can be added to improve the model, and no more terms can be removed without detracting from the model, the final model is generated.
- Forwards. The Forwards method of field selection is similar to the Stepwise method in that the model is built in steps. However, with this method, the initial model is the simplest model, containing only the constant, and terms can only be added to the model. At each step, terms not yet in the model are tested based on how much they would improve the model, and the best of those terms is added to the model. When no more terms can be added, or the best candidate term does not produce a large-enough improvement in the model, the final model is generated.
- Backwards. The Backwards method is essentially the opposite of the Forwards method. With this method, the initial model contains all of the terms as predictors, and terms can only be removed from the model. Model terms that contribute little to the model are removed one by one until no more terms can be removed without significantly worsening the model, yielding the final model.
- Backwards Stepwise. The Backwards Stepwise method is essentially the opposite of the Stepwise method. With this method, the initial model contains all of the terms as predictors. At each step, terms in the model are evaluated, and any terms that can be removed without significantly detracting from the model are removed. In addition, previously removed terms are reevaluated to determine if the best of those terms adds significantly to the predictive power of the model. If so, it is added back into the model. When no more terms can be removed without significantly detracting from the model, and no more terms can be added to improve the model, the final model is generated.
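For contrast with the stepwise variants, the removal-only behavior of the Backwards method above can be sketched as below. This is an illustration under stated assumptions, not the node's implementation: `score` is a hypothetical fit measure supplied by the caller, and `max_loss` is an assumed threshold standing in for the significance criterion.

```python
def backwards_elimination(terms, score, max_loss=0.01):
    """Start with all terms and remove the weakest one at a time.

    score(terms) -> float, higher is better (hypothetical fit measure).
    """
    selected = list(terms)  # the initial model contains every term

    def loss_if_removed(term):
        reduced = [t for t in selected if t != term]
        return score(selected) - score(reduced)

    while selected:
        # Find the term that contributes least to the current model.
        weakest = min(selected, key=loss_if_removed)
        if loss_if_removed(weakest) >= max_loss:
            break  # removing anything further would worsen the model too much
        selected.remove(weakest)  # terms are never added back in this method
    return selected

# Toy additive score with made-up contributions per field (illustration only).
contributions = {"age": 0.5, "income": 0.3, "noise": 0.001}

def additive_score(terms):
    return sum(contributions[t] for t in terms)

print(backwards_elimination(list(contributions), additive_score))
```

Backwards Stepwise would add a re-entry check after each removal; Forwards is the mirror image, starting empty and only adding.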
Base category for target. Specifies how the reference category is determined. This is used as the baseline against which the regression equations for all other categories in the target are estimated. Select First to use the first category for the current target field—sorted alphabetically—or select Last to use the last category. Alternatively, you can select Specify to choose a specific category and select the desired value from the list. Available values can be defined for each field in a Type node.
Often you would specify the category in which you are least interested to be the base category, for example, a loss-leader product. The other categories are then related to this base category in a relative fashion to identify what makes them more likely to be in their own category. This can help you identify the fields and values that are more likely to give a specific response.
The base category is shown in the output as 0.0. This is because comparing it to itself produces an empty result. All other categories are shown as equations relative to the base category. See the topic Logistic Nugget Model Details for more information.
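One way to see why the base category appears as 0.0 is to turn the per-category equations into probabilities. The sketch below assumes the standard baseline-category formulation of multinomial logistic regression; the function name `category_probabilities` and the example values are hypothetical.

```python
import math

def category_probabilities(linear_predictors, base_category):
    """Convert equation values into probabilities for every target category.

    linear_predictors maps each non-base category to the value of its
    equation for one record; the base category's equation is fixed at 0.0.
    """
    scores = dict(linear_predictors)
    scores[base_category] = 0.0  # the baseline against which others are estimated
    total = sum(math.exp(v) for v in scores.values())
    return {category: math.exp(v) / total for category, v in scores.items()}

# Hypothetical equation values for one record (illustration only).
probs = category_probabilities({"premium": 1.2, "standard": 0.4}, "basic")
for category, p in probs.items():
    print(category, round(p, 3))
```

A positive equation value means the record is more likely to fall in that category than in the base category; a negative value means it is less likely.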
Model type. There are three options for defining the terms in the model. Main Effects models include only the input fields individually and do not test interactions (multiplicative effects) between input fields. Full Factorial models include all interactions as well as the input field main effects. Full factorial models are better able to capture complex relationships but are also much more difficult to interpret and are more likely to suffer from overfitting. Because of the potentially large number of possible combinations, automatic field selection methods (methods other than Enter) are disabled for full factorial models. Custom models include only the terms (main effects and interactions) that you specify. When selecting this option, use the Model Terms list to add or remove terms in the model.
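The difference in size between Main Effects and Full Factorial models can be sketched by enumerating the terms each would contain. The function `model_terms` and the field names are hypothetical; the sketch only counts combinations and does not reflect how the node represents terms internally.

```python
from itertools import combinations

def model_terms(inputs, model_type="main_effects"):
    """List model terms as tuples of field names.

    A one-element tuple is a main effect; longer tuples are interactions.
    """
    if model_type == "main_effects":
        return [(field,) for field in inputs]
    if model_type == "full_factorial":
        # Every non-empty combination of inputs: main effects plus all interactions.
        return [combo
                for size in range(1, len(inputs) + 1)
                for combo in combinations(inputs, size)]
    raise ValueError("model_type must be 'main_effects' or 'full_factorial'")

fields = ["Age", "Income", "Region"]  # hypothetical input fields
print(len(model_terms(fields)))                     # 3 main effects
print(len(model_terms(fields, "full_factorial")))   # 7 terms in total
for term in model_terms(fields, "full_factorial"):
    print(" * ".join(term))
```

The number of full factorial terms grows as 2^n - 1 for n inputs, which illustrates why such models are harder to interpret and more prone to overfitting.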
Model Terms. When building a Custom model, you will need to explicitly specify the terms in the model. The list shows the current set of terms for the model. The buttons on the right side of the Model Terms list enable you to add and remove model terms.
- To add terms to the model, click the Add new model terms button. See the topic Adding Terms to a Logistic Regression Model for more information.
- To delete terms, select the desired terms and click the Delete selected model terms button.