Bayesian Network Node Expert Options
The node expert options enable you to fine-tune the model-building process. To access the expert options, set Mode to Expert on the Expert tab.
Missing values. By default, IBM® SPSS® Modeler only uses records that have valid values for all fields used in the model. (This is sometimes called listwise deletion of missing values.) If you have a lot of missing data, you may find that this approach eliminates too many records, leaving you without enough data to generate a good model. In such cases, you can deselect the Use only complete records option. IBM SPSS Modeler then attempts to use as much information as possible to estimate the model, including records where some of the fields have missing values. (This is sometimes called pairwise deletion of missing values.) However, in some situations, using incomplete records in this manner can lead to computational problems in estimating the model.
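The difference between the two approaches can be illustrated with a minimal sketch. The records and field names below are hypothetical, and the code only counts how many records each approach can use; it does not reproduce how IBM SPSS Modeler estimates the model internally.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "churn": "no"},
    {"age": None, "income": 61000, "churn": "yes"},
    {"age": 45, "income": None, "churn": "no"},
    {"age": 29, "income": 38000, "churn": None},
    {"age": 51, "income": 75000, "churn": "yes"},
]
fields = ["age", "income", "churn"]


def complete_records(rows, names):
    """Listwise deletion: keep rows where every named field is present."""
    return [r for r in rows if all(r[n] is not None for n in names)]


def usable_pairs(rows, a, b):
    """Pairwise deletion: keep rows where both fields of one pair are present."""
    return [r for r in rows if r[a] is not None and r[b] is not None]


# Records available to the whole model under listwise deletion.
listwise_count = len(complete_records(records, fields))

# Records available to each pair of fields under pairwise deletion.
pairwise_counts = {
    (a, b): len(usable_pairs(records, a, b))
    for i, a in enumerate(fields)
    for b in fields[i + 1:]
}

print(listwise_count)
print(pairwise_counts)
```

In this sample, listwise deletion leaves two of the five records, whereas every pair of fields still has three usable records under pairwise deletion.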
Append all probabilities. Specifies whether probabilities for each category of the output field are added to each record processed by the node. If this option is not selected, the probability of only the predicted category is added.
Independence test. A test of independence assesses whether paired observations on two variables are independent of each other. Select the type of test to be used. The available options are:
- Likelihood ratio. Tests for target-predictor independence by calculating a ratio between the maximum probability of a result under two different hypotheses.
- Pearson chi-square. Tests for target-predictor independence by using a null hypothesis that the relative frequencies of occurrence of observed events follow a specified frequency distribution.
Bayesian network models conduct conditional tests of independence where additional variables are used beyond the tested pairs. In addition, the models explore not only the relations between the target and predictors, but also the relations among the predictors themselves.
Significance level. Used in conjunction with the Independence test settings, this enables you to set a cut-off value to be used when conducting the tests. The lower the value, the fewer links remain in the network; the default level is 0.01.
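As a rough illustration of the two statistics and the cut-off, the following sketch computes both for a single 2x2 table of counts. It is the simple unconditional case for one pair of fields, not the conditional tests that the node runs; the function names and the sample table are assumptions made for the example, and the p-value shortcut is valid only for one degree of freedom.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    exp = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
    )


def likelihood_ratio(table):
    """G statistic: 2 * sum of observed * ln(observed / expected)."""
    exp = expected_counts(table)
    return 2.0 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
        if o > 0  # empty cells contribute nothing to the sum
    )


def p_value_df1(stat):
    """Upper-tail chi-square probability for 1 degree of freedom (2x2 tables)."""
    return math.erfc(math.sqrt(stat / 2.0))


SIGNIFICANCE_LEVEL = 0.01  # the default level stated above

# Hypothetical counts: rows are predictor categories, columns are target categories.
observed = [[30, 10], [10, 30]]

chi2 = pearson_chi_square(observed)
g = likelihood_ratio(observed)

# A pair is treated as dependent only when the p-value falls below the cut-off.
dependent_by_chi2 = p_value_df1(chi2) < SIGNIFICANCE_LEVEL
dependent_by_g = p_value_df1(g) < SIGNIFICANCE_LEVEL
print(chi2, g, dependent_by_chi2, dependent_by_g)
```

With a smaller cut-off, fewer pairs pass the test, which is consistent with fewer links remaining in the network.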
Maximal conditioning set size. The algorithm for creating a Markov Blanket structure uses conditioning sets of increasing size to carry out independence testing and remove unnecessary links from the network. Because tests involving a large number of conditioning variables require more time and memory to process, you can limit the number of variables to be included. This can be especially useful when processing data with strong dependencies among many variables. Note, however, that the resulting network may contain some superfluous links.
Specify the maximal number of conditioning variables to be used for independence testing. The default setting is 5.
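To see why the limit matters, the following sketch counts how many conditioning sets exist for a given number of candidate variables. It is a counting illustration only: the function names are hypothetical, and the actual search performed by the node is not documented here and may test far fewer sets.

```python
from itertools import combinations
from math import comb


def conditioning_sets(candidates, max_size):
    """Yield every subset with 0..max_size members, smallest sets first."""
    for size in range(min(max_size, len(candidates)) + 1):
        yield from combinations(candidates, size)


def count_conditioning_sets(n_candidates, max_size):
    """Number of subsets with at most max_size members."""
    limit = min(max_size, n_candidates)
    return sum(comb(n_candidates, k) for k in range(limit + 1))


# With 20 candidate variables, compare the default limit of 5 with no limit.
limited = count_conditioning_sets(20, 5)
unlimited = count_conditioning_sets(20, 20)
print(limited, unlimited)

# Small example of the sets themselves, using placeholder variable names.
small = list(conditioning_sets(["A", "B", "C"], 1))
print(small)
```

For 20 candidates, the limit of 5 leaves 21,700 possible sets per tested pair instead of 1,048,576.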
Feature selection. These options enable you to restrict the number of inputs used when processing the model in order to speed up the model-building process. This is especially useful when creating a Markov Blanket structure, because the number of potential inputs can be large; it enables you to select the inputs that are significantly related to the target variable.
- Inputs always selected. Using the Field Chooser (button to the right of the text field), select the fields from the dataset that are always to be used when building the Bayesian network model. The target field is always selected. Note that the Bayesian network may still drop items in this list during the model-building process if other tests do not consider them significant. This option therefore ensures only that the listed items are used in the model-building process, not that they will appear in the resulting Bayesian model.
- Maximum number of inputs. Specify the total number of inputs from the dataset to be used when building the Bayesian network model. The highest number you can enter is the total number of inputs in the dataset. Note: If the number of fields selected in Inputs always selected exceeds the value of Maximum number of inputs, an error message is displayed.
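The interaction between the two feature selection settings can be sketched as follows. The function choose_inputs and its parameters are hypothetical, and the sketch assumes the inputs have already been ranked by how strongly they relate to the target; how the node ranks them is not described here.

```python
def choose_inputs(ranked_inputs, always_selected, max_inputs):
    """Pick up to max_inputs names: always-selected fields first, then by rank.

    ranked_inputs is assumed to list every input, strongest relation first.
    """
    if max_inputs > len(ranked_inputs):
        raise ValueError("Maximum number of inputs exceeds the inputs available.")
    if len(always_selected) > max_inputs:
        # Mirrors the error condition described in the note above.
        raise ValueError(
            "Inputs always selected exceeds Maximum number of inputs."
        )
    chosen = list(always_selected)
    for name in ranked_inputs:
        if len(chosen) >= max_inputs:
            break
        if name not in chosen:
            chosen.append(name)
    return chosen


# Placeholder field names, ranked from strongest to weakest relation.
ranked = ["income", "age", "region", "tenure"]
selected = choose_inputs(ranked, always_selected=["tenure"], max_inputs=2)
print(selected)
```

Here the always-selected field takes one of the two available places, and the remaining place goes to the highest-ranked input. As noted above, being selected at this stage does not guarantee that a field appears in the final network.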