Comparing Model Accuracy

  1. Run both Logistic nodes to create the model nuggets, which are added to the stream and to the Models palette in the upper-right corner.
    Figure 1. Attaching the model nuggets
  2. Attach Analysis nodes to the model nuggets and run the Analysis nodes using their default settings.
    Figure 2. Attaching the Analysis nodes

The Analysis of the non-ADP-derived model shows that simply running the data through the Logistic Regression node with its default settings produces a model with low accuracy: just 10.6% correct.

Figure 3. Non ADP-derived model results

The Analysis of the ADP-derived model shows that, by running the data through the ADP node with its default settings, you have built a much more accurate model: 78.8% correct.

Figure 4. ADP-derived model results
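The accuracy figures quoted above are the share of records for which the model's prediction matches the actual target value. As a minimal sketch of that arithmetic, written in plain Python rather than anything provided by SPSS Modeler, the hypothetical `percent_correct` function below compares two equal-length lists of values. The toy data is invented for illustration and is not taken from the example stream.

```python
def percent_correct(actual, predicted):
    """Return the share of records whose prediction matches the actual value, as a percentage."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one record")
    # Count records where the predicted value equals the actual value.
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * correct / len(actual)


# Hypothetical toy data, not from the example stream.
actual = ["A", "B", "B", "C", "A", "C", "B", "A"]
predicted = ["A", "B", "C", "C", "A", "B", "B", "A"]
print(f"{percent_correct(actual, predicted):.1f}% correct")  # 75.0% correct
```

Six of the eight toy predictions match, so the sketch reports 75.0% correct; a figure such as 10.6% or 78.8% is the same ratio computed over every record in the data.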

In summary, simply by running the ADP node to fine-tune the processing of your data, you were able to build a more accurate model with little direct data manipulation.

If you want to prove or disprove a particular theory, or to build specific models, you may find it beneficial to work directly with the model settings. However, if you have limited time or a large amount of data to prepare, the ADP node may give you an advantage.

Explanations of the mathematical foundations of the modeling methods used in IBM® SPSS® Modeler are listed in the IBM SPSS Modeler Algorithms Guide, available from the \Documentation directory of the installation disk.

Note that the results in this example are based on the training data only. To assess how well models generalize to other data in the real world, you would use a Partition node to hold out a subset of records for testing and validation.
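The idea behind holding out records can be sketched in plain Python. This is not how the Partition node is implemented: the hypothetical `partition` function, the 70/30 split, and the fixed seed are all illustrative assumptions. Each record is shuffled into either a training subset, used to build the model, or a testing subset, used only to measure accuracy on data the model has not seen.

```python
import random


def partition(records, train_fraction=0.7, seed=42):
    """Randomly assign each record to a training or testing subset.

    train_fraction and seed are illustrative assumptions, not Partition node defaults.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)   # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(100))  # stand-in for 100 data records
train, test = partition(records)
print(len(train), len(test))  # 70 30
```

A model would then be built from `train` only, and its percentage correct measured on `test`; a large gap between the two figures suggests the model does not generalize well.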