The Expert tab of the Auto Numeric node enables you to select the
algorithms and options to use and to specify stopping rules.
Select models. By default, all models are selected to be built; however, if you have
Analytic Server, you can
choose to restrict the models to those that can run on
Analytic Server and
preset them so that they either build split models or are ready to process very large data
sets.
Note: Local building of Analytic Server models
within the Auto Numeric node is not supported in SPSS® Modeler 18.0 or previous
versions.
Models
used. Use the check boxes in the column on the left to
select the model types (algorithms) to include in the comparison.
The more types you select, the more models will be created and the
longer the processing time will be.
Model type. Lists the available algorithms
(see below).
Model parameters. For each model type, you can use the
default settings or select Specify to choose options for each model type. The specific options are
similar to those available in the separate modeling nodes, with the
difference that multiple options or combinations can be selected.
For example, if comparing Neural Net models, rather than choosing
one of the six training methods, you can choose all of them to train
six models in a single pass.
Number of models. Lists the number of
models produced for each algorithm based on current settings. When
combining options, the number of models can quickly add up, so paying
close attention to this number is strongly recommended, particularly
when using large data sets.
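As a rough illustration of how option combinations multiply, the number of models built for one model type is the product of the number of choices selected for each option. The option names below are hypothetical, for illustration only, and do not correspond to actual node settings:

```python
from math import prod

# Hypothetical option selections for one model type: the number of
# models built is the product of the number of choices per option.
selections = {
    "training_method": 6,   # e.g., selecting all six Neural Net training methods
    "hidden_layer_config": 2,
    "stopping_rule": 3,
}

n_models = prod(selections.values())
print(n_models)  # 6 * 2 * 3 = 36 models from a single model type
```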
Restrict maximum
time spent building a single model. (K-Means, Kohonen,
TwoStep, SVM, KNN, Bayes Net, and Decision List models only) Sets a
maximum time limit for any one model. For example, if a particular
model requires an unexpectedly long time to train because of some
complex interaction, you probably don't want it to hold
up your entire modeling run.
Supported Algorithms
The Neural Net node uses a simplified model of the way the human brain processes information. It works by simulating a large number of interconnected simple processing units that resemble abstract versions of neurons. Neural networks are powerful general function estimators and require minimal statistical or mathematical knowledge to train or apply.

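As a minimal sketch of the idea (not the node's actual training algorithm), a network's prediction is computed by passing inputs through layers of simple weighted units, each applying a weighted sum followed by a nonlinearity. The weights below are arbitrary, for illustration; in practice they are learned during training:

```python
import math

def sigmoid(x):
    # A common unit nonlinearity: squashes any value into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    # One hidden layer of "simple processing units", then a
    # weighted sum of their activations as the numeric prediction.
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)))
              for ws in hidden_weights]
    return sum(w * h for w, h in zip(output_weights, hidden))

prediction = forward(
    inputs=[0.5, -1.2],
    hidden_weights=[[0.8, -0.4], [0.3, 0.9]],  # illustrative, not trained
    output_weights=[1.5, -0.7],
)
print(round(prediction, 3))
```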
The Classification and Regression (C&R) Tree node generates a decision tree that allows you to predict or classify future observations. The method uses recursive partitioning to split the training records into segments by minimizing the impurity at each step, where a node in the tree is considered “pure” if 100% of cases in the node fall into a specific category of the target field. Target and input fields can be numeric ranges or categorical (nominal, ordinal, or flags); all splits are binary (only two subgroups).

The CHAID node generates decision trees using chi-square statistics to identify optimal splits. Unlike the C&R Tree and QUEST nodes, CHAID can generate nonbinary trees, meaning that some splits have more than two branches. Target and input fields can be numeric range (continuous) or categorical. Exhaustive CHAID is a modification of CHAID that does a more thorough job of examining all possible splits but takes longer to compute.

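For a categorical target, the chi-square statistic that CHAID uses to score a candidate split can be computed from a contingency table of split branches against target categories; a larger statistic indicates a stronger association. The table below is illustrative toy data:

```python
def chi_square(table):
    # Chi-square test of independence: sum over cells of
    # (observed - expected)^2 / expected, where expected counts
    # come from the row and column totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows = branches of a candidate split, columns = target categories.
table = [[30, 10],
         [10, 30]]
stat = chi_square(table)
print(round(stat, 2))  # 20.0 for this table
```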
Linear regression is a common statistical technique for summarizing data and making predictions by fitting a straight line or surface that minimizes the discrepancies between predicted and actual output values.

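For a single predictor, the line that minimizes the sum of squared discrepancies has a closed-form solution; a minimal sketch with toy data:

```python
def fit_line(xs, ys):
    # Ordinary least squares for one predictor: the slope is the
    # covariance of x and y divided by the variance of x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(round(slope, 2), round(intercept, 2))  # 1.94 0.15
```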
The Generalized Linear model expands the general linear model so that the dependent variable is linearly related to the factors and covariates through a specified link function. Moreover, the model allows for the dependent variable to have a non-normal distribution. It covers the functionality of a wide range of statistical models, including linear regression, logistic regression, loglinear models for count data, and interval-censored survival models.

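The role of the link function can be sketched with a log link, as used in Poisson regression for count data: the linear predictor is mapped to the mean of the target through the inverse link. The coefficients below are illustrative, not fitted:

```python
import math

def glm_predict(inputs, intercept, coefficients, inverse_link=math.exp):
    # The linear predictor (eta) is a weighted sum of the inputs;
    # the inverse link maps it back to the target's original scale.
    eta = intercept + sum(b * x for b, x in zip(coefficients, inputs))
    return inverse_link(eta)

# With a log link, predictions are exp(eta): always positive,
# which suits count targets.
count = glm_predict([2.0, 0.5], intercept=0.1, coefficients=[0.4, -0.3])
print(round(count, 3))
```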
The k-Nearest Neighbor (KNN) node associates a new case with the category or value of the k objects nearest to it in the predictor space, where k is an integer. Similar cases are near each other and dissimilar cases are distant from each other.

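For a numeric target, the prediction for a new case can be sketched as the average target value of its k nearest training cases (using squared Euclidean distance here; the toy data is illustrative):

```python
def knn_predict(train_x, train_y, query, k):
    # Rank training cases by squared distance to the query point,
    # then average the target values of the k closest cases.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = [y for _, y in dists[:k]]
    return sum(nearest) / k

train_x = [(1.0, 1.0), (1.2, 0.9), (8.0, 8.0), (7.9, 8.2)]
train_y = [10.0, 12.0, 50.0, 54.0]
print(knn_predict(train_x, train_y, query=(1.1, 1.0), k=2))  # averages the two nearby cases
```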
The Support Vector Machine (SVM) node enables you to classify data into one of two groups without overfitting. SVM works well with wide data sets, such as those with a very large number of input fields.

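A trained SVM classifies a new case by the sign of a weighted sum of kernel evaluations against its support vectors. The support vectors, weights, and RBF kernel below are illustrative; in practice they come from training:

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    # Radial basis function kernel: near 1 for close points,
    # near 0 for distant points.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def svm_decision(query, support_vectors, weights, bias):
    # Decision function: sign of the bias plus the weighted
    # kernel similarities to the support vectors.
    score = bias + sum(w * rbf_kernel(sv, query)
                       for sv, w in zip(support_vectors, weights))
    return 1 if score >= 0 else -1

support_vectors = [(0.0, 0.0), (4.0, 4.0)]  # illustrative, not trained
weights = [1.0, -1.0]
label = svm_decision((0.5, 0.5), support_vectors, weights, bias=0.0)
print(label)  # the query point sits near the first group
```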
Linear regression models predict a continuous target based on linear relationships between the target and one or more predictors.

The Linear Support Vector Machine (LSVM) node enables you to classify data into one of two groups without overfitting. LSVM is linear and works well with wide data sets, such as those with a very large number of records.

The Random Trees node is similar to the existing C&RT node; however, the Random Trees node is designed to process big data to create a single tree and displays the resulting model in the output viewer that was added in SPSS Modeler version 17. The Random Trees node generates a decision tree that you use to predict or classify future observations. The method uses recursive partitioning to split the training records into segments by minimizing the impurity at each step, where a node in the tree is considered pure if 100% of cases in the node fall into a specific category of the target field. Target and input fields can be numeric ranges or categorical (nominal, ordinal, or flags); all splits are binary (only two subgroups).

The Tree-AS node is similar to the existing CHAID node; however, the Tree-AS node is designed to process big data to create a single tree and displays the resulting model in the output viewer that was added in SPSS Modeler version 17. The node generates a decision tree by using chi-square statistics (CHAID) to identify optimal splits. This use of CHAID can generate nonbinary trees, meaning that some splits have more than two branches. Target and input fields can be numeric range (continuous) or categorical. Exhaustive CHAID is a modification of CHAID that does a more thorough job of examining all possible splits but takes longer to compute.