The Expert tab of the Auto Numeric node enables you to select the algorithms
and options to use and to specify stopping rules.
Select models. By default, all
models are selected to be built; however, if you have
Analytic Server, you can
choose to restrict the models to those that can run on
Analytic Server and
preset them so that they either build split models or are ready to process very large data
sets.
Note: Local building of Analytic Server models
within the Auto Numeric node is not
supported.
Models used. Use the check boxes in the column on the left to
select the model types (algorithms) to include in the comparison. The more types you select, the
more models will be created and the longer the processing time will be.
Model type. Lists the available algorithms (see below).
Model parameters. For each model type, you can use the default
settings or select Specify to choose options for each model type. The
specific options are similar to those available in the separate modeling nodes, with the difference
that multiple options or combinations can be selected. For example, if comparing Neural Net models,
rather than choosing one of the six training methods, you can choose all of them to train six models
in a single pass.
Number of models. Lists the number of models produced for each
algorithm based on current settings. When combining options, the number of models can quickly add
up, so pay close attention to this number, particularly when you are using large datasets.
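To see how quickly combinations multiply, the sketch below counts candidate models as the Cartesian product of the option values chosen for each parameter. The parameter names and values are illustrative assumptions, not the node's actual option list:

```python
# Hypothetical option grid for one algorithm; the names and values are
# invented for illustration and do not mirror the node's actual settings.
from itertools import product

neural_net_options = {
    "training_method": ["quick", "dynamic", "multiple",
                        "prune", "RBFN", "exhaustive prune"],
    "prevent_overtraining": [True, False],
}

# Each combination of selected option values yields one candidate model.
n_models = len(list(product(*neural_net_options.values())))
print(n_models)  # 6 * 2 = 12 models for this single algorithm
```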
Restrict maximum time spent building a single model. (K-Means,
Kohonen, TwoStep, SVM, KNN, Bayes Net, and Decision List models only) Sets a maximum time limit for
any one model. For example, if a particular model requires an unexpectedly long time to train
because of some complex interaction, you probably don't want it to hold up your entire modeling
run.
Supported algorithms
The Neural Net node uses a simplified model of the way the human brain processes information.
It works by simulating a large number of interconnected simple processing units that resemble
abstract versions of neurons. Neural networks are powerful general function estimators and require
minimal statistical or mathematical knowledge to train or apply.
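As a rough illustration of the idea (not the SPSS Modeler implementation), the following scikit-learn sketch fits a small multilayer network to a nonlinear numeric target; the layer sizes and synthetic data are arbitrary assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2      # nonlinear numeric target

# A small feed-forward network of interconnected simple units.
net = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=2000, random_state=0)
net.fit(X, y)
print(net.predict(X[:3]))
```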
The Classification and Regression (C&R) Tree node generates a decision tree that allows
you to predict or classify future observations. The method uses recursive partitioning to split the
training records into segments by minimizing the impurity at each step, where a node in the tree is
considered “pure” if 100% of cases in the node fall into a specific category of the target field.
Target and input fields can be numeric ranges or categorical (nominal, ordinal, or flags); all
splits are binary (only two subgroups).
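The same recursive, binary partitioning can be sketched with scikit-learn's CART-style regression tree (only an analogy to the Modeler node, with invented data); for a continuous target, impurity is measured as within-node variance:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(400, 2))
y = np.where(X[:, 0] > 5, 100, 10) + rng.normal(0, 1, size=400)

# Every split is binary; each one is chosen to minimize node impurity.
tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree.fit(X, y)
print(export_text(tree, feature_names=["x1", "x2"]))
```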
The CHAID node generates decision trees using chi-square statistics to identify optimal
splits. Unlike the C&R Tree and QUEST nodes, CHAID can generate nonbinary trees, meaning that
some splits have more than two branches. Target and input fields can be numeric range (continuous)
or categorical. Exhaustive CHAID is a modification of CHAID that does a more thorough job of
examining all possible splits but takes longer to compute.
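scikit-learn has no CHAID implementation, so the sketch below only illustrates the scoring step: a candidate split over a categorical predictor is evaluated with a chi-square test, and a smaller p-value marks a stronger split. All data and names are invented:

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
region = rng.choice(["north", "south", "east"], size=300)
outcome = np.where((region == "north") & (rng.random(300) < 0.7),
                   "high", rng.choice(["high", "low"], size=300))

# Contingency table of predictor categories versus target categories.
cats, targs = np.unique(region), np.unique(outcome)
table = np.array([[np.sum((region == c) & (outcome == t)) for t in targs]
                  for c in cats])

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.1f}, p={p:.3g}")  # smaller p => stronger candidate split
```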
Linear regression is a common statistical technique for summarizing data and making
predictions by fitting a straight line or surface that minimizes the discrepancies between predicted
and actual output values.
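For instance, a least-squares fit of a straight line, sketched with scikit-learn and invented data (not the Modeler node itself):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.5 * X[:, 0] + 1.0 + rng.normal(0, 1, size=100)   # true line plus noise

# Fits the line that minimizes squared discrepancies between
# predicted and actual values.
model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   # estimates near 2.5 and 1.0
```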
The Generalized Linear model expands the general linear model so that the dependent variable
is linearly related to the factors and covariates through a specified link function. Moreover, the
model allows for the dependent variable to have a non-normal distribution. It covers the
functionality of a wide range of statistical models, including linear regression, logistic
regression, loglinear models for count data, and interval-censored survival models.
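As one concrete member of this family, the hedged sketch below fits a Poisson loglinear model for count data with statsmodels; the log link relates the mean of the target linearly to the predictor. The data and coefficients are invented:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=200)
counts = rng.poisson(np.exp(0.5 + 1.2 * x))   # log link: log(mu) = 0.5 + 1.2*x

X = sm.add_constant(x)
fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
print(fit.params)   # estimates near [0.5, 1.2]
```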
The k-Nearest Neighbor (KNN) node associates a new case with the category or value of
the k objects nearest to it in the predictor space, where k is an integer. Similar
cases are near each other and dissimilar cases are distant from each other.
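For a numeric target, a k-nearest-neighbor prediction is typically the average of the k nearest training values, as in this scikit-learn sketch (illustrative only, with synthetic data):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = X[:, 0] + X[:, 1] + rng.normal(0, 0.5, size=200)

# Predicts the mean target of the k = 5 nearest cases in predictor space.
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(X, y)
print(knn.predict([[5.0, 5.0]]))   # expect a value near 10
```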
The Support Vector Machine (SVM) node enables you to classify data into one of two groups
without overfitting. SVM works well with wide data sets, such as those with a very large number of
input fields.
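A support-vector regressor with an RBF kernel, sketched with scikit-learn; the parameter values and data are arbitrary assumptions:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 5))     # works even with many input fields
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=300)

# The epsilon-insensitive loss and C regularization guard against overfitting.
svm = SVR(kernel="rbf", C=1.0, epsilon=0.1)
svm.fit(X, y)
print(svm.predict(X[:3]))
```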
Linear regression models predict a continuous target based on linear relationships
between the target and one or more predictors.
The Linear Support Vector Machine (LSVM) node enables you to classify data into one of two
groups without overfitting. LSVM is linear and works well with wide data sets, such as those with a
very large number of records.
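The linear variant trades the kernel for speed on large numbers of records; a scikit-learn analogue, again only a sketch with synthetic data rather than the Modeler node:

```python
import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 10))        # many records, linear relationship
y = X @ rng.normal(size=10) + rng.normal(0, 0.1, size=100_000)

lsvm = LinearSVR(C=1.0, max_iter=10_000)  # linear model only, so it scales
lsvm.fit(X, y)
print(lsvm.coef_[:3])
```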
The Random Trees node is similar to the existing C&RT node; however, the Random Trees
node is designed to process big data to create a single tree and displays the resulting model in the
output viewer that was added in SPSS® Modeler version 17. The Random Trees
node generates a decision tree that you use to predict or classify future observations. The
method uses recursive partitioning to split the training records into segments by minimizing the
impurity at each step, where a node in the tree is considered “pure” if 100% of cases in
the node fall into a specific category of the target field. Target and input fields can be numeric
ranges or categorical (nominal, ordinal, or flags); all splits are binary (only two
subgroups).
The Tree-AS node is similar to the existing CHAID node; however, the Tree-AS node is designed
to process big data to create a single tree and displays the resulting model in the output viewer
that was added in SPSS Modeler version 17. The node generates a decision tree by using chi-square
statistics (CHAID) to identify optimal splits. This use of CHAID can generate nonbinary trees,
meaning that some splits have more than two branches. Target and input fields can be numeric range
(continuous) or categorical. Exhaustive CHAID is a modification of CHAID that does a more thorough
job of examining all possible splits but takes longer to compute.
XGBoost Linear© is an advanced implementation of a gradient boosting
algorithm with a linear model as the base model. Boosting algorithms iteratively learn weak
classifiers and then add them to a final strong classifier. The XGBoost Linear node in SPSS Modeler
is implemented in Python.
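Because the node is a Python implementation, an analogous call with the Python xgboost package looks roughly like the sketch below, where booster="gblinear" selects linear base models; the exact parameters the node exposes may differ, and the data is invented:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(0, 0.1, size=300)

# Each boosting round fits a weak linear model and adds it to the ensemble.
model = xgb.XGBRegressor(booster="gblinear", n_estimators=50)
model.fit(X, y)
print(model.predict(X[:3]))
```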
A GLE extends the linear model so that the target can have a non-normal distribution and can be
linearly related to the factors and covariates through a specified link function, and so that the
observations can be correlated. Generalized linear mixed models cover a wide variety of models, from
simple linear regression to complex multilevel models for non-normal longitudinal data.
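One example from this family is a random-intercept (multilevel) model for correlated observations, sketched here with statsmodels rather than the GLE node itself; the grouping structure, names, and data are invented:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
subject = np.repeat(np.arange(20), 10)              # 20 subjects, 10 obs each
x = rng.uniform(0, 1, size=200)
intercepts = rng.normal(0, 0.5, size=20)[subject]   # correlation within subject
y = 1.0 + 2.0 * x + intercepts + rng.normal(0, 0.2, size=200)

df = pd.DataFrame({"y": y, "x": x, "subject": subject})
fit = smf.mixedlm("y ~ x", df, groups=df["subject"]).fit()
print(fit.params)
```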
XGBoost© is an advanced implementation of a gradient boosting algorithm.
Boosting algorithms iteratively learn weak classifiers and then add them to a final strong
classifier. XGBoost is very flexible and provides many parameters that can be overwhelming to most
users, so the XGBoost-AS node in SPSS Modeler exposes the core features and
commonly used parameters. The XGBoost-AS node is implemented in Spark.