Overview (MLP command)
Neural networks are a data mining tool for finding
unknown patterns in databases. Neural networks can be used to make
business decisions by forecasting demand for a product as a function
of price and other variables or by categorizing customers based on
buying habits and demographic characteristics. The MLP
procedure fits a particular kind of
neural network called a multilayer perceptron. The multilayer perceptron
uses a feedforward architecture and can have multiple hidden layers.
It is one of the most commonly used neural network architectures.
Options
Prediction or classification. One or more dependent variables may be specified, and they may be scale, categorical, or a combination. If a dependent variable has a scale measurement level, then the neural network predicts continuous values that approximate the “true” value of some continuous function of the input data. If a dependent variable is categorical, then the neural network is used to classify cases into the “best” category based on the input predictors.
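For example, the following sketch (with hypothetical variable names) fits a single network that predicts the scale variable sales and classifies the nominal variable response, using the MLEVEL specification from the syntax chart to declare the measurement level of each dependent variable:

MLP sales (MLEVEL=S) response (MLEVEL=N) BY region WITH price adspend.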
Rescaling. MLP
optionally rescales covariates or scale
dependent variables before training the neural network. There are
three rescaling options: standardization, normalization, and adjusted
normalization.
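Standardization rescales a variable to mean 0 and standard deviation 1, normalization maps its values into [0,1], and adjusted normalization maps its values into [-1,1]. As a sketch with hypothetical variable names, the following requests adjusted normalization of the covariates via the RESCALE subcommand:

MLP sales BY region WITH price adspend
  /RESCALE COVARIATE=ADJNORMALIZED.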
Training, testing, and holdout data.
MLP
optionally divides the dataset into
training, testing, and holdout data. The neural network is trained
using the training data. The training data or testing data, or both,
can be used to track errors across steps and determine when to stop
training. The holdout data is completely excluded from the training
process and is used for independent assessment of the final network.
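For example, the following sketch (hypothetical variable names) uses the PARTITION subcommand to assign cases to the three samples in a 7:2:1 ratio:

MLP sales BY region WITH price adspend
  /PARTITION TRAINING=7 TESTING=2 HOLDOUT=1.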
Architecture selection.
MLP
can perform
automatic architecture selection, or it can build a neural network
based on user specifications. Automatic architecture selection creates
a neural network with one hidden layer and finds the “best”
number of hidden units. Alternatively, you can specify one or two
hidden layers and define the number of hidden units in each layer.
Activation functions. Units in the hidden layers can use the hyperbolic tangent or sigmoid activation functions. Units in the output layer can use the hyperbolic tangent, sigmoid, identity, or softmax activation functions.
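For example, the first sketch below requests automatic architecture selection, while the second specifies a two-layer network with 8 and 4 hidden units, hyperbolic tangent hidden units, and softmax output units (variable names are hypothetical; the keywords follow the ARCHITECTURE subcommand):

MLP response (MLEVEL=N) BY region WITH price adspend
  /ARCHITECTURE AUTOMATIC=YES.

MLP response (MLEVEL=N) BY region WITH price adspend
  /ARCHITECTURE AUTOMATIC=NO HIDDENLAYERS=2 (NUMUNITS=8,4)
    HIDDENFUNCTION=TANH OUTPUTFUNCTION=SOFTMAX.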
Training methods. The neural network can be built using batch, online, or mini-batch training. Gradient descent and scaled conjugate gradient optimization algorithms are available.
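Note that the scaled conjugate gradient method applies only to batch training; online and mini-batch training use gradient descent. A sketch requesting mini-batch training (hypothetical variable names):

MLP sales BY region WITH price adspend
  /CRITERIA TRAINING=MINIBATCH OPTIMIZATION=GRADIENTDESCENT.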
Missing values. The MLP
procedure has an option for treating
user-missing values of categorical variables as valid. User-missing
values of scale variables are always treated as invalid.
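A sketch of this option (hypothetical variable names):

MLP response (MLEVEL=N) BY region WITH price
  /MISSING USERMISSING=INCLUDE.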
Output.
MLP
displays pivot table output
but offers an option for suppressing most such output. Graphical
output includes a network diagram (default) and a number of optional
charts: predicted-by-observed values, residual-by-predicted values,
ROC (Receiver Operating Characteristic) curves, cumulative gains,
lift, and independent variable importance. The procedure also optionally
saves predicted values in the active dataset. Synaptic weight estimates
can also be saved to IBM® SPSS® Statistics files or XML files.
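For example, the following sketch (hypothetical variable names and file path) requests ROC and lift charts in addition to the network diagram, saves predicted values to the active dataset, and exports the synaptic weight estimates to an XML file:

MLP response (MLEVEL=N) BY region WITH price adspend
  /PLOT NETWORK ROC LIFT
  /SAVE PREDVAL
  /OUTFILE MODEL='/models/mlp_response.xml'.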
Basic Specification
The basic specification is the MLP
command followed by one or more dependent variables, the BY
keyword and one or more factors, and
the WITH
keyword and one or more
covariates. By default, the MLP
procedure standardizes covariates and selects a training sample before
training the neural network. Automatic architecture selection is
used to find the “best” neural network architecture.
User-missing values are excluded, and default pivot table output
is displayed.
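For example, this minimal sketch (hypothetical variable names) relies entirely on the defaults described above:

MLP response BY region edcat WITH age income.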
Syntax Rules
- All subcommands are optional.
- Subcommands may be specified in any order.
- Only a single instance of each subcommand is allowed.
- An error occurs if a keyword is specified more than once within a subcommand.
- Parentheses, equals signs, and slashes shown in the syntax chart are required.
- The command name, subcommand names, and keywords must be spelled in full.
- Empty subcommands are not allowed.
- A split variable defined on the SPLIT FILE command cannot be used as a dependent variable, factor, covariate, or partition variable.
Limitations
The WEIGHT
setting is ignored
with a warning by the MLP
procedure.
Categorical Variables
Although the MLP
procedure accepts categorical variables as predictors or as the
dependent variable, the user should be cautious when using a categorical
variable with a very large number of categories.
The MLP
procedure
temporarily recodes categorical predictors and dependent variables
using one-of-c coding for the duration
of the procedure. If there are c categories of a variable, then the variable is stored as c vectors, with the first category denoted
(1,0,...,0), the next category (0,1,0,...,0), ..., and the final category
(0,0,...,0,1).
This coding scheme increases
the number of synaptic weights. In particular, the total number of
input units is the number of scale predictors plus the number of categories
across all categorical predictors. As a result, this coding scheme
can lead to slower training, but more “compact” coding
methods usually lead to poorly fit neural networks. If your network
training is proceeding very slowly, you might try reducing the number
of categories in your categorical predictors by combining similar
categories or dropping cases that have extremely rare categories before
running the MLP
procedure.
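As a hypothetical sketch of that workaround, similar categories of a predictor can be merged with RECODE before training (variable names and category codes are illustrative):

* Collapse rare region codes 7 through 9 into a single category.
RECODE region (7 THRU 9=7) (ELSE=COPY) INTO region_r.
MLP response BY region_r WITH price adspend.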
All one-of-c coding is based on the training data, even if a testing or holdout sample is defined (see PARTITION Subcommand (MLP command)). Thus, if the testing or holdout samples contain cases with predictor categories that are not present in the training data, then those cases are not used by the procedure or in scoring. If the testing or holdout samples contain cases with dependent variable categories that are not present in the training data, then those cases are not used by the procedure but they may be scored.
Replicating Results
The MLP
procedure uses random number generation during random assignment
of partitions, random subsampling for initialization of synaptic weights,
random subsampling for automatic architecture selection, and the simulated
annealing algorithm used in weight initialization and automatic architecture
selection. To reproduce the same randomized results in the future,
use the SET
command to set the
initialization value for the random number generator before each run
of the MLP
procedure.
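For example (the seed value is arbitrary; SET SEED applies to the default generator, while SET MTINDEX is used with the Mersenne Twister generator):

SET SEED=20110101.
MLP response BY region WITH price adspend.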
MLP
results
are also dependent on data order. The online and mini-batch training
methods are explicitly dependent upon data order; however, even batch
training is data-order dependent due to the weight initialization
algorithm. For all three training methods, initialization of synaptic
weights uses simulated annealing, which takes a random sample from
the entire data set and randomly splits it into training (70%) and
testing (30%) samples. The size of the random sample is N = min(1000, memsize), where memsize is the user-controlled
maximum number of cases stored in memory. (See the /CRITERIA MEMSIZE
keyword.) If the entire
dataset has fewer than N cases,
then all cases in the dataset are used.
To minimize data order effects, randomly order
the cases before running the MLP
procedure. To verify the stability of a given solution, you may
want to obtain several different solutions with cases sorted in different
random orders. In situations with extremely large file sizes, multiple
runs can be performed with a sample of cases sorted in different random
orders.
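As a sketch of this approach (hypothetical variable names), compute a uniform random draw for each case and sort on it before running the procedure:

* Randomly reorder cases to assess data order effects.
COMPUTE shuffle=RV.UNIFORM(0,1).
SORT CASES BY shuffle.
MLP response BY region WITH price adspend.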
Finally, MLP results may be influenced by the order of variables on
the command line, because changing the variable order changes the
pattern of initial values assigned to the synaptic weights. As with
data order effects, you might try different command line variable
orders to assess the stability of a given solution.
In summary,
if you want to exactly replicate MLP
results in the future, use the same initialization value for the
random number generator, the same data order, and the same command
line variable order, in addition to using the same MLP
procedure settings.