Overview (MLP command)

Neural networks are a data mining tool for finding unknown patterns in databases. Neural networks can be used to make business decisions by forecasting demand for a product as a function of price and other variables or by categorizing customers based on buying habits and demographic characteristics. The MLP procedure fits a particular kind of neural network called a multilayer perceptron. The multilayer perceptron uses a feedforward architecture and can have multiple hidden layers. It is one of the most commonly used neural network architectures.

Options

Prediction or classification. One or more dependent variables may be specified, and they may be scale, categorical, or a combination. If a dependent variable has a scale measurement level, then the neural network predicts continuous values that approximate the “true” value of some continuous function of the input data. If a dependent variable is categorical, then the neural network is used to classify cases into the “best” category based on the input predictors.

Rescaling. MLP optionally rescales covariates or scale dependent variables before training the neural network. There are three rescaling options: standardization, normalization, and adjusted normalization.
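
For example, a hedged sketch assuming the COVARIATE and DEPENDENT keywords of the /RESCALE subcommand (variable names are hypothetical). Standardization subtracts the mean and divides by the standard deviation; normalization rescales values to [0,1]; adjusted normalization rescales values to [-1,1]:

  MLP income BY region WITH age years_employed
    /RESCALE COVARIATE=ADJNORMALIZED DEPENDENT=STANDARDIZED.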

Training, testing, and holdout data. MLP optionally divides the dataset into training, testing, and holdout data. The neural network is trained using the training data. The training data or testing data, or both, can be used to track errors across steps and determine when to stop training. The holdout data is completely excluded from the training process and is used for independent assessment of the final network.
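
For example, a sketch requesting a 70/20/10 split, assuming the TRAINING, TESTING, and HOLDOUT keywords of the /PARTITION subcommand (variable names are hypothetical). The values are treated as relative numbers that the procedure converts to proportions of cases:

  MLP default_status BY education WITH age income
    /PARTITION TRAINING=70 TESTING=20 HOLDOUT=10.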

Architecture selection. MLP can perform automatic architecture selection, or it can build a neural network based on user specifications. Automatic architecture selection creates a neural network with one hidden layer and finds the “best” number of hidden units. Alternatively, you can specify one or two hidden layers and define the number of hidden units in each layer.
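
For example, two hedged sketches assuming the AUTOMATIC, HIDDENLAYERS, and NUMUNITS keywords of the /ARCHITECTURE subcommand (variable names are hypothetical). The first requests automatic architecture selection; the second specifies two hidden layers with 10 and 5 units:

  MLP churn BY plan WITH tenure usage
    /ARCHITECTURE AUTOMATIC=YES.

  MLP churn BY plan WITH tenure usage
    /ARCHITECTURE AUTOMATIC=NO HIDDENLAYERS=2 (NUMUNITS=10,5).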

Activation functions. Units in the hidden layers can use the hyperbolic tangent or sigmoid activation function. Units in the output layer can use the hyperbolic tangent, sigmoid, identity, or softmax activation function.
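
For instance, a sketch pairing hyperbolic tangent hidden units with identity output units for a scale dependent variable, assuming the HIDDENFUNCTION and OUTPUTFUNCTION keywords on the /ARCHITECTURE subcommand (variable names are hypothetical):

  MLP sales WITH price advertising
    /ARCHITECTURE AUTOMATIC=NO HIDDENLAYERS=1 (NUMUNITS=8)
      HIDDENFUNCTION=TANH OUTPUTFUNCTION=IDENTITY.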

Training methods. The neural network can be built using batch, online, or mini-batch training. Gradient descent and scaled conjugate gradient optimization algorithms are available.
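
For example, a sketch selecting online training with gradient descent, assuming the TRAINING and OPTIMIZATION keywords of the /CRITERIA subcommand (variable names are hypothetical). Batch training is typically paired with scaled conjugate gradient, while online and mini-batch training are typically paired with gradient descent:

  MLP churn BY plan WITH tenure usage
    /CRITERIA TRAINING=ONLINE OPTIMIZATION=GRADIENTDESCENT.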

Missing values. The MLP procedure has an option for treating user-missing values of categorical variables as valid. User-missing values of scale variables are always treated as invalid.
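
A sketch, assuming the USERMISSING keyword of the /MISSING subcommand (variable names are hypothetical):

  MLP churn BY plan WITH tenure
    /MISSING USERMISSING=INCLUDE.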

Output. MLP displays pivot table output but offers an option for suppressing most such output. Graphical output includes a network diagram (default) and a number of optional charts: predicted-by-observed values, residual-by-predicted values, ROC (receiver operating characteristic) curves, cumulative gains, lift, and independent variable importance. The procedure also optionally saves predicted values to the active dataset. Synaptic weight estimates can also be saved in IBM® SPSS® Statistics data files or XML files.
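
For example, a sketch combining several output options; the keyword names and their placement on /PRINT versus /PLOT are assumptions, and the variable and file names are hypothetical:

  MLP default_status BY education WITH age income
    /PRINT CPS NETWORKINFO SUMMARY CLASSIFICATION IMPORTANCE
    /PLOT NETWORK ROC GAIN LIFT
    /SAVE PREDVAL
    /OUTFILE MODEL='C:\mymodels\mlp_model.xml'.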

Basic Specification

The basic specification is the MLP command followed by one or more dependent variables, the BY keyword and one or more factors, and the WITH keyword and one or more covariates. By default, the MLP procedure standardizes covariates and selects a training sample before training the neural network. Automatic architecture selection is used to find the “best” neural network architecture. User-missing values are excluded, and default pivot table output is displayed.
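
For example, the minimal form (variable names are hypothetical; default_status is a categorical dependent variable, education a factor, and age and income covariates):

  MLP default_status BY education WITH age income.

This single command runs with all of the defaults described above.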

Note: Measurement level can affect the results. If any variables (fields) have an unknown measurement level, a data pass is performed to determine the measurement level before the analysis begins. For information on the determination criteria, see SET SCALEMIN.

Syntax Rules

  • All subcommands are optional.
  • Subcommands may be specified in any order.
  • Only a single instance of each subcommand is allowed.
  • An error occurs if a keyword is specified more than once within a subcommand.
  • Parentheses, equals signs, and slashes shown in the syntax chart are required.
  • The command name, subcommand names, and keywords must be spelled in full.
  • Empty subcommands are not allowed.
  • Any split variable defined on the SPLIT FILE command may not be used as a dependent variable, factor, covariate, or partition variable.

Limitations

The MLP procedure ignores the WEIGHT setting and issues a warning.

Categorical Variables

Although the MLP procedure accepts categorical variables as predictors or as the dependent variable, the user should be cautious when using a categorical variable with a very large number of categories.

The MLP procedure temporarily recodes categorical predictors and dependent variables using one-of-c coding for the duration of the procedure. If there are c categories of a variable, then the variable is stored as c vectors, with the first category denoted (1,0,...,0), the next category (0,1,0,...,0), ..., and the final category (0,0,...,0,1).

This coding scheme increases the number of synaptic weights. In particular, the total number of input units is the number of scale predictors plus the number of categories across all categorical predictors; for example, two covariates plus one factor with three categories and another with five categories yield 2 + 3 + 5 = 10 input units. As a result, this coding scheme can lead to slower training, but more “compact” coding methods usually lead to poorly fit neural networks. If your network training is proceeding very slowly, try reducing the number of categories in your categorical predictors by combining similar categories or dropping cases that have extremely rare categories before running the MLP procedure.

All one-of-c coding is based on the training data, even if a testing or holdout sample is defined (see PARTITION Subcommand (MLP command)). Thus, if the testing or holdout samples contain cases with predictor categories that are not present in the training data, then those cases are not used by the procedure or in scoring. If the testing or holdout samples contain cases with dependent variable categories that are not present in the training data, then those cases are not used by the procedure but they may be scored.

Replicating results

The MLP procedure uses random number generation during random assignment of partitions, random subsampling for initialization of synaptic weights, random subsampling for automatic architecture selection, and the simulated annealing algorithm used in weight initialization and automatic architecture selection. To reproduce the same randomized results in the future, use the SET command to set the initialization value for the random number generator before each run of the MLP procedure.
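
For example, using the SET command to fix the random number generator's initialization value before the run (the seed value is arbitrary; variable names are hypothetical):

  SET SEED=123456789.
  MLP default_status BY education WITH age income.

Issue the SET command immediately before each MLP run that you want to be reproducible.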

MLP results are also dependent on data order. The online and mini-batch training methods are explicitly dependent upon data order; however, even batch training is data-order dependent because of the weight initialization algorithm. For all three training methods, initialization of synaptic weights uses simulated annealing, which takes a random sample from the entire data set and randomly splits it into training (70%) and testing (30%) samples. The size of the random sample is N = min(1000, memsize), where memsize is the user-controlled maximum number of cases stored in memory (see the /CRITERIA MEMSIZE keyword). If the entire dataset has fewer than N cases, then all cases are used.

To minimize data order effects, randomly order the cases before running the MLP procedure. To verify the stability of a given solution, you may want to obtain several different solutions with cases sorted in different random orders. In situations with extremely large file sizes, multiple runs can be performed with a sample of cases sorted in different random orders.
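
One way to impose a reproducible random case order is to sort by a uniform random sort key before running the procedure, as in this sketch (the seed value and variable names are hypothetical):

  SET SEED=987654321.
  COMPUTE shuffle = RV.UNIFORM(0,1).
  SORT CASES BY shuffle.
  MLP default_status BY education WITH age income.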

Finally, MLP results may be influenced by the order of variables on the command line, because changing the variable order changes the pattern of initial values assigned to the synaptic weights. As with data order effects, you might try different command line variable orders to assess the stability of a given solution.

In summary, if you want to exactly replicate MLP results in the future, use the same initialization value for the random number generator, the same data order, and the same command line variable order, in addition to using the same MLP procedure settings.