ARCHITECTURE Subcommand (MLP command)

The ARCHITECTURE subcommand specifies the neural network architecture. By default, automatic architecture selection is used to build the network; however, you can override automatic architecture selection and build a more specific structure.
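
For example, the following minimal sketch fits a network using the default settings; the variable names depvar, factor1, and covar1 are placeholders for an actual dependent variable, factor, and covariate:

  MLP depvar BY factor1 WITH covar1
    /ARCHITECTURE AUTOMATIC=YES.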

AUTOMATIC Keyword

The AUTOMATIC keyword indicates whether to use automatic architecture selection to build the neural network. Automatic architecture selection builds a network with one hidden layer. Using a prespecified range defining the minimum and maximum number of hidden units, automatic architecture selection computes the “best” number of units in the hidden layer. Automatic architecture selection uses the default activation functions for the hidden and output layers.

If automatic architecture selection is used, then a random sample from the total dataset (excluding any data records included in the holdout sample as defined on the PARTITION subcommand) is taken and split into training (70%) and testing (30%) samples. This random sample is used to find the architecture and fit the network. Then, the network is retrained using the entire dataset (taking into account the training, testing, and holdout samples defined on the PARTITION subcommand), with the synaptic weights obtained from the random sample used as the initial weights.

The size of the random sample N = min(1000, memsize), where memsize is the user-specified maximum number of cases to store in memory (see the MEMSIZE keyword in CRITERIA Subcommand (MLP command)). If the total dataset (excluding holdout cases) has fewer than N cases, then all cases (excluding holdout cases) are used. If you want to be able to reproduce results based on the AUTOMATIC keyword later, use the SET command to set the initialization value for the random number generator before running the MLP procedure.
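
For example, the following sketch sets the random number generator seed before the procedure runs so that AUTOMATIC results can be reproduced; the seed value is arbitrary, and the variable names are placeholders:

  SET SEED=123456.
  MLP depvar BY factor1 WITH covar1
    /ARCHITECTURE AUTOMATIC=YES.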

YES. Use automatic architecture selection to build the network. This is the default.

  • The YES keyword may be followed by parentheses containing the MINUNITS and MAXUNITS options, which specify the minimum and maximum number of units, respectively, that automatic architecture selection will consider in determining the “best” number of units. It is invalid to specify only one option; specify both or neither. The options may be specified in any order and must be separated by a comma or space character. Both numbers must be integers greater than 0, with MINUNITS less than MAXUNITS. The defaults are MINUNITS=1, MAXUNITS=50. (See the example following this list.)
  • If AUTOMATIC=YES is specified, then all other ARCHITECTURE subcommand keywords are invalid.
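
For example, the following sketch restricts automatic architecture selection to considering between 2 and 20 hidden units (placeholder variable names):

  MLP depvar BY factor1 WITH covar1
    /ARCHITECTURE AUTOMATIC=YES(MINUNITS=2, MAXUNITS=20).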

NO. Do not use automatic architecture selection to build the network. All other ARCHITECTURE subcommand keywords are valid only if AUTOMATIC=NO is specified.

HIDDENLAYERS Keyword

The HIDDENLAYERS keyword specifies the number of hidden layers in the neural network.

This keyword is honored only if automatic architecture selection is not used (that is, AUTOMATIC=NO); otherwise, the HIDDENLAYERS keyword is ignored.

1. One hidden layer. This is the default. The HIDDENLAYERS=1 specification may be followed by the NUMUNITS option, which gives the number of units in the first hidden layer, excluding the bias unit. Specify AUTO to automatically compute the number of units based on the number of input and output units. Alternatively, specify an integer greater than or equal to 1 to request a particular number of hidden units. The default is AUTO.

2. Two hidden layers. The HIDDENLAYERS=2 specification may be followed by the NUMUNITS option, which gives the number of units in the first and second hidden layers, excluding the bias unit in each layer. Specify AUTO to automatically compute the numbers of units based on the number of input and output units. Alternatively, specify two integers greater than or equal to 1 to request particular numbers of hidden units in the first and second hidden layers, respectively. The default is AUTO.
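
To illustrate both forms, the following sketch requests two hidden layers with 8 and 4 units, respectively (placeholder variable names); for a single hidden layer with, say, six units, you would specify HIDDENLAYERS=1(NUMUNITS=6) instead:

  MLP depvar BY factor1 WITH covar1
    /ARCHITECTURE AUTOMATIC=NO HIDDENLAYERS=2(NUMUNITS=8,4).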

HIDDENFUNCTION Keyword

The HIDDENFUNCTION keyword specifies the activation function to use for all units in the hidden layers. This keyword is honored only if automatic architecture selection is not used (that is, AUTOMATIC=NO); otherwise, the HIDDENFUNCTION keyword is ignored.

TANH. Hyperbolic tangent. This function has form: γ(c) = tanh(c) = (e^c − e^(−c))/(e^c + e^(−c)). It takes real-valued arguments and transforms them to the range (–1, 1). This is the default activation function for all units in the hidden layers.

SIGMOID. Sigmoid. This function has form: γ(c) = 1/(1 + e^(−c)). It takes real-valued arguments and transforms them to the range (0, 1).
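
For example, the following sketch uses the sigmoid function rather than the default hyperbolic tangent for the hidden layer (placeholder variable names):

  MLP depvar BY factor1 WITH covar1
    /ARCHITECTURE AUTOMATIC=NO HIDDENLAYERS=1(NUMUNITS=AUTO)
      HIDDENFUNCTION=SIGMOID.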

OUTPUTFUNCTION Keyword

The OUTPUTFUNCTION keyword specifies the activation function to use for all units in the output layer.

The activation function used in the output layer has a special relationship with the error function, which is the measure that the neural network is trying to minimize. In particular, the error function is automatically assigned based on the activation function for the output layer. Sum-of-squares error, the sum of the squared deviations between the observed dependent variable values and the model-predicted values, is used when the identity, sigmoid, or hyperbolic tangent activation function is applied to the output layer. Cross-entropy error is used when the softmax activation function is applied to the output layer.

The OUTPUTFUNCTION keyword is honored only if automatic architecture selection is not used (that is, AUTOMATIC=NO); otherwise, the OUTPUTFUNCTION keyword is ignored.

IDENTITY. Identity. This function has form: γ(c) = c. It takes real-valued arguments and returns them unchanged. This is the default activation function for units in the output layer if there are any scale dependent variables.

SIGMOID. Sigmoid. This function has form: γ(c) = 1/(1 + e^(−c)). It takes real-valued arguments and transforms them to the range (0, 1).

TANH. Hyperbolic tangent. This function has form: γ(c) = tanh(c) = (e^c − e^(−c))/(e^c + e^(−c)). It takes real-valued arguments and transforms them to the range (–1, 1).

SOFTMAX. Softmax. This function has form: γ(c_k) = exp(c_k)/Σ_j exp(c_j). It takes a vector of real-valued arguments and transforms it to a vector whose elements fall in the range (0, 1) and sum to 1. Softmax is available only if all dependent variables are categorical; if SOFTMAX is specified and there are any scale dependent variables, then an error is issued. This is the default activation function for units in the output layer if all dependent variables are categorical.
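
As a final sketch, the following requests softmax output (and therefore, implicitly, cross-entropy error); it assumes the placeholder dependent variable depvar has a categorical (nominal or ordinal) measurement level:

  MLP depvar BY factor1 factor2
    /ARCHITECTURE AUTOMATIC=NO HIDDENLAYERS=1(NUMUNITS=AUTO)
      HIDDENFUNCTION=TANH OUTPUTFUNCTION=SOFTMAX.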