ARCHITECTURE Subcommand (MLP command)
The ARCHITECTURE subcommand specifies the neural network architecture. By default, automatic architecture selection is used to build the network. However, you can override automatic architecture selection and build a more specific structure.
AUTOMATIC Keyword
The AUTOMATIC keyword indicates whether to use automatic architecture selection to build the neural network. Automatic architecture selection builds a network with one hidden layer. Using a prespecified range defining the minimum and maximum number of hidden units, automatic architecture selection computes the “best” number of units in the hidden layer. Automatic architecture selection uses the default activation functions for the hidden and output layers.
If automatic architecture selection is used, then a random sample from the total dataset (excluding any data records included in the holdout sample as defined on the PARTITION subcommand) is taken and split into training (70%) and testing (30%) samples. This random sample is used to find the architecture and fit the network. Then, the network is retrained using the entire dataset (taking into account the training, testing, and holdout samples defined on the PARTITION subcommand), with the synaptic weights obtained from the random sample used as the initial weights.
The size of the random sample is N = min(1000, memsize), where memsize is the user-specified maximum number of cases to store in memory (see the MEMSIZE keyword in CRITERIA Subcommand (MLP command)). If the total dataset (excluding holdout cases) has fewer than N cases, then all cases (excluding holdout cases) are used. If you want to be able to reproduce results based on the AUTOMATIC keyword later, use the SET command to set the initialization value for the random number generator before running the MLP procedure.
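For example, one way to make results based on AUTOMATIC reproducible is to fix the seed with SET before the procedure runs (this sketch assumes the default random number generator; the seed value and variable names are arbitrary illustrations):

```
SET SEED=123456789.
MLP depvar BY factor1 WITH covar1 covar2
  /ARCHITECTURE AUTOMATIC=YES.
```

Rerunning the same job with the same SET SEED value reproduces the random sample, and therefore the selected architecture.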
YES. Use automatic architecture selection to build the network. This is the default.
- The YES keyword may be followed by parentheses containing the MINUNITS and MAXUNITS options, which specify the minimum and maximum number of units, respectively, that automatic architecture selection will consider in determining the “best” number of units. It is invalid to specify only one option; you must specify both or neither. The options may be specified in any order and must be separated by a comma or a space character. Both numbers must be integers greater than 0, with MINUNITS less than MAXUNITS. The defaults are MINUNITS=1 and MAXUNITS=50.
- If AUTOMATIC=YES is specified, then all other ARCHITECTURE subcommand keywords are invalid.
NO. Do not use automatic architecture selection to build the network. All other ARCHITECTURE subcommand keywords are valid only if AUTOMATIC=NO is specified.
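For example, the following sketch (with illustrative variable names) requests automatic architecture selection but restricts the search to between 2 and 20 hidden units:

```
MLP depvar BY factor1 WITH covar1
  /ARCHITECTURE AUTOMATIC=YES (MINUNITS=2, MAXUNITS=20).
```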
HIDDENLAYERS Keyword
The HIDDENLAYERS keyword specifies the number of hidden layers in the neural network. This keyword is honored only if automatic architecture selection is not used (AUTOMATIC=NO); if automatic architecture selection is in effect, the HIDDENLAYERS keyword is ignored.
1. One hidden layer. This is the default. The HIDDENLAYERS=1 specification may be followed by the NUMUNITS option, which gives the number of units in the first hidden layer, excluding the bias unit. Specify AUTO to automatically compute the number of units based on the number of input and output units. Alternatively, specify an integer greater than or equal to 1 to request a particular number of hidden units. The default is AUTO.
2. Two hidden layers. The HIDDENLAYERS=2 specification may be followed by the NUMUNITS option, which gives the number of units in the first and second hidden layers, excluding the bias unit in each layer. Specify AUTO to automatically compute the numbers of units based on the number of input and output units. Alternatively, specify two integers greater than or equal to 1 to request particular numbers of hidden units in the first and second hidden layers, respectively. The default is AUTO.
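For instance, the following sketch (variable names are illustrative) requests two hidden layers with 10 and 6 units, respectively:

```
MLP depvar BY factor1 WITH covar1
  /ARCHITECTURE AUTOMATIC=NO HIDDENLAYERS=2 (NUMUNITS=10, 6).
```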
HIDDENFUNCTION Keyword
The HIDDENFUNCTION keyword specifies the activation function to use for all units in the hidden layers. This keyword is honored only if automatic architecture selection is not used (AUTOMATIC=NO); if automatic architecture selection is in effect, the HIDDENFUNCTION keyword is ignored.
TANH. Hyperbolic tangent. This function has form: γ(c) = tanh(c) = (e^c − e^−c)/(e^c + e^−c). It takes real-valued arguments and transforms them to the range (−1, 1). This is the default activation function for all units in the hidden layers.
SIGMOID. Sigmoid. This function has form: γ(c) = 1/(1 + e^−c). It takes real-valued arguments and transforms them to the range (0, 1).
OUTPUTFUNCTION Keyword
The OUTPUTFUNCTION keyword specifies the activation function to use for all units in the output layer.
The activation function used in the output layer has a special relationship with the error function, which is the measure that the neural network is trying to minimize. In particular, the error function is automatically assigned based on the activation function for the output layer. Sum-of-squares error, the sum of the squared deviations between the observed dependent variable values and the model-predicted values, is used when the identity, sigmoid, or hyperbolic tangent activation function is applied to the output layer. Cross-entropy error is used when the softmax activation function is applied to the output layer.
The OUTPUTFUNCTION keyword is honored only if automatic architecture selection is not used (AUTOMATIC=NO); if automatic architecture selection is in effect, OUTPUTFUNCTION is ignored.
IDENTITY. Identity. This function has form: γ(c) = c. It takes real-valued arguments and returns them unchanged. This is the default activation function for units in the output layer if there are any scale dependent variables.
SIGMOID. Sigmoid. This function has form: γ(c) = 1/(1 + e^−c). It takes real-valued arguments and transforms them to the range (0, 1).
TANH. Hyperbolic tangent. This function has form: γ(c) = tanh(c) = (e^c − e^−c)/(e^c + e^−c). It takes real-valued arguments and transforms them to the range (−1, 1).
SOFTMAX. Softmax. This function has form: γ(c_k) = exp(c_k)/Σ_j exp(c_j). It takes a vector of real-valued arguments and transforms it to a vector whose elements fall in the range (0, 1) and sum to 1. Softmax is available only if all dependent variables are categorical; if SOFTMAX is specified and there are any scale dependent variables, then an error is issued. This is the default activation function for units in the output layer if all dependent variables are categorical.
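Putting the keywords together, the following sketch builds a custom one-hidden-layer network (variable names are illustrative; SOFTMAX assumes the dependent variable is categorical):

```
MLP depvar (MLEVEL=N) BY factor1 WITH covar1 covar2
  /ARCHITECTURE AUTOMATIC=NO HIDDENLAYERS=1 (NUMUNITS=8)
     HIDDENFUNCTION=SIGMOID OUTPUTFUNCTION=SOFTMAX.
```

Because the output activation is SOFTMAX, the procedure minimizes cross-entropy error; with IDENTITY, SIGMOID, or TANH it would minimize sum-of-squares error.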