Architecture (Multilayer Perceptron)
The Architecture tab is used to specify the structure of the network. The procedure can select the "best" architecture automatically, or you can specify a custom architecture.
Automatic architecture selection builds a network with one hidden layer. Specify the minimum and maximum number of units allowed in the hidden layer, and automatic architecture selection computes the "best" number of units in the hidden layer. Automatic architecture selection uses the default activation functions for the hidden and output layers.
Custom architecture selection gives you expert control over the hidden and output layers and is most useful when you know in advance what architecture you want or when you need to tweak the results of automatic architecture selection.
Hidden Layers
The hidden layer contains unobservable network nodes (units). Each hidden unit is a function of the weighted sum of the inputs. The function is the activation function, and the values of the weights are determined by the estimation algorithm. If the network contains a second hidden layer, each hidden unit in the second layer is a function of the weighted sum of the units in the first hidden layer. The same activation function is used in both layers.
Number of Hidden Layers. A multilayer perceptron can have one or two hidden layers.
Activation Function. The activation function "links" the weighted sums of units in a layer to the values of units in the succeeding layer.
- Hyperbolic tangent. This function has the form: γ(c) = tanh(c) = (e^c − e^−c)/(e^c + e^−c). It takes real-valued arguments and transforms them to the range (−1, 1). When automatic architecture selection is used, this is the activation function for all units in the hidden layers.
- Sigmoid. This function has the form: γ(c) = 1/(1 + e^−c). It takes real-valued arguments and transforms them to the range (0, 1).
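To make the description above concrete, a hidden unit can be sketched as an activation function applied to the weighted sum of its inputs. This is a minimal illustrative sketch, not the procedure's estimation algorithm; the input values and weights below are arbitrary (in practice the weights are determined by the estimation algorithm).

```python
import math

def tanh_activation(c):
    # Hyperbolic tangent: maps any real value into (-1, 1).
    return math.tanh(c)

def sigmoid_activation(c):
    # Sigmoid: maps any real value into (0, 1).
    return 1.0 / (1.0 + math.exp(-c))

def hidden_unit(inputs, weights, activation):
    # A hidden unit is the activation function applied to the
    # weighted sum of the unit's inputs.
    c = sum(w * x for w, x in zip(weights, inputs))
    return activation(c)

inputs = [0.5, -1.2, 2.0]   # illustrative input values
weights = [0.8, 0.3, -0.5]  # illustrative weights (normally estimated)

print(hidden_unit(inputs, weights, tanh_activation))     # a value in (-1, 1)
print(hidden_unit(inputs, weights, sigmoid_activation))  # a value in (0, 1)
```

With a second hidden layer, the same computation is repeated, taking the first layer's unit values as inputs and using the same activation function.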
Number of Units. The number of units in each hidden layer can be specified explicitly or determined automatically by the estimation algorithm.
Output Layer
The output layer contains the target (dependent) variables.
Activation Function. The activation function "links" the weighted sums of units in a layer to the values of units in the succeeding layer.
- Identity. This function has the form: γ(c) = c. It takes real-valued arguments and returns them unchanged. When automatic architecture selection is used, this is the activation function for units in the output layer if there are any scale-dependent variables.
- Softmax. This function has the form: γ(c_k) = exp(c_k)/Σ_j exp(c_j). It takes a vector of real-valued arguments and transforms it to a vector whose elements fall in the range (0, 1) and sum to 1. Softmax is available only if all dependent variables are categorical. When automatic architecture selection is used, this is the activation function for units in the output layer if all dependent variables are categorical.
- Hyperbolic tangent. This function has the form: γ(c) = tanh(c) = (e^c − e^−c)/(e^c + e^−c). It takes real-valued arguments and transforms them to the range (−1, 1).
- Sigmoid. This function has the form: γ(c) = 1/(1 + e^−c). It takes real-valued arguments and transforms them to the range (0, 1).
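The softmax formula above can be sketched in a few lines. This is an illustrative sketch only; the input scores are arbitrary example values, and the max-subtraction is a standard numerical-stability step that does not change the result.

```python
import math

def softmax(c):
    # gamma(c_k) = exp(c_k) / sum_j exp(c_j)
    # Subtracting the maximum before exponentiating avoids overflow
    # and leaves the ratios (and therefore the result) unchanged.
    m = max(c)
    exps = [math.exp(ck - m) for ck in c]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # illustrative output-layer weighted sums
probs = softmax(scores)
print(probs)  # each element lies in (0, 1) and the elements sum to 1
```

Because the outputs sum to 1, they can be read as predicted probabilities over the categories of a categorical dependent variable, which is why softmax is restricted to the all-categorical case.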
Rescaling of Scale Dependent Variables. These controls are available only if at least one scale-dependent variable has been selected.
- Standardized. Subtract the mean and divide by the standard deviation, (x−mean)/s.
- Normalized. Subtract the minimum and divide by the range, (x−min)/(max−min). Normalized values fall between 0 and 1. This is the required rescaling method for scale-dependent variables if the output layer uses the sigmoid activation function. The correction option specifies a small number ε that is applied as a correction to the rescaling formula; this correction ensures that all rescaled dependent variable values will be within the range of the activation function. In particular, the values 0 and 1, which occur in the uncorrected formula when x takes its minimum and maximum value, define the limits of the range of the sigmoid function but are not within that range. The corrected formula is [x−(min−ε)]/[(max+ε)−(min−ε)]. Specify a number greater than or equal to 0.
- Adjusted Normalized. Adjusted version of subtracting the minimum and dividing by the range, [2*(x−min)/(max−min)]−1. Adjusted normalized values fall between −1 and 1. This is the required rescaling method for scale-dependent variables if the output layer uses the hyperbolic tangent activation function. The correction option specifies a small number ε that is applied as a correction to the rescaling formula; this correction ensures that all rescaled dependent variable values will be within the range of the activation function. In particular, the values −1 and 1, which occur in the uncorrected formula when x takes its minimum and maximum value, define the limits of the range of the hyperbolic tangent function but are not within that range. The corrected formula is {2*[(x−(min−ε))/((max+ε)−(min−ε))]}−1. Specify a number greater than or equal to 0.
- None. No rescaling of scale-dependent variables.
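The rescaling formulas above, including the ε correction, can be sketched as follows. This is an illustrative sketch; the example values of x, min, max, and ε are arbitrary.

```python
def standardized(x, mean, s):
    # (x - mean) / s
    return (x - mean) / s

def normalized(x, xmin, xmax, eps=0.0):
    # Corrected formula: [x - (min - eps)] / [(max + eps) - (min - eps)]
    # With eps = 0 this reduces to (x - min) / (max - min).
    return (x - (xmin - eps)) / ((xmax + eps) - (xmin - eps))

def adjusted_normalized(x, xmin, xmax, eps=0.0):
    # Corrected formula: {2 * [x - (min - eps)] / [(max + eps) - (min - eps)]} - 1
    return 2.0 * (x - (xmin - eps)) / ((xmax + eps) - (xmin - eps)) - 1.0

# With eps = 0, the minimum maps exactly to the boundary of the range;
# with eps > 0, every rescaled value lies strictly inside the range of
# the corresponding activation function.
print(normalized(10, 10, 20))                    # 0.0, on the boundary
print(normalized(10, 10, 20, eps=0.5))           # > 0, strictly inside (0, 1)
print(adjusted_normalized(20, 10, 20, eps=0.5))  # < 1, strictly inside (-1, 1)
```

This is why the correction matters: the sigmoid never actually reaches 0 or 1, and the hyperbolic tangent never reaches −1 or 1, so uncorrected minimum and maximum values would be targets the network cannot attain.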
How To Specify Architecture for a Multilayer Perceptron
This feature requires the Neural Networks option.
- From the menus choose:
- In the Multilayer Perceptron dialog box, click the Architecture tab.