CRITERIA Subcommand (MLP command)
The CRITERIA subcommand specifies computational and resource settings for the MLP procedure.
TRAINING Keyword
The TRAINING keyword specifies the training type, which determines how the neural network processes training data records.
The online and mini-batch training methods are explicitly dependent upon data order; however, even batch training is dependent upon data order because initialization of synaptic weights involves subsampling from the dataset. To minimize data order effects, randomly order the cases before running the MLP procedure.
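For example, one way to randomize case order before training is to sort the active dataset by a uniformly distributed scratch key. This is a minimal sketch; the seed value and the variable name shuffle_key are arbitrary placeholders, not part of the MLP syntax.

  * Fix the random number seed so the reordering is reproducible.
  SET SEED=12345.
  * Create a uniform random key and sort the cases by it.
  COMPUTE shuffle_key=RV.UNIFORM(0,1).
  SORT CASES BY shuffle_key.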
BATCH. Batch training. Updates the synaptic weights only after passing all training data records—that is, batch training uses information from all records in the training dataset. Batch training is often preferred because it directly minimizes the total prediction error. However, batch training may need to update the weights many times until one of the stopping rules is met and, hence, may need many data passes. It is most useful for smaller datasets. This is the default training type.
ONLINE. Online training. Updates the synaptic weights after every single training data record—that is, online training uses information from one record at a time. Online training continuously gets a record and updates the weights until one of the stopping rules is met. If all the records are used once and none of the stopping rules is met, then the process continues by recycling the data records. Online training is superior to batch training only for larger datasets with associated predictors; that is, if there are many records and many inputs whose values are not independent of each other, then online training can obtain a reasonable answer more quickly than batch training.
MINIBATCH. Mini-batch training. Divides the training data records into K groups of approximately equal size, then updates the synaptic weights after passing one group—that is, mini-batch training uses information from a group of records. The process then recycles the data group if necessary. The number of training records per mini-batch is determined by the MINIBATCHSIZE keyword. Mini-batch training offers a compromise between batch and online training, and it may be best for "medium-size" datasets.
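For instance, a hypothetical specification requesting online training might look like the following; the dependent variable response, factor region, and covariates age and income are placeholder names.

  * Request online training; other CRITERIA settings take their defaults.
  MLP response BY region WITH age income
    /CRITERIA TRAINING=ONLINE.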
MINIBATCHSIZE Keyword
The MINIBATCHSIZE keyword specifies the number of training records per mini-batch.
- Specify AUTO to automatically compute the number of records per mini-batch as R = min(max(M/10,2),memsize), where M is the number of training records and memsize is the maximum number of cases to store in memory (see the MEMSIZE keyword below). If the remainder of M/R is r, then when the end of the data is reached, the process places the final r records in the same mini-batch with the first R−r records of the next data pass. This "wrapping" of mini-batches will place different cases in the mini-batches with each data pass unless R divides M with no remainder. (A worked example appears after this list.)
- Alternatively, specify an integer greater than or equal to 2 and less than or equal to memsize to request a particular number of records. If the number of training records turns out to be less than the specified MINIBATCHSIZE, the number of training records is used instead.
- The default is AUTO.
- This keyword is ignored if TRAINING = MINIBATCH is not in effect.
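For example, with a hypothetical M = 5,000 training records and the default memsize of 1,000, AUTO yields R = min(max(5000/10, 2), 1000) = 500 records per mini-batch; because 500 divides 5,000 with no remainder, no wrapping occurs. A specific size can be requested instead (variable names are placeholders):

  * Mini-batch training with 500 records per mini-batch.
  MLP response BY region WITH age income
    /CRITERIA TRAINING=MINIBATCH MINIBATCHSIZE=500 MEMSIZE=1000.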
MEMSIZE Keyword
The MEMSIZE keyword specifies the maximum number of cases to store in memory when synaptic weight initialization, automatic architecture selection, and/or mini-batch training is in effect.
- Specify an integer greater than or equal to 2. The default is 1000.
OPTIMIZATION Keyword
The OPTIMIZATION keyword specifies the optimization algorithm used to determine the synaptic weights.
GRADIENTDESCENT. Gradient descent. Gradient descent is the required optimization algorithm for online and mini-batch training. It is optional for batch training. When gradient descent is used with online and mini-batch training, the algorithm's user-specified parameters are the initial learning rate, lower bound for the learning rate, momentum, and number of data passes (see the LEARNINGINITIAL, LEARNINGLOWER, MOMENTUM, and LEARNINGEPOCHS keywords, respectively). With batch training, the user-specified parameters are the initial learning rate and the momentum.
SCALEDCONJUGATE. Scaled conjugate gradient. Scaled conjugate gradient is the default for batch training. The assumptions that justify the use of conjugate gradient methods do not apply to online and mini-batch training, so this method may not be used if TRAINING = ONLINE or MINIBATCH. The user-specified parameters are the initial lambda and sigma (see the LAMBDAINITIAL and SIGMAINITIAL keywords).
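As an illustration (variable names are placeholders), the following two hypothetical specifications both request batch training; the first overrides the default scaled conjugate gradient algorithm with gradient descent.

  * Batch training with gradient descent; the initial learning rate and momentum apply.
  MLP response BY region WITH age income
    /CRITERIA TRAINING=BATCH OPTIMIZATION=GRADIENTDESCENT LEARNINGINITIAL=0.4 MOMENTUM=0.9.
  * Batch training with the default scaled conjugate gradient algorithm.
  MLP response BY region WITH age income
    /CRITERIA TRAINING=BATCH OPTIMIZATION=SCALEDCONJUGATE.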
LEARNINGINITIAL Keyword
The LEARNINGINITIAL keyword specifies the initial learning rate η0 for the gradient descent optimization algorithm.
- Specify a number greater than 0. The default is 0.4.
- This keyword is ignored if OPTIMIZATION = GRADIENTDESCENT is not in effect.
LEARNINGLOWER Keyword
The LEARNINGLOWER keyword specifies the lower boundary for the learning rate ηlow when gradient descent is used with online or mini-batch training.
- Specify a number greater than 0 and less than the initial learning rate (see the LEARNINGINITIAL keyword). The default is 0.001.
- This keyword is ignored if TRAINING = ONLINE or MINIBATCH and OPTIMIZATION = GRADIENTDESCENT are not in effect.
MOMENTUM Keyword
The MOMENTUM keyword specifies the initial momentum rate α for the gradient descent optimization algorithm.
- Specify a number greater than 0. The default is 0.9.
- This keyword is ignored if OPTIMIZATION = GRADIENTDESCENT is not in effect.
LEARNINGEPOCHS Keyword
The LEARNINGEPOCHS keyword specifies the number of epochs (data passes of the training set) p used to reduce the learning rate when gradient descent is used with online or mini-batch training. You can control the learning rate decay factor β by specifying the number of epochs it takes for the learning rate to decrease from η0 to ηlow. This corresponds to β = (1/(pK))·ln(η0/ηlow), where K is the total number of mini-batches in the training dataset. For online training, K = M, where M is the number of training records. (A worked example follows the list below.)
- Specify an integer greater than 0. The default is 10.
- This keyword is ignored if TRAINING = ONLINE or MINIBATCH and OPTIMIZATION = GRADIENTDESCENT are not in effect.
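For example, with a hypothetical online-training run on M = 1,000 training records (so K = 1,000), the default LEARNINGINITIAL of 0.4, the default LEARNINGLOWER of 0.001, and LEARNINGEPOCHS = 10, the decay factor works out to β = (1/(10×1000))·ln(0.4/0.001) ≈ 0.0006. The corresponding settings (variable names are placeholders) would be:

  * Online gradient descent with explicit learning-rate and momentum settings.
  MLP response BY region WITH age income
    /CRITERIA TRAINING=ONLINE OPTIMIZATION=GRADIENTDESCENT
      LEARNINGINITIAL=0.4 LEARNINGLOWER=0.001 MOMENTUM=0.9 LEARNINGEPOCHS=10.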
LAMBDAINITIAL Keyword
The LAMBDAINITIAL keyword specifies the initial lambda, λ0, for the scaled conjugate gradient optimization algorithm.
- Specify a number greater than 0 and less than 10⁻⁶. The default is 0.0000005.
- This keyword is ignored if OPTIMIZATION = SCALEDCONJUGATE is not in effect.
SIGMAINITIAL Keyword
The SIGMAINITIAL keyword specifies the initial sigma, σ0, for the scaled conjugate gradient optimization algorithm.
- Specify a number greater than 0 and less than 10⁻⁴. The default is 0.00005.
- This keyword is ignored if OPTIMIZATION = SCALEDCONJUGATE is not in effect.
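For instance, a hypothetical batch run could state the scaled conjugate gradient defaults explicitly (variable names are placeholders):

  * Batch scaled conjugate gradient with the default lambda and sigma stated explicitly.
  MLP response BY region WITH age income
    /CRITERIA TRAINING=BATCH OPTIMIZATION=SCALEDCONJUGATE
      LAMBDAINITIAL=0.0000005 SIGMAINITIAL=0.00005.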
INTERVALCENTER and INTERVALOFFSET Keywords
The INTERVALCENTER and INTERVALOFFSET keywords specify the interval [a0−a, a0+a] in which weight vectors are randomly generated when simulated annealing is used. INTERVALCENTER corresponds to a0 and INTERVALOFFSET corresponds to a.
- Simulated annealing is used to break out of a local minimum, with the goal of finding the global minimum, during the optimization algorithm. This approach is used in weight initialization and automatic architecture selection.
- Specify a number for INTERVALCENTER. The INTERVALCENTER default is 0.
- Specify a number greater than 0 for INTERVALOFFSET. The INTERVALOFFSET default is 0.5. The default interval is [−0.5, 0.5].
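For example, a hypothetical specification that keeps the interval centered at 0 but widens it to [−1, 1] (variable names are placeholders) would be:

  * Generate random weights in the interval [-1, 1] during simulated annealing.
  MLP response BY region WITH age income
    /CRITERIA INTERVALCENTER=0 INTERVALOFFSET=1.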