CRITERIA Subcommand (MLP command)

The CRITERIA subcommand specifies computational and resource settings for the MLP procedure.

TRAINING Keyword

The TRAINING keyword specifies the training type, which determines how the neural network processes training data records.

The online and mini-batch training methods are explicitly dependent upon data order; however, even batch training is dependent upon data order because initialization of synaptic weights involves subsampling from the dataset. To minimize data order effects, randomly order the cases before running the MLP procedure.

BATCH. Batch training. Updates the synaptic weights only after passing all training data records—that is, batch training uses information from all records in the training dataset. Batch training is often preferred because it directly minimizes the total prediction error. However, batch training may need to update the weights many times until one of the stopping rules is met and, hence, may need many data passes. It is most useful for smaller datasets. This is the default training type.

ONLINE. Online training. Updates the synaptic weights after every single training data record—that is, online training uses information from one record at a time. Online training continuously gets a record and updates the weights until one of the stopping rules is met. If all the records are used once and none of the stopping rules is met, then the process continues by recycling the data records. Online training is superior to batch training only for larger datasets with associated (correlated) predictors: if there are many records and many inputs whose values are not independent of each other, online training can obtain a reasonable answer more quickly than batch training.

MINIBATCH. Mini-batch training. Divides the training data records into K groups of approximately equal size, then updates the synaptic weights after passing one group—that is, mini-batch training uses information from a group of records. The process then recycles the data group if necessary. The number of training records per mini-batch is determined by the MINIBATCHSIZE keyword. Mini-batch training offers a compromise between batch and online training, and it may be best for "medium-size" datasets.
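
For example, the following syntax (using hypothetical variable names) requests online training; all other CRITERIA settings keep their defaults:

  MLP outcome (MLEVEL=N) WITH age income tenure
    /CRITERIA TRAINING=ONLINE.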

MINIBATCHSIZE Keyword

The MINIBATCHSIZE keyword specifies the number of training records per mini-batch.

  • Specify AUTO to automatically compute the number of records per mini-batch as R = min(max(M/10,2),memsize), where M is the number of training records and memsize is the maximum number of cases to store in memory (see the MEMSIZE keyword below). If the remainder of M/R is r, then when the end of the data is reached, the process places the final r records in the same mini-batch with the first R − r records of the next data pass. This “wrapping” of mini-batches will place different cases in the mini-batches with each data pass unless R divides M with no remainder. A worked example follows this list.
  • Alternatively, specify an integer greater than or equal to 2 and less than or equal to memsize to request a particular number of records. If the number of training records turns out to be less than the specified MINIBATCHSIZE, the number of training records is used instead.
  • The default is AUTO.
  • This keyword is ignored if TRAINING = MINIBATCH is not in effect.
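
As an illustration of the AUTO computation (using hypothetical counts), suppose there are M = 25,500 training records and MEMSIZE is at its default of 1000. Then

  R = min(max(25500/10, 2), 1000) = min(2550, 1000) = 1000

records per mini-batch, and the remainder of M/R is r = 500, so the final 500 records of each data pass are placed in a mini-batch with the first 500 records of the next pass.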

MEMSIZE Keyword

The MEMSIZE keyword specifies the maximum number of cases to store in memory when synaptic weight initialization, automatic architecture selection, and/or mini-batch training is in effect.

  • Specify an integer greater than or equal to 2. The default is 1000.
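
The following sketch (hypothetical variable names and values) requests mini-batch training with 100 records per mini-batch and raises the in-memory case limit to 5000:

  MLP outcome (MLEVEL=N) WITH x1 x2 x3
    /CRITERIA TRAINING=MINIBATCH MINIBATCHSIZE=100 MEMSIZE=5000.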

OPTIMIZATION Keyword

The OPTIMIZATION keyword specifies the optimization algorithm used to determine the synaptic weights.

GRADIENTDESCENT. Gradient descent. Gradient descent is the required optimization algorithm for online and mini-batch training. It is optional for batch training. When gradient descent is used with online or mini-batch training, the algorithm’s user-specified parameters are the initial learning rate, lower bound for the learning rate, momentum, and number of data passes (see the LEARNINGINITIAL, LEARNINGLOWER, MOMENTUM, and LEARNINGEPOCHS keywords, respectively). With batch training, the user-specified parameters are the initial learning rate and the momentum.

SCALEDCONJUGATE. Scaled conjugate gradient. Scaled conjugate gradient is the default for batch training. The assumptions that justify the use of conjugate gradient methods do not apply to online and mini-batch training, so this method may not be used if TRAINING = ONLINE or MINIBATCH. The user-specified parameters are the initial lambda and sigma (see the LAMBDAINITIAL and SIGMAINITIAL keywords).
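
For example, the following hypothetical syntax pairs batch training with gradient descent instead of the default scaled conjugate gradient:

  MLP outcome (MLEVEL=S) WITH x1 x2 x3
    /CRITERIA TRAINING=BATCH OPTIMIZATION=GRADIENTDESCENT.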

LEARNINGINITIAL Keyword

The LEARNINGINITIAL keyword specifies the initial learning rate η0 for the gradient descent optimization algorithm.

  • Specify a number greater than 0. The default is 0.4.
  • This keyword is ignored if OPTIMIZATION = GRADIENTDESCENT is not in effect.

LEARNINGLOWER Keyword

The LEARNINGLOWER keyword specifies the lower boundary for the learning rate ηlow when gradient descent is used with online or mini-batch training.

  • Specify a number greater than 0 and less than the initial learning rate (see the LEARNINGINITIAL keyword). The default is 0.001.
  • This keyword is ignored if TRAINING = ONLINE or MINIBATCH and OPTIMIZATION = GRADIENTDESCENT are not in effect.

MOMENTUM Keyword

The MOMENTUM keyword specifies the initial momentum rate α for the gradient descent optimization algorithm.

  • Specify a number greater than 0. The default is 0.9.
  • This keyword is ignored if OPTIMIZATION = GRADIENTDESCENT is not in effect.
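
Pulling the gradient descent learning-rate settings together, the following hypothetical syntax requests online training with a smaller initial learning rate, a higher lower bound, and reduced momentum:

  MLP outcome (MLEVEL=N) WITH x1 x2 x3
    /CRITERIA TRAINING=ONLINE OPTIMIZATION=GRADIENTDESCENT
      LEARNINGINITIAL=0.3 LEARNINGLOWER=0.01 MOMENTUM=0.8.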

LEARNINGEPOCHS Keyword

The LEARNINGEPOCHS keyword specifies the number of epochs (data passes of the training set) p used to reduce the learning rate when gradient descent is used with online or mini-batch training. You can control the learning rate decay factor β by specifying the number of epochs it takes for the learning rate to decrease from η0 to ηlow. This corresponds to β = (1/(pK)) ln(η0/ηlow), where K is the total number of mini-batches in the training dataset. For online training, K = M, where M is the number of training records.

  • Specify an integer greater than 0. The default is 10.
  • This keyword is ignored if TRAINING = ONLINE or MINIBATCH and OPTIMIZATION = GRADIENTDESCENT are not in effect.
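
As a hypothetical numeric check of the decay factor, consider online training on M = 1,000 records (so K = 1,000) with the defaults p = 10, η0 = 0.4, and ηlow = 0.001:

  β = (1/(10 × 1000)) × ln(0.4/0.001) ≈ 5.99/10000 ≈ 0.0006

so the learning rate decreases only gradually over the 10 data passes.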

LAMBDAINITIAL Keyword

The LAMBDAINITIAL keyword specifies the initial lambda, λ0, for the scaled conjugate gradient optimization algorithm.

  • Specify a number greater than 0 and less than 0.000001 (10⁻⁶). The default is 0.0000005.
  • This keyword is ignored if OPTIMIZATION = SCALEDCONJUGATE is not in effect.

SIGMAINITIAL Keyword

The SIGMAINITIAL keyword specifies the initial sigma, σ0, for the scaled conjugate gradient optimization algorithm.

  • Specify a number greater than 0 and less than 0.0001 (10⁻⁴). The default is 0.00005.
  • This keyword is ignored if OPTIMIZATION = SCALEDCONJUGATE is not in effect.
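
The following hypothetical syntax requests batch training with scaled conjugate gradient and non-default initial lambda and sigma, both within their allowed ranges:

  MLP outcome (MLEVEL=S) WITH x1 x2 x3
    /CRITERIA TRAINING=BATCH OPTIMIZATION=SCALEDCONJUGATE
      LAMBDAINITIAL=0.0000007 SIGMAINITIAL=0.00008.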

INTERVALCENTER and INTERVALOFFSET Keywords

The INTERVALCENTER and INTERVALOFFSET keywords specify the interval [a0 − a, a0 + a] in which weight vectors are randomly generated when simulated annealing is used. INTERVALCENTER corresponds to a0 and INTERVALOFFSET corresponds to a.

  • Simulated annealing is used to break out of a local minimum, with the goal of finding the global minimum, during the optimization algorithm. This approach is used in weight initialization and automatic architecture selection.
  • Specify a number for INTERVALCENTER. The default is 0.
  • Specify a number greater than 0 for INTERVALOFFSET. The default is 0.5. The default interval is therefore [−0.5, 0.5].
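
For example, the following hypothetical syntax widens the interval used for simulated annealing to [−1, 1] by keeping the default center of 0 and setting the offset to 1:

  MLP outcome (MLEVEL=N) WITH x1 x2 x3
    /CRITERIA INTERVALCENTER=0 INTERVALOFFSET=1.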