Training (Multilayer Perceptron)

The Training tab is used to specify how the network should be trained. The type of training and the optimization algorithm determine which training options are available.

Type of Training. The training type determines how the network processes the records. Select one of the following training types:

  • Batch. Updates the synaptic weights only after passing all training data records; that is, batch training uses information from all records in the training dataset. Batch training is often preferred because it directly minimizes the total error; however, batch training may need to update the weights many times until one of the stopping rules is met and hence may need many data passes. It is most useful for "smaller" datasets.
  • Online. Updates the synaptic weights after every single training data record; that is, online training uses information from one record at a time. Online training continuously gets a record and updates the weights until one of the stopping rules is met. If all the records are used once and none of the stopping rules is met, then the process continues by recycling the data records. Online training is superior to batch for "larger" datasets with associated predictors; that is, if there are many records and many inputs, and their values are not independent of each other, then online training can more quickly obtain a reasonable answer than batch training.
  • Mini-batch. Divides the training data records into groups of approximately equal size, then updates the synaptic weights after passing one group; that is, mini-batch training uses information from a group of records. Then the process recycles the data group if necessary. Mini-batch training offers a compromise between batch and online training, and it may be best for "medium-size" datasets. The procedure can automatically determine the number of training records per mini-batch, or you can specify an integer greater than 1 and less than or equal to the maximum number of cases to store in memory. You can set the maximum number of cases to store in memory on the Options tab.
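The three training types differ only in how many records contribute to each weight update. The following is a minimal sketch contrasting the three update schedules, using a plain gradient-descent step and a stand-in squared-error gradient for a single linear unit; the function names and default values are illustrative and not part of the procedure itself.

```python
import numpy as np

def gradient(w, X, y):
    # Mean-squared-error gradient for a single linear unit; this stands in
    # for the network's actual error gradient, which depends on the chosen
    # architecture and error function.
    return 2.0 * X.T @ (X @ w - y) / len(X)

def train(X, y, w, eta=0.1, n_epochs=10, schedule="batch", group_size=32):
    n = len(X)
    for _ in range(n_epochs):                      # one epoch = one data pass
        if schedule == "batch":
            # Batch: one weight update per pass, using all training records.
            w = w - eta * gradient(w, X, y)
        elif schedule == "online":
            # Online: one weight update per record; the records are recycled
            # on later passes if no stopping rule has been met.
            for i in range(n):
                w = w - eta * gradient(w, X[i:i + 1], y[i:i + 1])
        else:  # "mini-batch"
            # Mini-batch: one update per group of roughly equal size.
            for start in range(0, n, group_size):
                sl = slice(start, start + group_size)
                w = w - eta * gradient(w, X[sl], y[sl])
    return w

# Example: 200 records, 5 inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)
w = train(X, y, w=np.zeros(5), schedule="mini-batch", group_size=32)
```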

Optimization Algorithm. This is the method used to estimate the synaptic weights.

  • Scaled conjugate gradient. The assumptions that justify the use of conjugate gradient methods apply only to batch training types, so this method is not available for online or mini-batch training.
  • Gradient descent. This method must be used with online or mini-batch training; it can also be used with batch training.
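As a minimal illustration of this constraint, a configuration check might look like the following; the helper is hypothetical and not part of the procedure.

```python
def check_training_configuration(training_type, optimizer):
    """Hypothetical validation of the training type / optimizer combination."""
    # Scaled conjugate gradient assumes the error and its gradient are
    # computed over the full training set, so it requires batch training.
    if optimizer == "scaled conjugate gradient" and training_type != "batch":
        raise ValueError(
            "Scaled conjugate gradient is only available with batch training; "
            "use gradient descent for online or mini-batch training."
        )

check_training_configuration("batch", "scaled conjugate gradient")  # OK
check_training_configuration("online", "gradient descent")          # OK
```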

Training Options. The training options allow you to fine-tune the optimization algorithm. You generally will not need to change these settings unless the network runs into problems with estimation.

Training options for the scaled conjugate gradient algorithm include:

  • Initial Lambda. The initial value of the lambda parameter for the scaled conjugate gradient algorithm. Specify a number greater than 0 and less than 0.000001.
  • Initial Sigma. The initial value of the sigma parameter for the scaled conjugate gradient algorithm. Specify a number greater than 0 and less than 0.0001.
  • Interval Center and Interval Offset. The interval center (a0) and interval offset (a) define the interval [a0−a, a0+a], in which weight vectors are randomly generated when simulated annealing is used. Simulated annealing is used to break out of a local minimum, with the goal of finding the global minimum, during application of the optimization algorithm. This approach is used in weight initialization and automatic architecture selection. Specify a number for the interval center and a number greater than 0 for the interval offset.
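A small sketch of how a candidate weight vector could be drawn from that interval during a simulated-annealing restart; the function name and default values are illustrative, not the procedure's actual implementation.

```python
import numpy as np

def random_weight_vector(n_weights, center=0.0, offset=0.5, rng=None):
    """Draw a weight vector uniformly from [center - offset, center + offset].

    `center` corresponds to the interval center (a0) and `offset` to the
    interval offset (a); the defaults here are illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng
    return rng.uniform(center - offset, center + offset, size=n_weights)

candidate = random_weight_vector(n_weights=120, center=0.0, offset=0.5)
```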

Training options for the gradient descent algorithm include:

  • Initial Learning Rate. The initial value of the learning rate for the gradient descent algorithm. A higher learning rate means that the network will train faster, possibly at the cost of becoming unstable. Specify a number greater than 0.
  • Lower Boundary of Learning Rate. The lower boundary on the learning rate for the gradient descent algorithm. This setting applies only to online and mini-batch training. Specify a number greater than 0 and less than the initial learning rate.
  • Momentum. The initial momentum parameter for the gradient descent algorithm. The momentum term helps to prevent instabilities caused by a too-high learning rate. Specify a number greater than 0.
  • Learning rate reduction, in Epochs. The number of epochs (p), or data passes of the training sample, required to reduce the initial learning rate to the lower boundary of the learning rate when gradient descent is used with online or mini-batch training. This gives you control of the learning rate decay factor β = (1/(pK))*ln(η0/ηlow), where η0 is the initial learning rate, ηlow is the lower boundary of the learning rate, and K is the total number of mini-batches (or the number of training records for online training) in the training dataset. Specify an integer greater than 0.
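A small sketch of the resulting schedule, assuming the learning rate decays exponentially with each weight update so that it reaches the lower boundary after p epochs; the function and its arguments are illustrative, not the procedure's internal code.

```python
import math

def decayed_learning_rate(update_index, eta0, eta_low, p, K):
    """Learning rate after `update_index` weight updates.

    eta0    -- initial learning rate
    eta_low -- lower boundary of the learning rate
    p       -- epochs needed to reach eta_low (Learning rate reduction, in Epochs)
    K       -- updates per epoch (mini-batches per pass, or records for online training)
    """
    beta = math.log(eta0 / eta_low) / (p * K)          # decay factor
    return max(eta_low, eta0 * math.exp(-beta * update_index))

# With eta0 = 0.4, eta_low = 0.001, and p = 10 epochs of K = 50 updates each,
# the rate decays smoothly and reaches the lower boundary after 500 updates.
print(decayed_learning_rate(0, 0.4, 0.001, 10, 50))    # 0.4
print(decayed_learning_rate(500, 0.4, 0.001, 10, 50))  # ~0.001
```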

How To Set Training Criteria for Multilayer Perceptron

This feature requires the Neural Networks option.

  1. From the menus choose:

    Analyze > Neural Networks > Multilayer Perceptron...

  2. In the Multilayer Perceptron dialog box, click the Training tab.