Tune hyperparameters

Submit a hyperparameter tuning job for your training model.

Before you begin

Hyperparameter tuning runs on a training model. To learn more about creating a training model, see Create a training model.

About this task

When tuning hyperparameters, IBM Spectrum Conductor Deep Learning Impact takes advantage of IBM Spectrum Conductor to launch multiple parallel searches for the optimal hyperparameters for training your model. The result is a new tuned model that contains the optimal hyperparameters that maximize your model's accuracy.

During a hyperparameter tuning job on a Caffe training model, IBM Spectrum Conductor Deep Learning Impact automatically passes the hyperparameters suggested by the hyperparameter tuning optimizer to the solver.prototxt file.
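
For illustration only, the following sketch (plain Python, not product code) writes a minimal solver.prototxt in which the fields that the tuner adjusts, such as base_lr, momentum, and weight_decay, are filled in from a set of suggested values. The file contents and values are assumptions; only the field names are standard Caffe solver parameters.

    # A minimal sketch (not product code): writing a Caffe solver.prototxt
    # with hyperparameter values that a tuning job might suggest.
    suggested = {
        "base_lr": 0.01,        # learning rate (tuned)
        "momentum": 0.9,        # momentum (tuned)
        "weight_decay": 0.0005, # weight decay (tuned, Caffe only)
    }

    solver_text = """net: "train_val.prototxt"
    base_lr: {base_lr}
    momentum: {momentum}
    weight_decay: {weight_decay}
    lr_policy: "step"
    gamma: 0.1
    stepsize: 10000
    max_iter: 45000
    solver_mode: GPU
    """.format(**suggested)

    with open("solver.prototxt", "w") as f:
        f.write(solver_text)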

During a hyperparameter tuning job on a TensorFlow training model, after you import the hyperparameter tuning packages into your model, IBM Spectrum Conductor Deep Learning Impact uses the learning rate operator and optimizer that are provided in the model.
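
As a rough sketch of what such a model might contain, assuming TensorFlow 1.x APIs, the learning rate can be exposed as a named scalar placeholder and passed to the chosen optimizer. The toy model, the loss, and the fed value 0.01 below are illustrative assumptions, not identifiers or values required by the product.

    import tensorflow as tf  # assumes TensorFlow 1.x APIs

    # Expose the learning rate as a named scalar so a tuning job can feed it in.
    learning_rate = tf.placeholder(tf.float32, shape=[], name="learning_rate")

    # A stand-in model: one trainable weight and a toy loss.
    w = tf.Variable(0.5, name="w")
    x = tf.random_normal([8])
    loss = tf.reduce_mean(tf.square(w * x - 1.0))

    # The optimizer itself is a tunable hyperparameter; momentum is shown here.
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)
    train_op = optimizer.minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(train_op, feed_dict={learning_rate: 0.01})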

Procedure

  1. From the cluster management console, select Workload > Spark > Deep Learning.
  2. Click the Models tab.
  3. Click the training model that you want to tune. The overview page shows the existing hyperparameters.
  4. Click Hyperparameter Tuning.
  5. Click Start New Tuning.
  6. Specify the name of your tuning job.
  7. Specify the hyperparameter search type.
  8. Specify the tuning parameter settings.
    Tuning hyperparameter options include the following (a sampling sketch follows this procedure):
    Optimizer

    Hyperparameter that indicates the backpropagation algorithm used to train the network.

    Learning rate range
    Hyperparameter that specifies the range of learning rates to tune. The value can be a continuous range, such as 0.001-0.1, or a list of discrete values, such as 0.001,0.002,0.003. Only the Random search type is supported.
    Hidden state size (TensorFlow only)
    Hyperparameter that specifies the range of hidden state sizes to tune in a recurrent neural network (RNN). The value must be a positive integer.
    Weight decay range (Caffe only)
    Hyperparameter that specifies the range of weight decay values to tune. The value can be a continuous range, such as 0.001-0.1, or a list of discrete values, such as 0.001,0.002,0.003. Only the Random search type is supported.
    Max batch size
    Hyperparameter that indicates the number of training examples used in one iteration.
    Momentum range
    Hyperparameter that specifies the range of momentum values to tune. The value can be a continuous range, such as 0.001-0.1, or a list of discrete values, such as 0.001,0.002,0.003. Only the Random search type is supported.
    Tuning workload settings include:
    Number of workers

    The number of workers to use for each tuning job.

    GPUs per worker

    The number of GPUs per worker.

    Max iterations

    Maximum number of iterations for each tuning job.

    Total tuning jobs number

    The total number of tuning jobs that run. Tuning tasks are stopped with the current best result when all jobs are finished.

    Max tuning jobs in parallel

    The maximum number of tuning jobs that can run in parallel depending on available resources.

    Max running time (in minutes)

    The maximum running time in minutes. Tuning tasks are stopped with the current best result when the maximum running time is reached.

    Synchronization mode
    Mode must be set to either asynchronous or synchronous. In asynchronous mode, each worker sends its newly computed gradients to a parameter server; the parameter server aggregates them and sends the updated gradients back to the worker, which then starts its next computational iteration. In synchronous mode, workers wait for all other workers to complete their computation, then communicate with each other to aggregate the latest gradients before starting the next computational iteration (see the second sketch after this procedure).
  9. Click Start Tuning.
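
The range settings in step 8 accept either a continuous range such as 0.001-0.1 or a list of discrete values such as 0.001,0.002,0.003. As a minimal sketch of how a random search might draw one candidate from each kind of specification (plain Python, not product code; the search space below is made up):

    import random

    def sample(spec):
        """Draw one hyperparameter value from a continuous range or a discrete list."""
        if isinstance(spec, tuple):    # continuous range, e.g. (0.001, 0.1)
            low, high = spec
            return random.uniform(low, high)
        return random.choice(spec)     # discrete values, e.g. [0.001, 0.002, 0.003]

    search_space = {
        "learning_rate": (0.001, 0.1),
        "momentum": [0.8, 0.9, 0.95],
        "weight_decay": (0.0001, 0.001),
    }

    # One candidate configuration per tuning job.
    candidate = {name: sample(spec) for name, spec in search_space.items()}
    print(candidate)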
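
The synchronization modes described in step 8 can also be read as two update rules. The following sketch contrasts them with scalar stand-ins for gradients; the learning rate and gradient values are made-up assumptions:

    # Scalar "gradients" stand in for real tensors; values are illustrative.
    lr = 0.01
    worker_grads = [0.2, -0.1, 0.05]  # one gradient per worker

    # Synchronous mode: wait for every worker, aggregate once, update once.
    params = 1.0
    params -= lr * sum(worker_grads) / len(worker_grads)
    print("synchronous:", params)

    # Asynchronous mode: apply each worker's gradient as soon as it arrives;
    # workers never wait on one another.
    params = 1.0
    for grad in worker_grads:
        params -= lr * grad
    print("asynchronous:", params)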

Results

A tuned model is created. The tuned model has the same name as the tuning job.

What to do next

To run a training job with the tuned training model, see Start a training run.