Training a model

After all of the object labels are added to the data set, you can train your deep learning model. You can then deploy the trained model for use.

From the Data set page, click Train.
In the Train data set window, fill out the values as appropriate, then click Train:
Type of training
Image classification
Choose this if you want to use the model to categorize images as belonging to one of the types that you defined in the data set.
Note: To train a model for classification, the data set must meet these requirements:
There must be at least two categories.

Each category must have at least five images.
Object detection

Choose this if you want to use the model to label objects within images.
Note: To train a model for object detection, the data set must contain at least five images with an object tagged for each defined label. For example, if the data set has "Car" and "Motorcycle" defined as objects, at least five images need to have "Car" tagged, and at least five images need to have "Motorcycle" tagged.
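You can check the requirements in the two notes above before you click Train. The following Python sketch assumes a hypothetical in-memory layout. For classification, it uses a dictionary from image file name to category. For object detection, it uses a dictionary from image file name to the set of labels tagged in that image. PowerAI Vision does not expose data sets in this form, so the function names and structures are illustrative only.

```python
from collections import Counter

MIN_IMAGES = 5  # minimum images per category or label, from the notes above


def check_classification(image_categories: dict[str, str]) -> list[str]:
    """Return problems that would block image classification training.

    image_categories is a hypothetical mapping: image file name -> category.
    """
    problems = []
    counts = Counter(image_categories.values())
    if len(counts) < 2:
        problems.append(f"need at least 2 categories, found {len(counts)}")
    for category, n in sorted(counts.items()):
        if n < MIN_IMAGES:
            problems.append(f"category {category!r} has {n} images, needs {MIN_IMAGES}")
    return problems


def check_detection(image_tags: dict[str, set[str]], labels: set[str]) -> list[str]:
    """Return problems that would block object detection training.

    image_tags is a hypothetical mapping: image file name -> labels tagged in it.
    """
    problems = []
    for label in sorted(labels):
        # Count images, not tagged objects: each image's set holds a label once.
        n = sum(1 for tags in image_tags.values() if label in tags)
        if n < MIN_IMAGES:
            problems.append(f"label {label!r} is tagged in {n} images, needs {MIN_IMAGES}")
    return problems


# Five images tagged "Car" but only three tagged "Motorcycle".
tags = {f"car_{i}.jpg": {"Car"} for i in range(5)}
tags.update({f"bike_{i}.jpg": {"Motorcycle"} for i in range(3)})
print(check_detection(tags, {"Car", "Motorcycle"}))
# ["label 'Motorcycle' is tagged in 3 images, needs 5"]
```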
Model selection

System default - Image classification

Models trained with the system default model can only be run on a GPU.

Optimized for accuracy - Object detection

Models optimized for accuracy can only be run on a GPU.

Optimized for speed - Object detection

Models optimized for speed can be run anywhere, but might not be as accurate as those optimized for accuracy. These models use "you only look once" (YOLO) V2 and will take several hours to train.
You choose the accelerator when you deploy the model. You can choose GPU, CPU, or Xilinx FPGA - 16 bit (technology preview).

Custom model

Select an imported model to use for training.

Advanced options

Base model

PowerAI Vision includes several common base models, such as flowers and faces, that you can use to help classify your data. This option is available only when training for image classification.

Model hyperparameters

For advanced users, these settings are available to help fine-tune the training.
Max iteration

The maximum number of times the data is passed through the training algorithm.

Momentum (Object detection only)

This value increases the step size used when trying to find the minimum value of the error curve. A larger step size can help keep the algorithm from getting stuck at a local minimum instead of reaching the global minimum.

Ratio (Object detection only)

PowerAI Vision automatically “splits” the data set for internal validation of the model’s performance during training. With the default value of 80/20, a random 80% of the data set is used for training, and the remaining 20% is used for measurement and validation.
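You can think of the 80/20 ratio as a random shuffle followed by a cut. The sketch below shows this general technique. It is not PowerAI Vision's own code, and `split_data_set` and its `seed` parameter are hypothetical.

```python
import random


def split_data_set(items, train_fraction=0.8, seed=None):
    """Shuffle a copy of items and split it into (training, validation) lists.

    Hypothetical helper: train_fraction=0.8 corresponds to the 80/20 ratio.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # random selection, as described above
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


images = [f"image_{i:03d}.jpg" for i in range(50)]
train, validation = split_data_set(images, seed=42)
print(len(train), len(validation))  # 40 10
```

Every image lands in exactly one of the two lists, so the model is never validated on an image it was trained on.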

Test iteration (Image classification only)

The number of times data is passed through the training algorithm before possible completion. For example, if this value is 100 and Test interval is 50, the model is run through the algorithm at least 100 times and is tested every 50 iterations.

Test interval (Image classification only)

The number of times the model is passed through the algorithm before testing. For example, if this value is 50, the model is tested every 50 iterations. Each of these tests becomes a data point on the metrics graphs.

Learning rate

This option determines how much the weights in the network are adjusted with respect to the loss gradient. A correctly tuned value can result in a shorter training time. However, it is recommended that only advanced users change this value.

Weight decay

This value specifies regularization in the network. It helps protect against overfitting and is used to multiply the weights during training.
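Learning rate, momentum, and weight decay all act within the same update step. The sketch below assumes the classic form of stochastic gradient descent with momentum and L2 weight decay. This documentation does not state which optimizer PowerAI Vision uses internally. To make the effect of each setting easy to see, the sketch minimizes a one-dimensional toy loss, L(w) = (w - 3)^2, instead of a real network.

```python
def sgd_momentum(grad, w0, learning_rate, momentum, weight_decay, max_iteration):
    """Minimize a 1-D loss with SGD plus momentum and L2 weight decay."""
    w, velocity = w0, 0.0
    for _ in range(max_iteration):
        # Weight decay multiplies the weight and adds it to the gradient,
        # which pulls w toward zero (regularization).
        g = grad(w) + weight_decay * w
        # Momentum keeps part of the previous step, so repeated steps in the
        # same direction grow larger; the learning rate scales each new gradient.
        velocity = momentum * velocity - learning_rate * g
        w += velocity
    return w


def toy_grad(w):
    """Gradient of the toy loss L(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)


print(round(sgd_momentum(toy_grad, 0.0, 0.1, 0.9, 0.0, 500), 4))  # 3.0
print(round(sgd_momentum(toy_grad, 0.0, 0.1, 0.9, 0.1, 500), 4))  # 2.8571
```

With weight decay set to 0.1, the result settles at 6 / 2.1 (about 2.8571) rather than 3. This is because the decay term adds 0.1 * w to the gradient, and 2(w - 3) + 0.1w = 0 at that point.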
Note: PowerAI Vision assigns one GPU to each active training job or deployed deep learning API. For example, if a system has four GPUs and you have two deployed web APIs, there are two GPUs available for active training jobs. If a training job appears to be hanging, it might be waiting for another training job to complete, or there might not be a GPU available to run it. For information about how to fix this, see PowerAI Vision cannot train a model.
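The GPU arithmetic in the note can be sketched as follows. This is a simplified model with two assumptions: each deployed API and each running training job holds exactly one GPU, and waiting jobs start in the order they were submitted. The function names are hypothetical and do not correspond to a PowerAI Vision API.

```python
def gpus_for_training(total_gpus: int, deployed_apis: int) -> int:
    """GPUs left for training when each deployed API holds one GPU."""
    return max(total_gpus - deployed_apis, 0)


def queue_training_jobs(total_gpus: int, deployed_apis: int, jobs: list[str]):
    """Split jobs into (running, waiting), assuming one GPU per running job
    and first-come, first-served ordering."""
    free = gpus_for_training(total_gpus, deployed_apis)
    return jobs[:free], jobs[free:]


# Four GPUs and two deployed web APIs, as in the note: two jobs can run.
running, waiting = queue_training_jobs(4, 2, ["job-a", "job-b", "job-c"])
print(running)  # ['job-a', 'job-b']
print(waiting)  # ['job-c']
```

In this model, `job-c` looks as if it is hanging, but it is only waiting for a GPU to become free.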
(Optional) Stop the training process by clicking Stop training > Keep Model > Continue.
You can wait for the entire training process to complete, or you can stop the training process when the lines in the training graph start to flatten out, as shown in Figure 1. Improvements in training quality tend to plateau over time. Therefore, the fastest way to deploy a model and refine the data set is to stop the process once quality stops improving noticeably.
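Deciding when the lines "flatten out" is a judgment call. The hypothetical helper below expresses one common heuristic for it. It compares the average loss over the most recent test points with the average over the points just before them. The window size and the 1% improvement threshold are arbitrary assumptions, and PowerAI Vision does not expose such a function. You still stop training from the user interface.

```python
def has_plateaued(losses, window=5, min_improvement=0.01):
    """Return True when the mean of the last `window` loss values improved by
    less than `min_improvement` (as a fraction) over the preceding window."""
    if len(losses) < 2 * window:
        return False  # not enough data points to compare two windows yet
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return (previous - recent) < min_improvement * previous


print(has_plateaued([1.0, 0.8, 0.6, 0.4, 0.3, 0.2], window=3))            # False
print(has_plateaued([0.5, 0.3, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2], window=3))  # True
```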

Understanding the model training graph

As PowerAI Vision trains the model, the graph shows the relative performance of the model over time. The model should converge at the end of the training with low error and high accuracy.

In Figure 1, you can see the Loss CLS and Loss Bbox lines start to plateau. In the training graph, the lower the loss value, the better. Therefore, you can stop the training process when the loss value stops decreasing. At that point, the model has completed enough iterations and you can continue to the next step.
Figure 1. Model training graph

Important: If the training graph converges quickly and has 100% accuracy, the data set does not have enough information. The same is true if the accuracy of the training graph fails to rise or the errors in the graph do not decrease at the end of the training process. For example, a model with high accuracy might be able to discover all instances of different race cars, but might have trouble differentiating between specific race cars or those that have different colors. In this situation, add more images or video frames to the data set, tag them, then try the training again.
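The warning signs listed above can be written as simple checks on the values read from the training graph. The sketch below is a hypothetical illustration. In particular, it assumes a specific meaning for "converges quickly": 100% accuracy is reached within the first 20% of test points. That threshold is an assumption, not a PowerAI Vision rule.

```python
def diagnose(accuracy, loss, fast_fraction=0.2):
    """Flag the warning signs described above.

    accuracy and loss are per-test values from the training graph, oldest
    first, with accuracy expressed as a fraction (1.0 means 100%).
    """
    if not accuracy or not loss:
        raise ValueError("need at least one accuracy and one loss value")
    warnings = []
    # Assumed meaning of "converges quickly": 100% accuracy appears within
    # the first fast_fraction of the test points.
    early = accuracy[:max(1, int(len(accuracy) * fast_fraction))]
    if any(a >= 1.0 for a in early):
        warnings.append("reached 100% accuracy very early")
    if accuracy[-1] <= accuracy[0]:
        warnings.append("accuracy did not rise")
    if loss[-1] >= loss[0]:
        warnings.append("loss did not decrease")
    return warnings


print(diagnose([0.3, 0.5, 0.7, 0.8, 0.85], [1.2, 0.8, 0.5, 0.4, 0.35]))  # []
```

An empty list does not prove that the data set is sufficient. A non-empty list is a hint to add more images or video frames, tag them, and train again.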