Data sets for object detection models

When you are preparing a data set for training an object detection model, ensure that the following requirements are met.

Requirements for accurate training

  • The data set has at least five images.
  • Every defined object has an object label. Images that do not have object labels are not used to train the model.
Note: When these requirements are not met, the model cannot be trained to recognize that object type.
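
As a minimal illustration of these requirements, the following Python sketch filters a hypothetical data set and checks the five-image minimum. The dictionary structure is illustrative only and is not an IBM Maximo Visual Inspection API:

    # Hypothetical data set: each image name maps to its list of object labels.
    dataset = {
        "img_001.jpg": ["car"],
        "img_002.jpg": ["car", "truck"],
        "img_003.jpg": [],          # no object labels: excluded from training
        "img_004.jpg": ["car"],
        "img_005.jpg": ["car"],
        "img_006.jpg": ["car"],
    }

    # Only images that have at least one object label are used for training.
    trainable = {name: labels for name, labels in dataset.items() if labels}

    if len(trainable) < 5:
        raise ValueError(
            f"Only {len(trainable)} labeled images; the data set needs at least 5."
        )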

Example

You are training an object detection model to recognize cars, and the data set contains one of the following combinations:

  • Five images: Ensure that you define and label a car as an object in at least five images.
  • Three images and one video: Ensure that you define and label a car as an object in three images and in at least two frames of the video. Labeling five cars in one image is not adequate.
Note: A data set that has a diverse representation of labeled objects produces a more accurately trained model. The exact number of images and objects cannot be specified, but it can be as high as 1,000 representative images for each class. However, you might not need such a large data set to train a model with satisfactory accuracy. The sketch after this note checks these per-class counts.
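
To make the car example concrete, the following Python sketch counts, for each class, the number of distinct images or video frames in which that class is labeled. Each item is represented as a set of classes, so labeling five cars in one image still counts as only one item for the car class. The data structure is hypothetical and is not part of IBM Maximo Visual Inspection:

    from collections import Counter

    # Hypothetical labeled items: three still images and two video frames.
    labeled_items = [
        {"car"},   # image 1
        {"car"},   # image 2
        {"car"},   # image 3
        {"car"},   # video frame 1
        {"car"},   # video frame 2
    ]

    items_per_class = Counter(cls for item in labeled_items for cls in item)

    for cls, count in items_per_class.items():
        if count < 5:
            print(f"'{cls}' is labeled in only {count} images or frames; "
                  "label it in at least 5.")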

If your data set does not have many images or a sufficient variety for training, use the Augmentation feature to increase the size of the data set.

Validation

Important: Not all of the images in a data set are used for training. Unless you change the value for Ratio when you train your model, 20% of the images are randomly selected and used for validation instead of training. Therefore, it is important to have enough images of every category or object.

For example, consider a data set of 200 images that is used to train an object detection model. With the default configuration for model training, 20% of the images, which is 40 images, are selected for testing the model. If a label LabelA identifies an object in the data set, and the number of images that contain that object is smaller than the test data set, for example, only 20 images, the following scenarios are possible:

  • It is possible that all of the images with LabelA are in the training data set, and none of the images are used for testing the model. This situation results in unknown accuracy for LabelA because the accuracy is never tested.
  • Similarly, it is possible that all 20 images with LabelA objects are in the test data set, and no images are used for training. This situation results in low or 0% accuracy for the object because the model was not trained with any images that contain LabelA objects. A short simulation of this random split follows the list.
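
The following Python sketch simulates the default split described above. The data structure and the LabelA flag are hypothetical; only the 200-image total, the 40-image test set, and the 20 LabelA images come from the example:

    import random

    # 200 images, 20 of which contain an object labeled as LabelA.
    images = [{"id": i, "has_label_a": i < 20} for i in range(200)]

    # Default Ratio: 20% of the images (40 of 200) are held out for testing.
    random.shuffle(images)
    test_set = images[:40]
    training_set = images[40:]

    label_a_in_test = sum(img["has_label_a"] for img in test_set)
    label_a_in_training = 20 - label_a_in_test

    # Worst cases: 0 in the test set means the accuracy for LabelA is never
    # measured; 0 in the training set means the model never learns LabelA.
    print(f"LabelA images used for training: {label_a_in_training}")
    print(f"LabelA images used for testing: {label_a_in_test}")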


Special considerations for object detection models

Measuring accuracy for object detection models can be more challenging because the accuracy calculation includes intersection over union (IoU), especially for models that use segmentation instead of bounding boxes.

IoU is calculated as the intersection of a ground truth bounding box and a predicted bounding box, divided by the union of both bounding boxes. The intersection is the area of overlap, the ground truth bounding box is the hand-drawn box, and the predicted bounding box is the box that IBM® Maximo® Visual Inspection draws.
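
For axis-aligned bounding boxes, the calculation can be written directly. The following Python sketch assumes that each box is given as hypothetical (x1, y1, x2, y2) corner coordinates; it illustrates the metric and is not IBM Maximo Visual Inspection code:

    def iou(box_a, box_b):
        """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
        # Corners of the intersection rectangle.
        x1 = max(box_a[0], box_b[0])
        y1 = max(box_a[1], box_b[1])
        x2 = min(box_a[2], box_b[2])
        y2 = min(box_a[3], box_b[3])

        # Area of overlap; zero if the boxes do not intersect.
        intersection = max(0, x2 - x1) * max(0, y2 - y1)

        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - intersection

        return intersection / union if union else 0.0

    # Ground truth (hand-drawn) box compared with a predicted box.
    print(iou((10, 10, 50, 50), (20, 20, 60, 60)))  # about 0.39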

In object detection, an object might be correctly identified even though the boundary that the model generates does not overlap the ground truth closely, which results in a poor IoU metric. You can improve the metric with more precise object labeling that reduces background noise, by training the model for longer, or both.

Ensure that the data sets that you use to train Anomaly optimized models contain only images of non-anomalous objects. These images are used to train a model that recognizes an object and identifies similar objects that have different characteristics. When you train and test the model, for best results, use images that present the object consistently. That is, make sure that the object's angle, centering, and scale are similar and that the image backgrounds are similar.