Data set considerations

When you prepare a data set for training, take into account these limitations, requirements, and special considerations.

Note: Unless otherwise noted, "images" refers to individual images and captured video frames.

Supported image files

The following image files are supported:

  • JPEG
  • PNG
  • DICOM
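
For example, a minimal sketch that filters a local folder for files with these extensions before upload; the folder path and the .dcm extension convention for DICOM files are assumptions for illustration.

```python
from pathlib import Path

# Illustrative extension list that is based on the supported formats above;
# DICOM files commonly use the .dcm extension.
SUPPORTED = {".jpg", ".jpeg", ".png", ".dcm"}

def supported_images(folder):
    """Yield files in 'folder' whose extensions match the supported image formats."""
    for path in Path(folder).iterdir():
        if path.suffix.lower() in SUPPORTED:
            yield path

# Example usage (hypothetical path):
# for image in supported_images("dataset/raw"):
#     print(image)
```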

Training with high-resolution images

When you are training an object detection model type and your data set has high-resolution images, ensure that you select the high-resolution model optimization type. For more information about training, see Settings for training models.

When you are training a different model type, such as image classification, prepare images as follows:

  • Downsample images to 1 or 2 megapixels if they do not require fine detail.
  • Divide images that do require fine detail into smaller images of 1 or 2 megapixels each (a sketch of both approaches follows this list).
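
A minimal sketch of both approaches, assuming that the Pillow library is available; the target pixel counts, tile dimensions, and file paths are illustrative.

```python
from PIL import Image

TARGET_PIXELS = 2_000_000  # roughly 2 megapixels (illustrative target)

def downsample(path, out_path):
    """Scale an image down so that it contains at most TARGET_PIXELS pixels."""
    img = Image.open(path)
    scale = (TARGET_PIXELS / (img.width * img.height)) ** 0.5
    if scale < 1:
        img = img.resize((int(img.width * scale), int(img.height * scale)))
    img.save(out_path)

def tile(path, out_prefix, tile_width=1600, tile_height=1200):
    """Divide an image into tiles of roughly 2 megapixels each to preserve fine detail."""
    img = Image.open(path)
    for top in range(0, img.height, tile_height):
        for left in range(0, img.width, tile_width):
            box = (left, top, min(left + tile_width, img.width),
                   min(top + tile_height, img.height))
            img.crop(box).save(f"{out_prefix}_{top}_{left}.png")
```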

Supported video files

The following video files are supported:

  • Ogg Vorbis (.ogg)
  • VP8 or VP9 (.webm)
  • H.264 encoded videos in the MP4 format (.mp4)

The following video formats are supported only through the API:

  • Matroska (.mkv)
  • Audio Video Interleave (.avi)
  • Moving Picture Experts Group (.mpg or .mpeg2)
Note: Videos that are encoded with the H.265 codec standard are not supported.
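
If a video is encoded with H.265 (reported as "hevc" by ffprobe), one option is to re-encode it to H.264 before upload. The following sketch assumes that the ffmpeg and ffprobe command-line tools are installed; the file names are illustrative.

```python
import subprocess

def video_codec(path):
    """Return the codec name of the first video stream, as reported by ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()

def reencode_to_h264(src, dst):
    """Re-encode a video to H.264 in an MP4 container, copying the audio stream."""
    subprocess.run(["ffmpeg", "-i", src, "-c:v", "libx264", "-c:a", "copy", dst],
                   check=True)

# Example usage (hypothetical file names):
# if video_codec("clip.mp4") == "hevc":
#     reencode_to_h264("clip.mp4", "clip_h264.mp4")
```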

Uploading files

You can upload multiple images or video files individually or together in a .zip file.

Uploading image and video files has the following limitations:

  • Uploading a folder that contains images or videos is not supported.
  • Importing a .zip file that has a directory structure into an existing data set is not supported (a sketch of building a flat .zip file follows this list).
  • Do not leave Maximo® Visual Inspection, close the tab or window, or refresh the page before the upload is completed. However, you can go to different pages within Maximo Visual Inspection during the upload.
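
For example, a minimal sketch that packages the files from a folder into a flat .zip file with no directory structure, using only the Python standard library; the paths are illustrative.

```python
import zipfile
from pathlib import Path

def build_flat_zip(folder, zip_path):
    """Add every file in 'folder' to a .zip archive without any directory structure."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in Path(folder).iterdir():
            if path.is_file():
                # arcname keeps only the file name, so the archive stays flat.
                archive.write(path, arcname=path.name)

# Example usage (hypothetical paths):
# build_flat_zip("dataset/images", "dataset_upload.zip")
```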

Limitations on model types

Images are scaled up or down to a fixed resolution during model training and inference.

Table 1. Image resolution for model training and inference
Model type                       | Image dimensions  | Notes
Single shot detector (SSD)       | 512 x 512 pixels  |
YOLO v3                          | 608 x 608 pixels  |
Tiny YOLO v3                     | 416 x 416 pixels  |
Faster R-CNN                     | 1000 x 600 pixels | The original aspect ratio is maintained. If necessary, black bands are added to the image to make it fit.
Detectron2                       | 1333 x 800 pixels | The original aspect ratio is maintained. If necessary, black bands are added to the image to make it fit.
GoogLeNet                        | 224 x 224 pixels  |
Structured segment network (SSN) | 224 x 224 pixels  |
High resolution                  | 1333 x 800 pixels | Images are scaled down only during training. Images are not scaled down during inference.
Note: Images with COCO annotations are supported. For more information about COCO annotations, see Importing images with COCO annotations.
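
As an illustration of the aspect-ratio-preserving scaling that Table 1 describes for Faster R-CNN, Detectron2, and High resolution models, the following sketch resizes an image to fit a target resolution and pads it with black bands. It assumes the Pillow library and is not the product's internal implementation.

```python
from PIL import Image

def letterbox(path, target_width=1333, target_height=800):
    """Scale an image to fit inside the target dimensions and pad it with black bands."""
    img = Image.open(path).convert("RGB")
    scale = min(target_width / img.width, target_height / img.height)
    resized = img.resize((int(img.width * scale), int(img.height * scale)))
    canvas = Image.new("RGB", (target_width, target_height))  # black background
    # Center the resized image; the remaining area stays black.
    offset = ((target_width - resized.width) // 2, (target_height - resized.height) // 2)
    canvas.paste(resized, offset)
    return canvas
```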

How many images are needed?

A data set that contains a variety of representative labeled objects can train a more accurate model. The exact number of images and objects cannot be specified, but some guidelines recommend as many as 1,000 representative images for each class. However, you might not need such a large data set to train a model with satisfactory accuracy.

The number of images that are required depends on the kind of training that you plan to do.

Image classification

At least two categories must exist. Each category must have at least five images.

Object detection

For each defined object, the data set must contain at least five images in which that object is labeled. For example, if you want to train a model to recognize cars, you must add the car label to at least five images. If you have three images and one video, you must add the label to each image and to at least two frames of the video. Labeling five cars in one image is not adequate. If this requirement is not met and you train the model, the model is not trained to recognize that type of object.

For object detection model training, images that do not contain any labeled objects are not used to train the model.
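
A minimal sketch of checking the five-image requirement before training, assuming that you already have a mapping from each image or frame name to the set of labels that are drawn in it; the mapping format is an assumption for illustration.

```python
from collections import Counter

MIN_IMAGES_PER_LABEL = 5

def check_label_coverage(image_labels):
    """Report labels that appear in fewer than MIN_IMAGES_PER_LABEL images.

    image_labels: dict that maps an image or frame name to the set of labels in it.
    """
    counts = Counter(label for labels in image_labels.values() for label in labels)
    return {label: n for label, n in counts.items() if n < MIN_IMAGES_PER_LABEL}

# Example usage (hypothetical annotations):
# too_few = check_label_coverage({"img1.jpg": {"car"}, "img2.jpg": {"car", "truck"}})
# print(too_few)  # {'car': 2, 'truck': 1} -> both labels need more labeled images
```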

Important: Not all of the images in a data set are used for training. Assuming that you did not change the value for Ratio when training your model, 20% of the images are randomly selected and used for validation instead of training. It is important to have enough images of every category or object.

For example, consider a data set of 200 images that is used to train an object detection model. With the default configuration for model training, 20% of the images, which is 40 images, are selected for testing the model. If a label LabelA is used to identify an object in the data set and the number of images that contain that label is smaller than the test data set, for example only 20 images, the following scenarios are possible (a short probability sketch follows the list):

  • It is possible that all of the images with LabelA objects are in the training data set and none of them are used for testing the model. This situation results in unknown accuracy for LabelA because its accuracy is never tested.
  • Similarly, it is possible that all 20 images with LabelA objects are in the test data set and none are used for training. This situation results in low or 0% accuracy for the object because the model was not trained with any images that contain LabelA objects.
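
To estimate how likely these extremes are under a random 80/20 split, the following sketch computes the hypergeometric probabilities for the numbers in the example above; the split logic is an illustration, not the product's sampling code.

```python
from math import comb

def split_risk(total=200, labeled=20, test_size=40):
    """Probability that a random test split contains none, or all, of the labeled images."""
    none_in_test = comb(total - labeled, test_size) / comb(total, test_size)
    all_in_test = (comb(total - labeled, test_size - labeled) / comb(total, test_size)
                   if test_size >= labeled else 0.0)
    return none_in_test, all_in_test

# print(split_risk(labeled=20))  # both probabilities are small with 20 labeled images
# print(split_risk(labeled=5))   # with only 5 labeled images, the first risk grows sharply
```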

If your data set does not have many images or sufficient variety for training, consider using the Augmentation feature to increase the size of the data set.

Special considerations for object detection models

Accuracy can be more challenging to achieve for object detection models because the accuracy metric includes intersection over union (IoU), especially for models that use segmentation instead of bounding boxes.

IoU is calculated as the intersection between a ground truth bounding box and a predicted bounding box, divided by the union of both bounding boxes. The intersection is the area of overlap, the ground truth bounding box is the box that you draw by hand, and the predicted bounding box is the box that IBM Maximo Visual Inspection draws.
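
For example, a minimal sketch of this calculation for two axis-aligned boxes; the (left, top, right, bottom) coordinate convention is an assumption for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (left, top, right, bottom)."""
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[2], box_b[2])
    bottom = min(box_a[3], box_b[3])
    intersection = max(0, right - left) * max(0, bottom - top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0

# A ground truth box and a slightly shifted predicted box overlap partially:
# print(iou((0, 0, 100, 100), (25, 25, 125, 125)))  # 0.391...
```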

In object detection, an object might be correctly identified even though the boundary that the model generates does not overlap the ground truth accurately, which results in a poor IoU metric. You might improve this metric by labeling objects more precisely to reduce background noise, by training the model for longer, or both.

Ensure that the data sets that you use to train Anomaly optimized models contain only images of non-anomalous objects. These images are used to train a model that recognizes an object and identifies similar objects that have different characteristics. When you train and test the model, for best results, use images that present the object consistently. That is, make sure that the object's angle, centering, and scale are similar and that the image backgrounds are similar.