Data set considerations
When you prepare a data set for training, take into account these limitations, requirements, and special considerations.
- Supported image files
- Training with high-resolution images
- Supported video files
- Uploading files
- Limitations on model types
- How many images are needed?
  - Image classification
  - Object detection
- Special considerations for object detection models
Supported image files
The following image files are supported:
- JPEG
- PNG
- DICOM
Training with high-resolution images
When you are training an object detection model type and your data set has high-resolution images, ensure that you select the high-resolution model optimization type. For more information about training, see Settings for training models.
When you are training a different model type, such as image classification, prepare images as follows (a preprocessing sketch follows this list):
- Downsample images to 1 or 2 megapixels if they do not require fine detail.
- Divide images that do require fine detail into smaller images of 1 or 2 megapixels each.
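If you prepare images outside of Maximo Visual Inspection, a script such as the following can downsample or tile them before upload. This is a minimal sketch that assumes the Pillow library; the 2-megapixel target and the JPEG output are illustrative choices, not product requirements.

```python
# Minimal preprocessing sketch, assuming Pillow. The 2-megapixel target and
# JPEG output are illustrative, not product requirements.
from PIL import Image

TARGET_PIXELS = 2_000_000  # roughly 2 megapixels


def downsample(path_in, path_out, target_pixels=TARGET_PIXELS):
    """Scale an image down so that its total pixel count is about target_pixels."""
    img = Image.open(path_in)
    w, h = img.size
    if w * h > target_pixels:
        scale = (target_pixels / (w * h)) ** 0.5
        img = img.resize((max(1, round(w * scale)), max(1, round(h * scale))))
    img.save(path_out)


def tile(path_in, out_prefix, tile_pixels=TARGET_PIXELS):
    """Split an image that needs fine detail into tiles of about tile_pixels each."""
    img = Image.open(path_in)
    w, h = img.size
    side = int(tile_pixels ** 0.5)  # edge length of a roughly square tile
    for top in range(0, h, side):
        for left in range(0, w, side):
            box = (left, top, min(left + side, w), min(top + side, h))
            img.crop(box).convert("RGB").save(f"{out_prefix}_{top}_{left}.jpg")
```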
Supported video files
The following video files are supported:
- Ogg Vorbis (.ogg)
- VP8 or VP9 (.webm)
- H.264 encoded videos in the MP4 format (.mp4)
The following video formats are supported only through the API:
- Matroska (.mkv)
- Audio Video Interleave (.avi)
- Moving Picture Experts Group (.mpg or .mpeg2)
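Before you upload, you can screen file names against the formats listed above. The following sketch is an illustration only; the extension spellings, such as .dcm for DICOM, are assumptions based on common file extensions, not an exhaustive product list.

```python
# Illustrative pre-upload check of file extensions against the formats listed
# above. The extension spellings (for example, .dcm for DICOM) are assumptions.
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".dcm"}            # JPEG, PNG, DICOM
VIDEO_EXTS = {".ogg", ".webm", ".mp4"}                    # supported in the UI and API
VIDEO_EXTS_API_ONLY = {".mkv", ".avi", ".mpg", ".mpeg2"}  # supported only through the API


def classify(path):
    ext = Path(path).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"
    if ext in VIDEO_EXTS_API_ONLY:
        return "video (upload through the API only)"
    return "unsupported"


for name in ["part.png", "scan.dcm", "line.avi", "notes.txt"]:
    print(name, "->", classify(name))
```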
Uploading files
You can upload multiple images or video files individually or together in a .zip file.
Uploading image and video files has the following limitations:
- Uploading a folder that contains images or videos is not supported.
- Importing a .zip file that has a directory structure into an existing data set is not supported.
- Do not leave the Maximo® Visual Inspection page, close the tab or window, or refresh the page before the upload is completed. However, you can go to different pages within Maximo Visual Inspection during upload.
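Because a .zip file that has a directory structure cannot be imported into an existing data set, package files at the top level of the archive. The following minimal sketch uses Python's standard zipfile module; the folder and archive names are hypothetical.

```python
# Minimal sketch: package the files in a folder into a flat .zip archive
# (no directory structure) so that it can be imported into an existing data set.
import zipfile
from pathlib import Path


def zip_flat(folder, zip_path):
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(Path(folder).iterdir()):
            if f.is_file():
                zf.write(f, arcname=f.name)  # arcname drops the folder prefix


zip_flat("my_images", "my_images.zip")  # "my_images" is a hypothetical folder
```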
Limitations on model types
Images are scaled up or down to a fixed resolution during model training and inference. The resolution depends on the model type, as shown in the following table.
| Model type | Image dimensions | Notes |
|---|---|---|
| Single shot detector (SSD) | 512 x 512 pixels | |
| YOLO v3 | 608 x 608 pixels | |
| Tiny YOLO v3 | 416 x 416 pixels | |
| Faster R-CNN | 1000 x 600 pixels | The original aspect ratio is maintained. If necessary, black bands are added to the image to make it fit. |
| Detectron2 | 1333 x 800 pixels | The original aspect ratio is maintained. If necessary, black bands are added to the image to make it fit. |
| GoogLeNet | 224 x 224 pixels | |
| Structured segment network (SSN) | 224 x 224 pixels | |
| High resolution | 1333 x 800 pixels | Images are scaled down only during training. Images are not scaled down during inference. |
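The note for Faster R-CNN and Detectron2 describes an aspect-preserving resize with black padding, often called letterboxing. The following sketch, which assumes Pillow, shows the idea; it is an illustration, not the product's internal implementation.

```python
# Letterboxing sketch: scale the image to fit the target size while keeping the
# aspect ratio, then pad the remainder with black bands. Illustration only.
from PIL import Image


def letterbox(img, target_w, target_h):
    w, h = img.size
    scale = min(target_w / w, target_h / h)
    resized = img.resize((max(1, round(w * scale)), max(1, round(h * scale))))
    canvas = Image.new("RGB", (target_w, target_h), (0, 0, 0))  # black background
    canvas.paste(resized, ((target_w - resized.width) // 2,
                           (target_h - resized.height) // 2))
    return canvas


img = Image.open("example.jpg")        # hypothetical input image
print(letterbox(img, 1000, 600).size)  # (1000, 600), the Faster R-CNN input size
```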
How many images are needed?
A data set that contains a variety of representative labeled objects can train a more accurate model. The exact number of images and objects cannot be specified, but some guidelines recommend as many as 1,000 representative images for each class. However, you might not need a data set that large to train a model with satisfactory accuracy.
The number of images that are required depends on the kind of training that you plan to do.
Image classification
At least two categories must exist. Each category must have at least five images.
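To verify these minimums before you upload, you can count the images in each category. The sketch below assumes a hypothetical folder-per-category layout on disk; Maximo Visual Inspection itself does not require that layout.

```python
# Minimal check of the image classification minimums: at least two categories,
# each with at least five images. The folder-per-category layout is hypothetical.
from pathlib import Path


def check_classification_dataset(root):
    counts = {d.name: sum(1 for f in d.iterdir() if f.is_file())
              for d in Path(root).iterdir() if d.is_dir()}
    if len(counts) < 2:
        raise ValueError("At least two categories are required")
    for category, n in counts.items():
        if n < 5:
            raise ValueError(f"Category '{category}' has only {n} images; at least 5 are required")
    return counts


print(check_classification_dataset("my_dataset"))  # "my_dataset" is a hypothetical folder
```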
Object detection
For each defined object, the data set must contain at least five images in which that object is labeled. For example, if you want to train a model to recognize cars, you must add the car label to at least five images. If you have three images and one video, you must add the label to each image and to at least two frames of the video. Labeling five cars in a single image is not sufficient. If this requirement is not met and you train the model, the model is not trained to recognize that type of object.
For object detection model training, images that do not contain any labeled objects are not used to train the model.
For example, consider a data set of 200 images that is used to train an object detection model. With the default configuration for model training, 20% of the images, that is, 40 images, are selected for testing the model. If the number of images that contain an object with a particular label, for example LabelA, is smaller than the test data set, such as only 20 images, the following scenarios are possible (a small simulation follows this list):
- All of the images with LabelA might be in the training data set, so none are used for testing the model. This situation results in unknown accuracy for LabelA because the accuracy is never tested.
- Similarly, all 20 images with LabelA objects might be in the test data set, so none are used for training. This situation results in low or 0% accuracy for the object because the model was not trained with any images that contain LabelA objects.
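The following minimal simulation illustrates the risk with the same numbers: 200 images, a random 20% test split, and only 20 images that contain LabelA. Over many random splits, some splits leave LabelA with very few, or no, images on one side.

```python
# Simulate random 80/20 splits of 200 images when only 20 of them contain LabelA.
# Some splits leave few or no LabelA images in the test set (accuracy is unknown);
# the opposite extreme would leave none in the training set (accuracy near 0%).
import random

TOTAL, TEST_FRACTION, LABELED = 200, 0.2, 20  # images 0..19 carry LabelA

fewest, most = TOTAL, 0
for _ in range(10_000):
    test = set(random.sample(range(TOTAL), int(TOTAL * TEST_FRACTION)))
    in_test = sum(1 for i in range(LABELED) if i in test)
    fewest, most = min(fewest, in_test), max(most, in_test)

print(f"LabelA images in the test set ranged from {fewest} to {most} of {LABELED}")
```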
If your data set does not have many images or sufficient variety for training, consider using the Augmentation feature to increase the size of the data set.
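As a generic illustration of augmentation, hand-rolled with Pillow rather than the built-in Augmentation feature, the following sketch creates flipped, rotated, and brightness-adjusted copies of an image. For object detection data sets, any geometric transformation must also be applied to the object labels.

```python
# Generic augmentation sketch using Pillow; this is not the built-in
# Augmentation feature. "car_001.jpg" is a hypothetical file name.
from pathlib import Path

from PIL import Image, ImageEnhance, ImageOps


def augment(path):
    img = Image.open(path)
    stem = Path(path).stem
    variants = {
        f"{stem}_flip.jpg": ImageOps.mirror(img),                         # horizontal flip
        f"{stem}_rot15.jpg": img.rotate(15, expand=True),                 # small rotation
        f"{stem}_bright.jpg": ImageEnhance.Brightness(img).enhance(1.3),  # brighter copy
    }
    for name, variant in variants.items():
        variant.convert("RGB").save(name)


augment("car_001.jpg")
```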
Special considerations for object detection models
Accuracy can be more challenging to achieve for object detection models because the accuracy metric incorporates intersection over union (IoU), especially for models that use segmentation instead of bounding boxes.
IoU is calculated as the intersection of a ground truth bounding box and a predicted bounding box, divided by the union of both bounding boxes. The intersection is the area of overlap, the ground truth bounding box is the hand-drawn box, and the predicted bounding box is the box that IBM Maximo Visual Inspection draws.
In object detection, the object might be correctly identified, but if the boundary that the model generates does not overlap the ground truth closely, the result is a poor IoU metric. You can improve this metric by labeling objects more precisely to reduce background noise, by training the model for longer, or both.
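As a worked example, the following sketch computes IoU for two axis-aligned bounding boxes that are given as (x_min, y_min, x_max, y_max) coordinates; the example boxes are made up for illustration.

```python
# IoU for two axis-aligned bounding boxes, each as (x_min, y_min, x_max, y_max).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])    # top-left corner of the overlap
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])    # bottom-right corner of the overlap
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # area of overlap
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


ground_truth = (100, 100, 300, 250)  # hand-drawn box
predicted = (120, 110, 320, 260)     # box drawn by the model
print(round(iou(ground_truth, predicted), 3))  # about 0.724
```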
Ensure that the data sets that you use to train Anomaly optimized models contain only images of non-anomalous objects. These images are used to train a model that recognizes an object and identifies similar objects that have different characteristics. When you train and test the model, for best results, use images that present the object consistently. That is, make sure that the object's angle, centering, and scale are similar and that the image backgrounds are similar.