The goal of this scenario is to create a deep learning model to monitor traffic on a busy
road.
This scenario uses a video that displays the traffic during the day. This video is used to
determine how many cars are on the road and to determine what the peak traffic times are.
The video file that is used in this scenario is available for download here: Download video file.
Follow these steps to create a deep learning model:
Click Data sets in the sidebar to open the Data sets page. You can create a new data set in
several ways. For this example, create a new, empty data set.
From the Data sets page, click the icon and name the data set Traffic Video.
To add a video to the data set, click the Traffic Video data set and click Import file or
drag the video to the + area.
Important: Do not leave Maximo Visual Inspection, close the tab or window, or refresh the page
until the upload completes. You can move between pages within Maximo Visual Inspection during
the upload.
Step 2: Label objects in a video
Label objects in the video. For object detection, you must have at least five labels for each
object. Create "Car" and "Motorcycle" objects, and label at least five frames that contain cars and
at least five frames that contain motorcycles.
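You can check the five-label minimum mechanically. The following minimal sketch assumes that labels are available as a list of (frame timestamp, object name) pairs. This structure is hypothetical and is not a Maximo Visual Inspection export format. The sample data is illustrative and mirrors the situation that is described later in this step: cars in five automatically captured frames and the motorcycle in only one.

```python
from collections import Counter

MIN_LABELS_PER_OBJECT = 5  # minimum stated for object detection


def objects_below_minimum(labels, required_objects, minimum=MIN_LABELS_PER_OBJECT):
    """Return {object: count} for each required object with fewer than `minimum` labels.

    `labels` is a hypothetical list of (frame_time_seconds, object_name) pairs.
    """
    counts = Counter(name for _, name in labels)
    return {obj: counts[obj] for obj in required_objects if counts[obj] < minimum}


# Illustrative data only: cars in five auto-captured frames, motorcycle in one.
labels = [(t, "Car") for t in (0, 10, 20, 30, 40)] + [(40, "Motorcycle")]
print(objects_below_minimum(labels, ["Car", "Motorcycle"]))  # {'Motorcycle': 1}
```

An empty result means every object meets the minimum. A non-empty result tells you which objects need more labeled frames.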
Select the video from your data set and select Label Objects.
Capture frames by using one of these methods:
Click Auto capture frames and specify a value for Capture Interval (Seconds) that
results in at least five frames. For this scenario, select this option and specify 10 seconds.
Capturing frames can take several minutes, depending on the length and size of the video and on
the interval that you specify.
Click Capture frame to manually capture frames. If you use this option, you must capture
a minimum of five frames from the video.
If you used Auto capture frames, verify that the captured frames contain enough instances of
each object type. If they do not, capture additional frames manually. In this scenario, the
motorcycle appears in only one automatically captured frame, at 40 seconds, so you must capture at
least four more frames that contain it. The motorcycle comes into view at 36.72 seconds. To capture
the motorcycle in motion, create frames at 36.72 seconds, when it comes into view, and at 37.79,
41.53, and 42.61 seconds. Play the video, click pause when the frame that you want is displayed,
and then click Capture frame.
Create new object labels for the data set by clicking Add new next to the Objects list. Enter
Car and click Add. Then, enter Motorcycle and click OK. To delete a label later, you must
delete it at the data set level. You cannot delete it from an individual frame or image.
Label the objects in the frames:
Select the first frame in the carousel.
Select the correct object label, for example, "Car".
Choose Box or Polygon, depending on the shape that you want to draw around each object.
Boxes are faster to label and train but less accurate. Only Detectron or High resolution models
support polygons. If you label objects with polygons and then use the data set to train a model
type that does not support polygons, bounding boxes are used instead. Draw the appropriate shape
around the object. While Box or Polygon is selected, hold down the Alt key for
non-drawing interactions in the image. These interactions include selecting, moving, or editing
previously drawn shapes and panning the image with the mouse. To return to normal mouse
interactions, deselect the Box or Polygon button.
Figure 1 shows the video frame captured at 41.53 seconds with the object labels Car and
Motorcycle. In the carousel, Figure 1 also highlights the five frames that contain the motorcycle and
therefore need Motorcycle labels. Four of these frames were captured manually.
Figure 1. Labeling objects in Maximo Visual Inspection
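When polygon labels are used to train a model type that supports only boxes, a bounding box replaces each polygon. This tutorial does not say how the box is computed. A natural choice, which this sketch assumes, is the smallest axis-aligned rectangle that encloses all of the polygon's vertices in pixel coordinates. The outline coordinates are made up.

```python
def polygon_to_bbox(vertices):
    """Return (xmin, ymin, xmax, ymax) of the smallest axis-aligned box around a polygon.

    `vertices` is a sequence of (x, y) pixel coordinates. Assumption: this mirrors
    how a box can be derived from a polygon label; the product's behavior may differ.
    """
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


# A rough, made-up outline of a motorcycle.
outline = [(120, 80), (180, 70), (210, 140), (150, 160), (110, 130)]
print(polygon_to_bbox(outline))  # (110, 70, 210, 160)
```

The enclosing box also covers background pixels that the polygon excludes, which is one reason box labels are less precise than polygons.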
Step 3: Train a model
After all objects in your data set are labeled, you can train your deep learning model. To train
a model, complete the following steps:
From the Data sets page, click Train.
Complete the fields on the Train Data set page, and make sure that you select Object
Detection. For Model selection, choose Accuracy (Faster R-CNN).
Click Train.
(Optional; supported only when training for object detection.) Stop the training process
by clicking Stop training > Keep Model > Continue. You can wait for the entire training process
to complete. However, you can also stop training when the lines in the training graph start to
flatten out, as shown in Figure 2. Improvements in training quality tend to plateau over time, so
stopping at that point is the fastest way to deploy a model and start refining the data set. Use
early stopping carefully when training segmented object detection models, such as models that use
the Detectron or High resolution model types. For these models, more iterations and longer training
can improve accuracy even when the graph indicates that accuracy is plateauing. The precision of the
label shape can continue to improve after the accuracy of locating the object stops improving.
Figure 2. Model training graph
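Deciding when the lines "start to flatten out" is a judgment call in the UI. If you record the loss values from the training graph, a simple heuristic can make that call repeatable. The window size and tolerance in this sketch are arbitrary assumptions, and Maximo Visual Inspection does not necessarily apply any such rule itself.

```python
def has_plateaued(losses, window=5, rel_tol=0.02):
    """Return True when the mean of the last `window` losses improved by less than
    `rel_tol` (relative) over the mean of the `window` losses before them.

    Heuristic only: `window` and `rel_tol` are illustrative defaults.
    """
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    if previous <= 0:
        return True  # loss already at zero; nothing left to improve
    return (previous - recent) / previous < rel_tol


print(has_plateaued([1.0, 0.8, 0.6, 0.45, 0.35, 0.3, 0.25, 0.2, 0.16, 0.13]))  # False
print(has_plateaued([0.2] * 5 + [0.199] * 5))  # True
```

For Detectron or High resolution models, the caution above suggests using a larger window or a smaller tolerance before you stop training.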
If the training graph converges quickly and has 100% accuracy, the data set does not have
enough information. The same is true if the accuracy of the training graph fails to rise or the
errors in the graph do not decrease at the end of the training process. For example, a model with
high accuracy might be able to discover all instances of different race cars. However, the same
model might be unable to differentiate between specific race cars or cars that have different
colors. In this situation, add more images, video frames, or videos to the data set. Then, label
those objects and try the training again.
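You can express the warning signs above as simple checks on the recorded accuracy and loss series. The following minimal sketch assumes one value per recorded iteration, with accuracy in the range 0 to 1. The "early" fraction and the flatness tolerance are hypothetical thresholds, not product settings.

```python
def training_warnings(accuracy, loss, early_fraction=0.2, flat_tol=0.01):
    """Return human-readable warnings for the training-graph symptoms above.

    Thresholds are illustrative assumptions, not Maximo Visual Inspection values.
    """
    warnings = []
    early = max(1, int(len(accuracy) * early_fraction))
    if max(accuracy[:early]) >= 1.0:
        warnings.append("reached 100% accuracy early: data set may lack information")
    if accuracy[-1] - accuracy[0] <= flat_tol:
        warnings.append("accuracy did not rise: add more labeled data")
    if loss[-1] >= loss[0]:
        warnings.append("loss did not decrease: add more labeled data")
    return warnings


acc = [0.2, 0.4, 0.6, 0.7, 0.75, 0.8, 0.82, 0.84, 0.85, 0.86]
loss = [2.0, 1.5, 1.2, 1.0, 0.9, 0.8, 0.7, 0.65, 0.6, 0.58]
print(training_warnings(acc, loss))  # []
```

If the function returns any warnings, add more images, video frames, or videos, and label them before you retrain.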
Step 4: Deploy a trained model
GPU usage depends on the model type:
Each High resolution, Structured segment network (SSN), Anomaly optimized, or custom deployed
model takes one GPU. The GPU group is listed as '-', which indicates that this model uses a full GPU
and does not share the resource with any other deployed models.
Multiple Faster R-CNN, GoogLeNet, SSD, YOLO v3, Tiny YOLO v3, and Detectron2 models can be deployed
to a single GPU. A new model of one of these types is deployed to the GPU that already has the most
models deployed on it, if that GPU has sufficient memory. Use the GPU group to determine which
deployed models share a GPU. To free up a GPU, all deployed models in its GPU group must be
deleted or undeployed. IBM® Maximo Visual Inspection leaves a buffer on the GPU whose size
depends on the combination of models that are currently deployed.
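You can model the placement rules above as a small simulation to estimate how many GPUs a set of deployments needs. The following minimal sketch assumes that each model's memory size and each GPU's capacity are known. Both are hypothetical inputs. It reads "the GPU that has the most models, if sufficient memory is available" as choosing the busiest shared GPU that still fits the model. It does not model the variable buffer that the product reserves.

```python
from dataclasses import dataclass, field

# Model types that, per the rules above, each take a full GPU.
DEDICATED_TYPES = {"High resolution", "SSN", "Anomaly optimized", "Custom"}


@dataclass
class Gpu:
    capacity_mb: int
    models: list = field(default_factory=list)  # (name, model_type, memory_mb)
    dedicated: bool = False

    def free_mb(self):
        return self.capacity_mb - sum(m[2] for m in self.models)


def place(gpus, name, model_type, memory_mb):
    """Deploy a model onto a GPU; return the GPU index, or None if nothing fits.

    Dedicated types need an empty GPU. Shareable types go to the non-dedicated GPU
    that already holds the most models and has enough free memory.
    """
    if model_type in DEDICATED_TYPES:
        candidates = [i for i, g in enumerate(gpus) if not g.models]
    else:
        candidates = [i for i, g in enumerate(gpus)
                      if not g.dedicated and g.free_mb() >= memory_mb]
        # Stable sort: busiest GPU first; ties keep the lowest index.
        candidates.sort(key=lambda i: -len(gpus[i].models))
    if not candidates:
        return None
    chosen = candidates[0]
    gpus[chosen].models.append((name, model_type, memory_mb))
    gpus[chosen].dedicated = model_type in DEDICATED_TYPES
    return chosen


gpus = [Gpu(16000), Gpu(16000)]
print(place(gpus, "traffic-frcnn", "Faster R-CNN", 3000))  # 0
print(place(gpus, "traffic-yolo", "YOLO v3", 2000))        # 0 (busiest GPU)
print(place(gpus, "traffic-hr", "High resolution", 8000))  # 1 (needs an empty GPU)
print(place(gpus, "traffic-ssd", "SSD", 2000))             # 0 (GPU 1 is dedicated)
```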
To deploy the trained model, complete the following steps.
Click Models from the menu.
Select the model that you created in the previous section and click Deploy.
Specify a name for the model, and click Deploy. The Deployed Models page is displayed,
and the model is deployed when the status column displays Ready.
Double-click the deployed model to get the API endpoint and test other videos or images against
the model.
Note: For more information about APIs, see REST APIs.
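After you double-click the deployed model, you can copy its API endpoint. The following sketch uses only the Python standard library to build a multipart POST request that uploads one image to that endpoint, but it does not send the request. The form field name `files` and the example URL are assumptions. Check both against the REST API documentation. Authentication headers are also omitted.

```python
import urllib.request
import uuid


def build_inference_request(endpoint_url, filename, image_bytes, field="files"):
    """Build (but do not send) a multipart/form-data POST carrying one image.

    endpoint_url: the URL shown for the deployed model.
    field: form field name; "files" is an assumption to verify in the REST API docs.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        endpoint_url,
        data=head + image_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Hypothetical endpoint; replace it with the one shown for your deployed model.
req = build_inference_request(
    "https://mvi.example.com/api/dlapis/MODEL_ID", "frame.jpg", b"\xff\xd8...")
print(req.get_method(), req.full_url)
# After you add authentication, send the request with urllib.request.urlopen(req).
```

The REST API documentation describes the format of the response, which contains the detected objects.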
Note: Because High resolution models are compute-intensive, they take much longer
than other models to perform video and image inference.
Step 5: Automatically label frames in a video
You can use the auto-label function to automatically identify objects in the
frames of a video after you deploy a model.
In this scenario, you have only nine frames. To
improve the accuracy for your deep learning model, you can add more frames to the data set.
Remember, you can rapidly iterate by stopping the training on a model and checking the results of
the model against a test data set. You can also use the model to auto-label more objects in your
data set. This process improves the overall accuracy of your final model.
To use the
auto-label function, complete the following steps:
Note: Any frames that were
previously captured by using auto-capture but were not manually labeled are deleted before
auto-labeling. Deleting these frames helps avoid labeling duplicate frames. Manually captured frames
are not deleted.
Click Data sets from the menu, and select the data set that you used to create the
previously trained model.
Select the video in the data set that had nine frames, and click Label Objects.
Click Auto label.
Specify how often you want to capture and automatically label frames. Select the name of the
trained model that you deployed in step 3 of the deployment phase, and then click Auto label. In
this scenario, you previously captured frames every 10 seconds. To capture and label more frames
and improve the accuracy of the deep learning model, you can specify 6 seconds.
After the auto-label process completes, the new frames are added to the carousel. Click the new
frames and verify that the objects have the correct labels. Automatically added object labels are
shown in green, and manually added labels are shown in blue. In this scenario, the carousel now has
17 frames.
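The earlier note about deleted frames and the capture interval together determine which frames exist after an auto-label pass. The following sketch represents each frame as a (timestamp, source, labeled) record. This structure is hypothetical. The sketch keeps manually captured frames and labeled frames, drops auto-captured frames that were never labeled, and lists the new timestamps at the chosen interval. The documentation does not say how the product handles a new timestamp that matches an existing frame. The sketch skips such timestamps, which is an assumption, and the 50-second video length is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    time_s: float
    source: str   # "auto" or "manual"
    labeled: bool


def capture_times(duration_s, interval_s):
    """Timestamps 0, interval, 2*interval, ... strictly before duration_s."""
    count = int(duration_s // interval_s)
    times = [round(i * interval_s, 3) for i in range(count + 1)]
    return [t for t in times if t < duration_s]


def plan_auto_label(frames, duration_s, interval_s):
    """Return (kept_frames, new_times) for an auto-label pass.

    Kept: manual frames and any labeled frame. Dropped: auto-captured frames
    that were never labeled. New times that match a kept frame are skipped.
    """
    kept = [f for f in frames if f.source == "manual" or f.labeled]
    existing = {f.time_s for f in kept}
    new_times = [t for t in capture_times(duration_s, interval_s) if t not in existing]
    return kept, new_times


frames = [Frame(0, "auto", True), Frame(10, "auto", False), Frame(36.72, "manual", True)]
kept, new_times = plan_auto_label(frames, duration_s=50, interval_s=6)
print([f.time_s for f in kept])  # [0, 36.72]
print(new_times)                 # [6, 12, 18, 24, 30, 36, 42, 48]
```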
Next steps
You can continue to refine the data set as much as you want. When you are satisfied with the data
set, you can retrain the model by repeating the first three phases (import, label, train).
When you retrain the model, you might want to train it for longer to improve its overall
accuracy. The goal is for the loss lines in the training graph to converge to a stable, flat
line. The lower the loss, the better.
After the training is completed, you can redeploy the model by completing steps 1 - 3 of the
deployment phase. You can double-click the deployed model to get the API endpoint and test other
videos or images against the model.