The goal of this scenario is to create a deep learning model that determines when a cash register is being opened in a video.
This scenario assumes that you have videos that are named Cashier 1 - Cashier 5. These videos are added to a data set named Open cash register. To create a deep learning model, follow these steps:
Preparing videos for import
Import videos and create a data set
Labeling actions in a video
Training the model
Deploying a model
Step 1: Preparing videos for import
Before you import videos for use with action detection models, prepare them as follows. A sketch for checking frame rates follows this list.
Cut out long periods of background video that contain no actions.
Transcode videos with a frame rate greater than 30 FPS down to 30 FPS.
Crop the videos so that actions take up a large part of the frame.
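If you have many videos to prepare, it can help to check their frame rates before you import them. The following Python sketch is one way to do that; it uses OpenCV (not part of Maximo Visual Inspection) to read each video's frame rate and suggests an ffmpeg transcode command for files above 30 FPS. The file names are hypothetical placeholders for the Cashier videos.

import cv2  # OpenCV is used here only to read video metadata

def video_fps(path):
    # Return the frame rate that is recorded in the video's metadata.
    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise IOError("Cannot open video: " + path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    cap.release()
    return fps

# Hypothetical file names for the Cashier videos in this scenario.
for name in ["cashier1.mp4", "cashier2.mp4", "cashier3.mp4"]:
    fps = video_fps(name)
    if fps > 30:
        # One possible transcode command; adjust to your own tooling.
        print(f"{name}: {fps:.2f} FPS -> ffmpeg -i {name} -r 30 out_{name}")
    else:
        print(f"{name}: {fps:.2f} FPS (no transcoding needed)")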
Step 2: Import videos and create a data set
First, create a data set and add videos to it.
Log in to Maximo® Visual Inspection.
Click Data Sets in the side bar to open the Data
Sets page. You can choose from several ways to create a new data set. For this example,
create a new, empty data set.
From the Data set page, click the icon and name the data set
Open cash register.
To add a video to the data set, click the Open cash register data set and click
Import file or drag the video to the + area. In this
example, videos for Cashier 1 - Cashier 5 are added.
Important: Do not close the tab or window, navigate away from Maximo Visual Inspection, or refresh the page until the upload completes. You can go to different pages within Maximo Visual Inspection during the upload.
Step 3: Labeling actions in a video
The next step is to label actions in the videos. Create the Open action and label it in several videos.
There is no minimum number of labels that you must provide, but more data typically yields better results.
Each action label must be in the range of 5 - 1000 frames. The required length of time depends on the video's FPS. For 30 FPS, each action label must be in the range of 0.166 - 33.367 seconds. The label's duration is checked based on the frames per second and the selected start and end times. For example, if an action label is marked with a start time of 12.295 seconds and an end time of 12.395 seconds for a 30 FPS video, an error is generated. The error message that is returned is:
Label duration of '100' milliseconds does not meet required duration between '166.83333' milliseconds and '33366.668' milliseconds
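You can reproduce this duration check with a short calculation. The following Python sketch implements the rule that is stated above (a label must span 5 - 1000 frames); the function name is hypothetical, not part of the product.

def check_label_duration(start_s, end_s, fps, min_frames=5, max_frames=1000):
    # A label spanning fewer than 5 or more than 1000 frames is rejected.
    frames = (end_s - start_s) * fps
    return min_frames <= frames <= max_frames

# The failing example from the text: 12.295 s to 12.395 s at 30 FPS
# is (12.395 - 12.295) * 30 = 3 frames, under the 5-frame minimum.
print(check_label_duration(12.295, 12.395, 30))  # False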
At least 10 instances of each action tag in the data set are recommended. Longer labeled action durations yield better results.
If multiple types of actions are labeled in a data set, make sure that the total time for each
action type is similar. For example, assume that you tag 20 instances of the action
jump in a data set with a total time of 27 seconds. Then, you tag 10 instances of the
action drive in the data set with a total time of 53 seconds. In this case, the model
is biased toward the drive action. The total time for each action type is shown in the
Actions section.
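To illustrate the balance check that is described above, the following Python sketch totals the labeled time for each action and flags actions that fall well behind the longest total. The label data structure and the 75% threshold are assumptions for illustration only.

from collections import defaultdict

# Hypothetical labels: (action_name, start_seconds, end_seconds).
# These durations reproduce the jump/drive example above: 27 s versus 53 s.
labels = [("jump", 1.0, 2.35)] * 20 + [("drive", 5.0, 10.3)] * 10

totals = defaultdict(float)
for action, start, end in labels:
    totals[action] += end - start

longest = max(totals.values())
for action, total in totals.items():
    flag = " (consider adding more labels)" if total < 0.75 * longest else ""
    print(f"{action}: {total:.1f} s{flag}")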
Follow these steps to label actions. For more information, see Labeling actions.
Open the Open cash register data set.
Create the Open action tag in the data set by expanding
Actions and clicking Add action.
Select the appropriate video and click Label actions. The existing tags
are listed.
Find the start of an action by using the video control bar:
Use the slider or play button to get near the part of the video that you want.
Set the playback rate (1x, 0.5x, and so on) to control how fast the video plays.
Use the +1 and -1 buttons to move forward or backward one frame.
When you find the start of the action, click + in Start time.
Find the end of the action, then click + in End time.
Select Open for the action name, then click Save
action.
Continue adding actions to videos until you are done.
Step 4: Training the model
After you identify all of the action labels in your data set, you can train your deep learning model by following these steps:
From the Data set page, click Train.
Complete the fields on the Train Data set page, ensuring that you select Action
detection. Leave the default values for all other options.
Click Train.
(Optional - Only supported when training for object detection.) Stop the training process by clicking Stop training > Keep Model > Continue. You can wait for the entire model training process to complete. However, you can opt to stop the training process when the lines in the training graph start to flatten out, as shown in Figure 1, because improvements in training quality might plateau over time. Therefore, the fastest way to deploy a model and refine the data set is to stop the process before quality stops improving.
Figure 1. Model training graph
If the training graph converges quickly and has 100% accuracy, the data set does not have
enough information. The same is true if the accuracy of the training graph fails to rise or the
errors in the graph do not decrease at the end of the training process. For example, a model with
high accuracy might be able to discover all instances of different race cars. However, the same
model might be unable to differentiate between specific race cars or cars that have different
colors. In this situation, add more images, video frames, or videos to the data set. Then, label
those objects and try the training again.
Step 5: Deploying a model
Each deployed action detection model (SSD) takes one GPU.
To deploy the trained model, complete the following steps.
Click Models from the menu.
Select the model that you created in the previous section and click
Deploy.
Specify a name for the model, and click Deploy. The Deployed Models page
is displayed, and the model is deployed when the status column displays
Ready.
Double-click the deployed model to get the API endpoint and test other videos or images against
the model. For information about APIs, see REST APIs.
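As an illustration of calling a deployed model, the following Python sketch posts a video to the model's inference endpoint with the requests library. The URL, model ID, multipart field name, and response layout are assumptions; copy the actual endpoint from the deployed model's details page and see REST APIs for the exact contract.

import requests

# Hypothetical endpoint: replace with the value copied from the deployed model.
ENDPOINT = "https://myserver/api/dlapis/REPLACE_WITH_MODEL_ID"

with open("cashier_test.mp4", "rb") as f:
    # The "files" field name is an assumption; check the REST APIs documentation.
    # verify=False is only for servers that use a self-signed certificate.
    response = requests.post(ENDPOINT, files={"files": f}, verify=False)

response.raise_for_status()
print(response.json())  # Detected actions, as returned by the server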