Scenario: Detecting actions in videos

The goal of this scenario is to create a deep learning model to determine when a cash register is being opened in a video.

This scenario assumes you have videos that are named Cashier 1 - Cashier 5. These videos are added to a data set named Open cash register. To create a deep learning model, follow these steps:

  1. Preparing videos for import
  2. Importing videos and creating a data set
  3. Labeling actions in a video
  4. Training the model
  5. Deploying a model

Step 1: Preparing videos for import

Before you import videos for use with action detection models, prepare them as follows:

  • Cut out long periods of background video without any actions.
  • Transcode videos with FPS greater than 30 down to 30 FPS.
  • Crop the video so that actions take up a large part of the frame.
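
For example, the following sketch shows one way to do this preparation from Python by calling ffmpeg. It assumes ffmpeg is installed, and the file names, target frame rate, and crop values are illustrative only; adjust them for your videos.

  # Sketch: re-encode a source video at 30 FPS and crop it so the action
  # fills more of the frame. Assumes ffmpeg is installed and on the PATH;
  # file names and crop geometry are example values.
  import subprocess

  source = "cashier1_raw.mp4"        # hypothetical input video
  prepared = "cashier1_prepared.mp4" # hypothetical output video

  subprocess.run(
      [
          "ffmpeg",
          "-i", source,
          "-r", "30",                          # transcode output down to 30 FPS
          "-filter:v", "crop=640:480:200:100", # width:height:x:y - example crop
          prepared,
      ],
      check=True,
  )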

Step 2: Importing videos and creating a data set

First, create a data set and add videos to it.

  1. Log in to Maximo® Visual Inspection.
  2. Click Data Sets in the side bar to open the Data Sets page. You can choose from several ways to create a new data set. For this example, create a new, empty data set.
  3. From the Data set page, click the icon and name the data set Open cash register.
  4. To add a video to the data set, click the Open cash register data set and click Import file or drag the video to the + area. In this example, videos for Cashier 1 - Cashier 5 are added.
Important: Do not leave Maximo Visual Inspection, close the tab or window, or refresh the page until the upload completes. However, you can go to different pages within Maximo Visual Inspection during the upload.
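
If you prefer to script the import, the outline below sketches what such a script could look like with the Python requests library. The server URL, authentication header, endpoint paths, and response fields are assumptions made for illustration; confirm the actual request format in REST APIs.

  # Sketch: create a data set and upload videos through the REST API.
  # The base URL, token header, endpoint paths, and response field names
  # are assumptions; verify them against the REST APIs documentation.
  import requests

  BASE = "https://mvi.example.com/api"   # hypothetical server URL
  HEADERS = {"X-Auth-Token": "<token>"}  # hypothetical authentication header

  # Create the "Open cash register" data set.
  resp = requests.post(f"{BASE}/datasets", headers=HEADERS,
                       json={"name": "Open cash register"})
  dataset_id = resp.json()["dataset_id"]  # assumed response field

  # Upload each prepared video into the data set.
  for name in ["Cashier 1", "Cashier 2", "Cashier 3", "Cashier 4", "Cashier 5"]:
      with open(f"{name}.mp4", "rb") as video:
          requests.post(f"{BASE}/datasets/{dataset_id}/files",
                        headers=HEADERS, files={"files": video})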

Step 3: Labeling actions in a video

The next step is to label actions in the videos. Create the Open action and label it in several videos.

You do not need to provide a minimum number of labels, but more data typically yields better results.

  • Each action label must be in the range of 5 - 1000 frames. The required length of time therefore depends on the video's FPS. For 30 FPS, each action label must be in the range of 0.166 - 33.367 seconds. The label's duration is checked based on the frames per second and the selected start and end times. For example, if an action label is marked with a start time of 12.295 seconds and an end time of 12.395 seconds for a 30 FPS video, an error is generated (a small calculation that illustrates this check follows this list). The error message that is returned is
    Label duration of '100' milliseconds does not meet required duration 
    between '166.83333' milliseconds and '33366.668' milliseconds
  • At least 10 instances of each action tag in the data set are recommended.
  • Longer action label times yield better results.
  • If multiple types of actions are labeled in a data set, make sure that the total time for each action type is similar. For example, assume that you tag 20 instances of the action jump in a data set with a total time of 27 seconds. Then, you tag 10 instances of the action drive in the data set with a total time of 53 seconds. In this case, the model is biased toward the drive action. The total time for each action type is shown in the Actions section.
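
The duration rule above can be expressed as a small calculation. This sketch is only an illustration of the check, not product code; it converts the 5 - 1000 frame limit into milliseconds for a given frame rate and tests a proposed label against it.

  # Sketch: check whether a proposed action label meets the 5 - 1000 frame
  # duration rule for a video's frame rate. Illustration only.
  MIN_FRAMES, MAX_FRAMES = 5, 1000

  def label_duration_ok(start_s: float, end_s: float, fps: float) -> bool:
      """Return True if the label length falls inside the allowed range."""
      min_ms = MIN_FRAMES / fps * 1000.0
      max_ms = MAX_FRAMES / fps * 1000.0
      duration_ms = (end_s - start_s) * 1000.0
      return min_ms <= duration_ms <= max_ms

  # The example from the list above: a 100 ms label in a 30 FPS video fails.
  print(label_duration_ok(12.295, 12.395, 30.0))  # False - shorter than 5 frames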

Follow these steps to label actions. For more information, see Labeling actions.

  1. Open the Open cash register data set.
  2. Create the Open action tag in the data set by expanding Actions and clicking Add action.
  3. Select the appropriate video and click Label actions. The existing tags are listed.
  4. Find the start of an action by using the video control bar:
    • Use the slider or play button to get near the part of the video you want.
    • Set the playback rate (1x, 0.5x, and so on) to control how fast the video plays.
    • Use the +1 and -1 buttons to move forward or backward one frame.
    When you find the start of the action, click + in Start time.
  5. Find the end of the action in the same way, then click + in End time.
  6. Select Open for the action name, then click Save action.
  7. Continue adding actions to videos until you are done.

Step 4: Training the model

After you identify all of the action labels in your data set, you can train your deep learning model by following these steps:

  1. From the Data set page, click Train.
  2. Complete the fields on the Train Data set page, ensuring that you select Action detection. Leave the default values for all other options.
  3. Click Train.
  4. (Optional - Only supported when training for object detection.) Stop the training process by clicking Stop training > Keep Model > Continue. You can wait for the entire model training process to complete. However, you can opt to stop the training process when the lines in the training graph start to flatten out, as shown in Figure 1. You might opt to stop the training process because improvements in training quality can plateau over time; stopping the process before quality stops improving is therefore the fastest way to deploy a model and refine the data set.
    Figure 1. Model training graph
    The image shows loss on the vertical axis and iterations on the horizontal axis. As more iterations occur, the line for loss converges to a flat line.
    If the training graph converges quickly and has 100% accuracy, the data set does not have enough information. The same is true if the accuracy of the training graph fails to rise or the errors in the graph do not decrease at the end of the training process. For example, a model with high accuracy might be able to discover all instances of different race cars. However, the same model might be unable to differentiate between specific race cars or cars that have different colors. In this situation, add more images, video frames, or videos to the data set. Then, label those objects and try the training again.
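
    As a rough illustration of what "the lines start to flatten out" means, the following sketch applies a simple plateau check to a series of loss values. The heuristic and its thresholds are assumptions for illustration and are not part of Maximo Visual Inspection.

      # Sketch: decide that a loss curve has flattened when the average
      # improvement per step over a recent window drops below a threshold.
      def loss_has_plateaued(losses, window=5, min_improvement=0.01):
          """Return True when recent average improvement per step is small."""
          if len(losses) <= window:
              return False
          recent = losses[-window:]
          improvement = (recent[0] - recent[-1]) / (window - 1)
          return improvement < min_improvement

      history = [2.1, 1.4, 0.9, 0.62, 0.55, 0.53, 0.52, 0.515, 0.512]
      print(loss_has_plateaued(history))  # True - recent iterations barely help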

Step 5: Deploying a model

Each deployed action detection model (SSD) takes one GPU.

To deploy the trained model, complete the following steps.

  1. Click Models from the menu.
  2. Select the model that you created in the previous section and click Deploy.
  3. Specify a name for the model, and click Deploy. The Deployed Models page is displayed, and the model is deployed when the status column displays Ready.
  4. Double-click the deployed model to get the API endpoint and test other videos or images against the model. For information about APIs, see REST APIs.
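
After the model shows Ready, you can also test new videos against the copied endpoint from a script. The sketch below shows roughly what such a call could look like with the Python requests library; the endpoint URL, authentication header, file field name, and response layout are assumptions, so confirm the exact contract in REST APIs.

  # Sketch: send a test video to a deployed model's inference endpoint.
  # The URL, token header, file field name, and response fields are
  # assumptions; check REST APIs for the exact request format.
  import requests

  ENDPOINT = "https://mvi.example.com/api/dlapis/<deployed-model-id>"  # copied from the deployed model
  HEADERS = {"X-Auth-Token": "<token>"}                                # hypothetical authentication header

  with open("cashier_test.mp4", "rb") as video:
      resp = requests.post(ENDPOINT, headers=HEADERS, files={"files": video})

  print(resp.status_code)
  print(resp.json())  # detected actions, in whatever form the API returns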