Scenario: Detecting objects in a video

In this fictional scenario, you want to create a deep learning model to monitor traffic on a busy road. You have a video that shows the traffic during the day. From this video, you want to know how many cars are on the busy road each day, and which times of day have the most cars on the road.
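Once per-frame car counts are available from a finished model, the peak-time question reduces to a simple aggregation. A minimal sketch, assuming hypothetical (hour, car_count) pairs produced by some downstream detection pipeline (not a PowerAI Vision feature):

```python
from collections import Counter

def peak_hours(detections, top_n=3):
    """Given hypothetical (hour, car_count) pairs from a detection
    pipeline, total the cars seen in each hour of the day and
    return the busiest hours."""
    totals = Counter()
    for hour, count in detections:
        totals[hour] += count
    return [hour for hour, _ in totals.most_common(top_n)]

# Example with made-up counts for a few frames:
sample = [(8, 12), (8, 15), (9, 9), (17, 20), (17, 18), (12, 5)]
print(peak_hours(sample, top_n=2))  # → [17, 8]
```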

Importing a video

To import a video and create a data set, complete the following steps:
Attention: You can play only the following video types in PowerAI Vision:
  • Ogg Vorbis (.ogg)
  • VP8 or VP9 (.webm)
  • H.264 encoded videos with MP4 format (.mp4) - Requires fix pack 1
For further support details, see Planning for PowerAI Vision.
  1. Log in to PowerAI Vision.
  2. From the Welcome page, click Get started to create a data set.
  3. From the Dataset page, click the icon and enter a name for your data set. For example, Traffic Video.
  4. Click the data set that you created in step 3. To import your video, click Import file and select the video file. You can upload more than one video at a time by selecting multiple video files. You must stay on the page until the upload completes.
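Before uploading, you can screen files against the supported formats from the attention note above. A minimal sketch; the extension check is a convenience only and does not verify the actual codec (for example, an .mp4 file must be H.264 encoded, which requires fix pack 1):

```python
from pathlib import Path

# Supported container extensions per the note above; .mp4 support
# additionally assumes H.264 encoding and fix pack 1.
SUPPORTED_EXTENSIONS = {".ogg", ".webm", ".mp4"}

def is_supported_video(filename):
    """Return True if the file extension matches a format that
    PowerAI Vision can play."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS

print(is_supported_video("traffic.webm"))  # True
print(is_supported_video("traffic.avi"))   # False
```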

Labeling objects in a video

Now that the video is uploaded, you can label objects in it. For object detection, you must have at least five labels for each object. In this scenario, you want to create Car and Motorcycle objects, so you must label at least five frames that contain cars or motorcycles.

To label objects in a video, complete the following steps:
  1. Select the video from your data set and select Tag Objects.
  2. Click Auto capture frames and specify the interval, in seconds, at which frames are automatically captured from the video. Alternatively, you can click Capture frame to manually capture frames. If you use this option, you must capture a minimum of five frames from the video. In this scenario, the value that is specified in the Capture Interval (Seconds) field is 10 seconds.
    Note: Depending on the length and size of the video and the interval you specified to capture frames, the process to capture frames can take several minutes.
  3. Create a new object label for the data set by clicking Add New. Enter the name of the object that you want to create, and click the (+) icon. You can add multiple object labels to the data set at the same time. Next, click Add. In this scenario, you create two new objects that are labeled Car and Motorcycle.
  4. Select the first frame in the carousel. In the first frame, click the corresponding object label that you created in step 3 to start labeling objects in the frame. To label objects, hold down the left mouse button and draw a box around the object.
    Review the following tips about identifying and drawing objects:
    • Do not label part of an object. For example, do not label a car that is only partially visible in the video frame.
    • If a video frame has more than one object, you must label all of the objects. For example, if a frame contains both cars and motorcycles, label both; do not label the cars but not the motorcycles. Label objects with a consistent approach.
    • Draw a box around each individual object. Do not draw a box around groups of objects. For example, if two cars are right next to each other, draw a separate box around each car.
    • Draw the box as close to the object as possible. Do not leave blank space around the object.
    • You can draw boxes around objects that overlap, for example, when one object is partially behind another. You can also draw boxes that touch each other.
    • You can zoom in on the video frame to make it easier to draw the box around objects.
    • You cannot draw boxes that extend off the edge of the image or video frame.
    In this scenario, you want to select the Car or Motorcycle object and draw a box around the cars and motorcycle in each frame.
  5. You must label each object that you created in step 3 in at least five frames of the video.

    In this scenario, the motorcycle appears in only a single automatically captured frame, at 40 seconds. Therefore, you must capture a minimum of four more frames with the motorcycle. The motorcycle comes into view at 36.72 seconds. To correctly capture the motorcycle in motion, you can create extra frames at 36.72 seconds, 37.79 seconds, 41.53 seconds, and 42.61 seconds (for a total of five frames with the motorcycle). In these new video frames, you must label both the cars and the motorcycle.

    Complete the following steps to manually add new frames to an existing video data set:
    1. Play the video, and when the frame that you want is displayed, click the pause icon. You can also use the video player's status bar to find a frame you want.
    2. Click Capture Frame.
    3. The new frame is added to the carousel.
  6. After all objects are labeled in all of the video frames in the carousel, click Done editing.
The following figure displays the captured video frame at 41.53 seconds with object labels for Car and Motorcycle. Figure 1 also displays a box around the five frames in the carousel (four of which were added manually) that require object labels for the motorcycle in each frame.
Figure 1. Labeling objects in PowerAI Vision
The image displays GUI interface for PowerAI Vision. The image displays a screen capture of the video frame with object labels for the cars and motorcycle. Below this video frame is an image carousel that has frames from the video with time stamps.
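The carousel timeline above combines the automatic 10-second captures with the manual captures around the motorcycle's appearance. A small sketch of that merge, assuming for illustration a 60-second video (the video length is not stated in the scenario):

```python
def carousel_timestamps(video_length, interval, manual_times):
    """Merge automatically captured frame times (every `interval`
    seconds) with manually captured ones, in carousel order."""
    auto = set(range(0, int(video_length) + 1, interval))
    return sorted(auto | set(manual_times))

# Scenario values: a 10-second auto-capture interval plus four
# manual frames around the motorcycle's appearance. The 60-second
# video length is an assumption for illustration.
print(carousel_timestamps(60, 10, [36.72, 37.79, 41.53, 42.61]))
# → [0, 10, 20, 30, 36.72, 37.79, 40, 41.53, 42.61, 50, 60]
```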

Training a model

With all the object labels that are identified in your video data set, you can now train your deep learning model.

To train a model, complete the following steps:
  1. From the Dataset page, click Train.
  2. From the Train Dataset page, complete the following steps:
    a. Enter a name for the model.
    b. Click Object Detection.
    c. For the Performance Type field, select either System Default or Customized.
  3. Click Train.

You can wait for the entire training process to complete, but it is recommended that you stop the training process after a few minutes, or when the training graph looks similar to figure 2. The reason to stop the training process before it completes is that PowerAI Vision can automatically label a data set by using a deployed model. Therefore, the fastest way to deploy a model and refine your data set is to not wait for the training process to complete. To stop the training process, click Stop training > Keep Model > Continue.

As PowerAI Vision trains the model, the graph shows the relative performance of the model over time. The model should converge at the end of the training with low error and high accuracy. If the training graph converges quickly with 100% accuracy, the data set does not have enough information. The opposite case also indicates a problem: the accuracy fails to rise, or the errors in the graph do not fall, by the end of the training process. For example, a model with high accuracy might be able to discover all instances of different race cars, but might have trouble differentiating between specific race cars that have different colors or logos.

In this scenario, the following figure displays a point in time when the Loss CLC line and the LossBbox line start to converge. Therefore, you can stop the training process even though the training for the model is not complete. The model completed enough training that you can deploy it and use the auto label function to improve the quality and quantity of the data set.
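The visual judgment above, loss lines flattening out, can be approximated numerically. This is a rough illustration of the idea, not a PowerAI Vision feature: treat a loss series as converged when its recent values stop changing by more than a tolerance.

```python
def has_converged(losses, window=5, tolerance=0.01):
    """Heuristic illustration: a loss series is 'converged' when the
    spread over its last `window` values is below `tolerance`."""
    if len(losses) < window:
        return False
    tail = losses[-window:]
    return max(tail) - min(tail) < tolerance

# A loss curve that falls and then flattens, like the training graph:
falling = [0.9, 0.5, 0.3, 0.21, 0.205, 0.203, 0.202, 0.201]
print(has_converged(falling))  # True
```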

Figure 2. Model training graph
The image displays the GUI interface for PowerAI Vision. The image displays a screen capture of the model training graph, in which the loss lines converge over time.

Deploying a trained model

You can deploy a model even if its training is not 100% complete.

To deploy the trained model, complete the following steps:
  1. Click Models from the menu.
  2. Select the model name that you specified in step 2a, and click Deploy.
  3. Specify a name for the model, and click Deploy. The Deployed Models page is displayed, and the model is deployed when the status column displays Ready.
    Note: Each deployed model uses one GPU on your system.
  4. Double-click the deployed model to get the API endpoint and test other videos or images against the model.
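As a sketch of what testing against the endpoint might look like: the URL below is a placeholder for the endpoint shown on the deployed model's detail page, and both the upload format and the "classified" response field are assumptions, so confirm the exact request and response schema in the PowerAI Vision API documentation.

```python
import json
from urllib import request

# Placeholder: replace with the endpoint displayed for your deployed model.
ENDPOINT = "https://powerai-host/api/dlapis/your-model-id"

def count_labels(result):
    """Tally detected objects by label from an inference response.
    The 'classified' field name is an assumption about the schema."""
    counts = {}
    for obj in result.get("classified", []):
        label = obj.get("label")
        counts[label] = counts.get(label, 0) + 1
    return counts

def detect(image_bytes):
    """POST raw image bytes to the endpoint and tally the detections.
    The real API may expect a multipart upload; this is a sketch."""
    req = request.Request(ENDPOINT, data=image_bytes,
                          headers={"Content-Type": "application/octet-stream"})
    with request.urlopen(req) as resp:
        return count_labels(json.load(resp))

# Parsing example with a made-up response:
sample = {"classified": [{"label": "Car"}, {"label": "Car"},
                         {"label": "Motorcycle"}]}
print(count_labels(sample))  # {'Car': 2, 'Motorcycle': 1}
```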

Automatically labeling frames in a video

The auto label function is available only when a model is deployed. It automatically identifies objects in the frames of a video.

In this scenario, you have only nine frames. To improve the accuracy for your deep learning model, you can add more frames to the data set. Remember, you can deploy a model even though it is not 100% complete. Deploying a trained model that is not 100% complete is a quick method to add more object labels to your data set and improve the accuracy for your model.

To use the auto label function with your deployed model, complete the following steps:
  1. Click Datasets from the menu, and select the data set that you used to create the previously trained model.
  2. Select the video in the data set that had nine frames, and click Tag Objects.
  3. Click Auto label.
  4. Specify how often you want to capture frames and automatically label the frames. Select the name of the trained model that you deployed in step 3, and click Auto label. In this scenario, you previously captured frames every 10 seconds. To improve the accuracy of the deep learning model by capturing and labeling more frames, you can specify 6 seconds.
  5. After the auto label process completes, the new frames are added to the carousel. Click the new frames and verify that the objects have the correct labels. The object labels that were automatically added are shown in green; the object labels that you manually added are shown in blue. In this scenario, the carousel now has 17 frames.

Next steps

You can continue to refine the data set as much as you want. When you are satisfied with the data set, you can retrain the model by completing steps 1 - 3 in "Training a model". This time, do not stop the training before it is 100% complete; the lines in the training graph should converge. After the training completes, you can redeploy the model by completing steps 1 - 3 in "Deploying a trained model". You can then double-click the deployed model to get the API endpoint and test other videos or images against the model.