PowerAI Vision Inference Server

With a PowerAI Vision Inference Server, you can quickly and easily deploy multiple models that were trained in PowerAI Vision to a single server. These models are portable and can be used by many users and on different systems. This allows you to make trained models available to others, such as customers or collaborators.

Hardware requirements

Disk space requirements
  • Installation - The Inference Server install package contains Docker containers for deployment on all supported platforms and requires 25 GB to download. Only the images needed for the platform are installed by the load_images.sh operation, but this requires at least 40 GB available in the file system used by Docker, usually /var/lib/docker.
  • Deploying a model - Models are extracted into the /tmp directory before loading. The size of the model depends on the framework, but at least 1 GB should be available in /tmp before deploying a model.
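
To confirm that enough space is available before you install or deploy, check the relevant file systems. This is only a quick sketch; it assumes Docker uses the default /var/lib/docker storage location.

df -h /var/lib/docker   # file system that holds Docker images; needs at least 40 GB free
df -h /tmp              # models are extracted here before deployment; needs at least 1 GB free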

GPU model requirements

The Inference Server is supported only on NVIDIA Tesla GPUs: T4, V100, and P100.
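
A quick way to confirm which GPUs a host has is to list them with nvidia-smi, which prints the model name of each GPU along with its index:

nvidia-smi -L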

GPU memory requirements
  • For deployment, the amount of memory required depends on the type of model you want to deploy. To determine how large a deployed GoogLeNet, Faster R-CNN, tiny YOLO v2, or Detectron model is, run nvidia-smi from the host after deployment. Find the PID that corresponds to the model you deployed and check its Memory Usage.
    Example:
    $ nvidia-smi
    Tue Feb 26 09:12:59 2019       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 418.29       Driver Version: 418.29       CUDA Version: 10.1     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla P100-SXM2...  On   | 00000002:01:00.0 Off |                    0 |
    | N/A   36C    P0    39W / 300W |   1853MiB / 16280MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla P100-SXM2...  On   | 00000003:01:00.0 Off |                    0 |
    | N/A   38C    P0    42W / 300W |   4179MiB / 16280MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   2  Tesla P100-SXM2...  On   | 0000000A:01:00.0 Off |                    0 |
    | N/A   63C    P0   243W / 300W |   3351MiB / 16280MiB |     73%      Default |
    +-------------------------------+----------------------+----------------------+
    |   3  Tesla P100-SXM2...  On   | 0000000B:01:00.0 Off |                    0 |
    | N/A   35C    P0    31W / 300W |     10MiB / 16280MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0     15735      C   /opt/miniconda2/bin/python                   958MiB |
    |    0     16225      C   python                                       885MiB |
    |    1     39541      C   python                                      2253MiB |
    |    1     86043      C   /opt/miniconda2/bin/python                   958MiB |
    |    1     86299      C   /opt/miniconda2/bin/python                   958MiB |
    |    2    103835      C   /opt/miniconda2/bin/python                  3341MiB |
    +-----------------------------------------------------------------------------+
  • A custom model based on TensorFlow takes all remaining memory on the GPU it is deployed to. However, you can deploy it to a GPU that has at least 2 GB of memory.
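
Before choosing a GPU to deploy a model to, you can check how much memory is free on each one. The following query is a sketch that lists the index, model name, total memory, and free memory of every GPU on the host:

nvidia-smi --query-gpu=index,name,memory.total,memory.free --format=csv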

Platform requirements

  • The Inference Server can be deployed on x86 and IBM® Power Systems™ platforms.
  • Detectron and SSD models require NVIDIA GPUs. Other models can be deployed in CPU-only environments.

Software requirements

Linux
  • Red Hat Enterprise Linux (RHEL) 7.6 (little endian).
  • Ubuntu 18.04 or later.
NVIDIA CUDA
Docker
  • Docker must be installed. The recommended version is 1.13.1 or later. Version 1.13.1 is installed with RHEL 7.6.
  • Ubuntu - Docker CE or EE 18.06.01
  • When running Docker, nvidia-docker 2 is supported. For RHEL 7.6, see Using nvidia-docker 2.0 with RHEL 7.
Unzip
The unzip package is required on the system to deploy the zipped models.
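
A quick way to confirm the main software prerequisites on a host is shown below. This is only a sketch; exact versions and package names depend on your distribution.

docker --version                                # 1.13.1 or later on RHEL, 18.06.01 on Ubuntu
nvidia-smi                                      # confirms the NVIDIA driver if GPU deployment is planned
command -v unzip || echo "unzip is not installed"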

Installing from IBM Passport Advantage

  1. Download the product tar file from the IBM Passport Advantage website.
  2. Optionally verify the downloaded product tar file by following the appropriate steps:
    1. Download these files:
      powerai-vision-inference-1.1.5.0.sig
      PowerAI_Vision_1.1.5.0_public_key.pub
      PowerAI_Vision_ocsp_1.1.5.0_publ_key.pub
      PowrAI_Vis_ocspchain_1.1.5.0_pub_key.pub
    2. If you want to verify the tar file by using the CISO code signing service, run the following command and ensure that the output is Verified OK:
      openssl dgst -sha256 -verify PowerAI_Vision_1.1.5.0_public_key.pub \
      > -signature powerai-vision-inference-1.1.5.0.sig powerai-vision-inference-1.1.5.0.tar.gz
    3. To validate the tar file with the signing certificate authority directly, run the following command and ensure that the output includes Response verify OK:
      openssl ocsp -no_nonce -issuer PowrAI_Vis_ocspchain_1.1.5.0_pub_key.pub \
        -cert PowerAI_Vision_ocsp_1.1.5.0_publ_key.pub -VAfile PowrAI_Vis_ocspchain_1.1.5.0_pub_key.pub \
        -text -url http://ocsp.digicert.com -respout ocsptest
  3. Decompress the product tar file, and run the installation command for the platform you are installing on:
    RHEL
    sudo yum install ./<file_name>.rpm
    Ubuntu
    sudo dpkg -i ./<file_name>.deb
  4. Load the product Docker images with the appropriate container's tar file. The file name has this format: powerai-vision-inference-<arch>-containers-<release>.tar, where <arch> is x86 or ppc64le, and <release> is the product version being installed.
    /opt/powerai-vision/dnn-deploy-service/bin/load_images.sh -f <tar_file>

    PowerAI Vision Inference Server will be installed at /opt/powerai-vision/dnn-deploy-service.
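
    After load_images.sh completes, you can confirm that the inference images were loaded into the local Docker registry. The grep pattern below is illustrative; the exact repository names depend on the release that you installed.

    docker images | grep -i vision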

Installing from AAS

  1. Download the product tar.gz file from Advanced Administration System (AAS). This system is also called Entitled Software Support (ESS).
  2. Unzip and untar the tar.gz file by running this command.
    gunzip -c file_name.tar.gz | tar -xvf -
    This will extract the following files:

    powerai-vision-inference-aas-1.1.5.0.sig
    powerai-vision-inference-aas-1.1.5.0.tar.gz
    vision-1.1.5.0-key.pub
    vision-ocsp-1.1.5.0-key.pub
    vision-ocspchain-1.1.5.0-key.pub

  3. (Optional) Verify the downloaded tar file:
    1. To verify the tar file by using the CISO code signing service, run the following command and ensure that the output is Verified OK:
      openssl dgst -sha256 -verify vision-1.1.5.0-key.pub \
      > -signature powerai-vision-inference-aas-1.1.5.0.sig powerai-vision-inference-aas-1.1.5.0.tar.gz
    2. To validate the tar file with the signing certificate authority directly, run the following command and ensure that the output includes Response verify OK:
      openssl ocsp -no_nonce -issuer vision-ocspchain-1.1.5.0-key.pub \
        -cert vision-ocsp-1.1.5.0-key.pub -VAfile vision-ocspchain-1.1.5.0-key.pub \
        -text -url http://ocsp.digicert.com -respout ocsptest
  4. Unzip and untar the powerai-vision-inference-aas-1.1.5.0.tar.gz file by running this command:
    gunzip -c file_name.tar.gz | tar -xvf -

    The install files are extracted to powerai-vision-inference-aas-1.1.5.0/.

  5. Decompress the product tar file, and run the installation command for the platform you are installing on:
    RHEL
    sudo yum install ./<file_name>.rpm
    Ubuntu
    sudo dpkg -i ./<file_name>.deb
  6. Load the product Docker images with the appropriate container's tar file. The file name has this format: powerai-vision-inference-<arch>-containers-<release>.tar, where <arch> is x86 or ppc64le, and <release> is the product version being installed.
    /opt/powerai-vision/dnn-deploy-service/bin/load_images.sh -f <tar_file>

    PowerAI Vision Inference Server will be installed at /opt/powerai-vision/dnn-deploy-service.

Deploying a trained model

The following types of models trained in PowerAI Vision can be deployed:
  • Object detection using Faster R-CNN (default), tiny YOLO v2, Detectron, Single Shot Detector (SSD) (POWER only; x86 deployment not supported), custom TensorFlow models, and Keras models.
  • Image classification using GoogLeNet (default) and custom TensorFlow models.

The model to be deployed must have been trained and exported from PowerAI Vision. See "Exporting a model" in Importing, exporting, and downloading PowerAI Vision information. To deploy a model, run this command:

/opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh
Note: The first time you run this command, you are prompted to accept the license agreement.
Usage:
./deploy_zip_model.sh -m <model-name> -p <port> -g <gpu> -t <time-limit> zipped_model_file
model-name
The Docker container name for the deployed model.
port
The port to deploy the model to.
gpu
The GPU to deploy the model to. If specified as -1, the model will be deployed to a CPU.
Note: Detectron and SSD models cannot be deployed to a CPU.
time-limit
(Optional) Specify the timeout limit for model deployment, in seconds. The default value is 180 seconds.
zipped_model_file
The full path and file name of the trained model that was exported from PowerAI Vision. It can be an image classification model or an object detection model, but must be in zip format.

Examples:

/opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh --model dog --port 6001 --gpu 1 ./dog_classification.zip 
/opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh -m car -p 6002 -g -1 /home/user/mydata/car.zip
/opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh -m coco -p 6001 -g 1 /home/user/model/new_models/cdb-coco-30k_model.zip
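
Because each model runs in its own Docker container on its own port, you can script the deployment of several exported models. The sketch below assumes the exported zip files are in /home/user/models and that ports 6001 and up are free; the directory and starting port are illustrative.

#!/bin/bash
# Deploy every exported model zip in a directory, one port per model, all on GPU 0.
port=6001
for zip in /home/user/models/*.zip; do
  name=$(basename "$zip" .zip)
  /opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh -m "$name" -p "$port" -g 0 "$zip"
  port=$((port + 1))
done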

Deployment output

There are several different results you might see when you deploy a model. For example:

Success
If a model is deployed successfully, it reports back with the message "Successfully deployed model."
/opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh -m coco -p 6001 -g 1 /home/user/model/new_models/cdb-coco-30k_model.zip 

Successfully deployed model.

Deployed in 22 seconds
Failure
If the deployment fails, it reports back with log information from the Docker container, including error messages regarding the failure. Some possible error examples follow. See Troubleshooting known issues - PowerAI Vision Inference Server for details about dealing with errors.
  • Ran out of GPU memory
    [root@hostname ~]# /opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh -m user_detectron_cars8 -p 7018 -g 1 /root/inference-only-testing/cars_detectron_model.zip
    Deployment failed. Here are logs before the failure:
      File "/opt/detectron/detectron/core/test_engine.py", line 331, in initialize_model_from_cfg
        model, weights_file, gpu_id=gpu_id,
      File "/opt/detectron/detectron/utils/net.py", line 112, in initialize_gpu_from_weights_file
        src_blobs[src_name].astype(np.float32, copy=False))
      File "/usr/local/lib/python2.7/dist-packages/caffe2/python/workspace.py", line 321, in FeedBlob
        return C.feed_blob(name, arr, StringifyProto(device_option))
    RuntimeError: [enforce fail at context_gpu.cu:359] error == cudaSuccess. 2 vs 0. Error at: /tmp/pytorch/caffe2/core/context_gpu.cu:359: out of memory
    root        : INFO     Callback message: {'msgId': '6ef7e371-1209-47b3-94c3-940640324ac8', 'msgReturnCode': 'ErrModelLoading', 'msgDesc': 'Traceback (most recent call last):\n  File "/opt/DNN/dnn/deploy_process.py", line 165, in modelLoading\n    self.caller.onModelLoading()\n  File "/opt/DNN/dnn_impl/cod_detectron/deploy_service.py", line 64, in onModelLoading\n    self.model = infer_engine.initialize_model_from_cfg(self.deploy)\n  File "/opt/detectron/detectron/core/test_engine.py", line 331, in initialize_model_from_cfg\n    model, weights_file, gpu_id=gpu_id,\n  File "/opt/detectron/detectron/utils/net.py", line 112, in initialize_gpu_from_weights_file\n    src_blobs[src_name].astype(np.float32, copy=False))\n  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/workspace.py", line 321, in FeedBlob\n    return C.feed_blob(name, arr, StringifyProto(device_option))\nRuntimeError: [enforce fail at context_gpu.cu:359] error == cudaSuccess. 2 vs 0. Error at: /tmp/pytorch/caffe2/core/context_gpu.cu:359: out of memory \n', 'msgState': 'aborted', 'msgTime': 1551801403956}
    root        : INFO     Wait 5s for messaging completed...
    [root@hostname ~]#
  • Invalid GPU ID specified
    [root@hostname ~]# /opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh -m user_detectron_cars8 -p 7018 -g 5 /root/inference-only-testing/cars_detectron_model.zip
    Deployment failed. Here are logs before the failure:
      Failed building wheel for nvidia-ml-py
      Running setup.py clean for nvidia-ml-py
    Failed to build nvidia-ml-py
    Installing collected packages: nvidia-ml-py
      Running setup.py install for nvidia-ml-py: started
        Running setup.py install for nvidia-ml-py: finished with status 'done'
    Successfully installed nvidia-ml-py-375.53.1
    You are using pip version 8.1.1, however version 19.0.3 is available.
    You should consider upgrading via the 'pip install --upgrade pip' command.
    Cannot find gpu 5.
    [root@hostname ~]#
  • Processing was interrupted:
    /usr/bin/docker-current: Error response from daemon: Conflict. The container name "/decrypt" is already in use by container ec0932898a65b82ed47504c8baa2507046d7bb0fcf460405d6201d3088bc9731. 
    You have to remove (or rename) that container to be able to reuse that name.
    To fix the problem, run these commands:
    docker stop decrypt
    docker rm decrypt
  • Tried to deploy a Detectron model on a CPU:
    [root@hostname ~]# /opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh -m user_detectron_cars8 -p 7018 -g -1 /root/inference-only-testing/cars_detectron_model.zip
    Deployment failed. Here are logs before the failure:
      Failed building wheel for nvidia-ml-py
      Running setup.py clean for nvidia-ml-py
    Failed to build nvidia-ml-py
    Installing collected packages: nvidia-ml-py
      Running setup.py install for nvidia-ml-py: started
        Running setup.py install for nvidia-ml-py: finished with status 'done'
    Successfully installed nvidia-ml-py-375.53.1
    You are using pip version 8.1.1, however version 19.0.3 is available.
    You should consider upgrading via the 'pip install --upgrade pip' command.
    We currently do not support CPU mode for Detectron models.
    [root@hostname ~]#
  • Deployment times out:
    [root@hostname ~]# /opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh -t 15 -m user_custom_cars3 -p 7008 -g -1 /root/inference-only-testing/cars_keras-frcnn_custom_model.zip
    Deployment timed out at 15 seconds

    If the deployment times out, increase the time limit by using the -t option.
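
    For example, rerunning the same deployment with a longer limit (the 300 second value below is illustrative) gives the model more time to load:

    /opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh -t 300 -m user_custom_cars3 -p 7008 -g -1 /root/inference-only-testing/cars_keras-frcnn_custom_model.zip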

Inference

Inference can be done with the deployed model by using either a local image file or the URL of an image that is accessible over HTTP.

Optional Parameters:

confthre
Confidence threshold. Specify a value in the range [0.0,1.0], treated as a percentage. Only results with a confidence greater than the specified threshold are returned. The smaller the confidence threshold, the more results are returned. If you specify 0, all results are returned because no filtering based on the model's confidence is applied. The default value is 0.5.
containRle
This option is only available for Detectron models. If it is set to true, the inference output includes the RLE (run-length encoding) of each segment. The default value is false.
containPolygon
This option is only available for Detectron models. If it is set to true, the polygon for the segments is included in the output. The default value is true.

GET method:

Required Parameters:
imageurl
The URL address of the image. The URL must start with http:// or https://.

Example:

 
    curl -G -d "imageurl=https://ibm.box.com/shared/static/i98xa4dfpff6jwv0lxmcu4lybr8b5kxj.jpg&confthre=0.7&containPolygon=false&containRle=true" http://localhost:5000/inference

POST method:

Required Parameters:
imagefile
The name of the image file to be used for inference.

Example:

    curl -F "imagefile=@$DIR/data/bird.jpg" \
         -F "confthre=0.7" \
         -F "containPolygon=false" \
         -F "containRle=true" \
         http://localhost:5000/inference
Example 1 - Classification:
curl -F "imagefile=@/home/testdata/cocker-spaniel-dogs-puppies-1.jpg" http://localhost:6001/inference

Example 2 - Object detection:

curl -G -d "imageurl=https://assets.imgix.net/examples/couple.jpg" http://localhost:6002/inference
Example 3 - Object detection of a tiny YOLO model with confidence threshold:
curl -F "imagefile=@/home/testdata/Chihuahua.jpeg" -F "confthre=0.8" http://localhost:6001/inference
Note: The confidence threshold applies to Faster R-CNN, Detectron, and tiny YOLO object detection models, and to GoogLeNet image classification models.
Example 4 - Object detection of a Detectron model that contains polygon segments instead of RLEs (default setting):
curl -F "imagefile=@/home/user/model/new_models/pics/cars.jpg" -F "confthre=0.98" http://localhost:6001/inference
Example 5 - Object detection of a Detectron model that contains RLE segments instead of a polygon:
curl -F "imagefile=@/home/user/model/new_models/pics/cars.jpg" -F "confthre=0.98" -F "containRle=true" -F "containPolygon=false" http://localhost:6001/inference

Inference output

The PowerAI Vision Inference Server can deploy image classification and object detection models.

Image classification model
A successful classification will report something similar to the following:
Example 1 output - success
{"classified": {"Cocker Spaniel": 0.93}, "result": "success"}
The image has been classified as a Cocker Spaniel with a confidence of 0.93.
Example 1 output - fail
{"result": "fail"}
The image could not be classified. This might happen if the image could not be loaded, for example.
Object detection model
A successful detection will report something similar to the following:
Example 2 output - success
{"classified": [{"confidence": 0.94, "ymax": 335, "label": "car", "xmax": 576, 
                  "xmin": 424, "ymin": 160, "attr": []}], "result": "success"}
The cars in the image are located at the specified coordinates. The confidence of each label is given.
Example 2 output - success
{"classified": [], "result": "success"}
Object detection was carried out successfully, but no objects were found with a confidence above the threshold.
Example 2 output - fail
{"result": "fail"}
Objects could not be detected. This might happen if the image could not be loaded, for example.

Example 4 output - success

The output includes a bounding rectangle and a polygon.

{"classified": [{"confidence": 0.9874554872512817, "ymax": 244, "label": "car", "xmax": 391, "xmin": 291, "ymin": 166, "polygons": [[[325, 170], [322, 172], [318, 172], [311, 178], [311, 181], [300, 189], [297, 189], [289, 195], [289, 232], [297, 238], [297, 240], [304, 246], [307, 246], [315, 240], [322, 240], [325, 238], [369, 238], [372, 240], [387, 240], [394, 235], [394, 198], [387, 192], [387, 189], [383, 187], [383, 184], [376, 178], [376, 175], [372, 172], [369, 172], [365, 170]]]}], "result": "success"}

Example 5 output - success

The output includes a bounding rectangle and an RLE.

{"classified": [{"confidence": 0.9874554872512817, "ymax": 244, "rle": "RXb3h0e;e0^O2nDcNl:b1O1O0O2O00000O100O1O1N2O1N2O1N2O1O001O10O01O1000O010000O100000000O1000000000O10000000000000000000000000000000000000000000000000000000000000001O00010O001O001O1O1O1O100O1O1O1O2N1O2N2N100N2O2M2N4Kmm2", "label": "car", "xmax": 391, "xmin": 291, "ymin": 166}], "result": "success"}

Stopping a deployed model

To stop the deployed model, run the following commands. When you stop the deployed model, the GPU memory is made available.
docker stop <model-name>
docker rm <model-name>
Example 1:
docker stop dog
docker rm dog
Example 2:
docker stop car
docker rm car
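
Each deployed model runs in a Docker container whose name is the model name that was given at deployment time, so you can list what is currently deployed before stopping anything:

docker ps --format "table {{.Names}}\t{{.Ports}}\t{{.Status}}"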

Decrypting a trained model

Models trained and exported by version 1.1.4 and earlier versions of PowerAI Vision are encrypted and are intended for deployment in PowerAI Vision Training and Inference or Inference Server products. Starting with version 1.1.5, trained and exported models are not encrypted.

You can decrypt a model that was trained with PowerAI Vision 1.1.4 or earlier by running decrypt_zip_model.sh. This allows data scientists to understand the weights and networks configured by PowerAI Vision and possibly use that information to further train the model. The decrypted model can also be ported to edge devices that are not supported by PowerAI Vision.

Usage: /opt/powerai-vision/dnn-deploy-service/bin/decrypt_zip_model.sh [-h|--help] | [ [-o string ] model_file.zip]

output
Specifies the file name for the output decrypted model.
model_file
A trained model exported from PowerAI Vision.

Example:

/opt/powerai-vision/dnn-deploy-service/bin/decrypt_zip_model.sh -o car_frcnn_decrypted.zip car_frcnn.zip

This will generate a new zip file car_frcnn_decrypted.zip, which is not password protected.
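
The decrypted zip is an ordinary archive, so you can inspect the weights and network configuration files before porting the model, for example:

unzip -l car_frcnn_decrypted.zip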