PowerAI Vision Inference Server
With a PowerAI Vision Inference server, you can quickly and easily deploy multiple models that were trained in PowerAI Vision to a single server. These models are portable and can be used by many users and on different systems. This allows you to make trained models available to others, such as customers or collaborators.
- Hardware requirements
- Platform requirements
- Software requirements
- Installing from IBM Passport Advantage
- Deploying a trained model
- Deployment output
- Inference
- Inference output
- Stopping a deployed model
- Decrypting a trained model
Hardware requirements
Disk space requirements
- Installation - The Inference Server install package contains Docker containers for deployment on all supported platforms and requires 25 GB to download. The load_images.sh operation installs only the images needed for the platform, but it requires at least 40 GB available in the file system used by Docker, usually /var/lib/docker.
- Deploying a model - Models are extracted into the /tmp directory before loading. The size of the model depends on the framework, but at least 1 GB should be available in /tmp before deploying a model.
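The disk space requirements above can be checked before installing or deploying. The following is a minimal sketch, not part of the product: the helper names are hypothetical, the thresholds are treated as GiB, and /var/lib/docker is only the usual Docker location and might differ on your host.

```python
import shutil

GIB = 1024 ** 3


def free_gib(path):
    """Return the free space, in GiB, of the file system that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def has_room(path, required_gib):
    """True if the file system holding `path` has at least `required_gib` free."""
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    # Paths and thresholds come from the requirements above (40 for the Docker
    # file system, 1 for /tmp); both are interpreted as GiB in this sketch.
    for path, need in (("/var/lib/docker", 40), ("/tmp", 1)):
        try:
            status = "ok" if has_room(path, need) else "LOW"
            print(f"{path}: {free_gib(path):.1f} GiB free, need {need} ({status})")
        except OSError as err:
            print(f"{path}: cannot check ({err})")
```

If Docker uses a different data directory on your system, pass that path instead.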
GPU model requirements
The Inference Server is supported only on NVIDIA Tesla GPUs: T4, V100, and P100.
GPU memory requirements
- For deployment, the amount of memory required depends on the type of model you want to deploy. To determine how large a deployed GoogLeNet, Faster R-CNN, Tiny Yolo v2, or Detectron model is, run nvidia-smi from the host after deployment. Find the PID that corresponds to the model you deployed and look at its Memory Usage.
Example:
$ nvidia-smi
Tue Feb 26 09:12:59 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.29       Driver Version: 418.29       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-SXM2...  On   | 00000002:01:00.0 Off |                    0 |
| N/A   36C    P0    39W / 300W |   1853MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-SXM2...  On   | 00000003:01:00.0 Off |                    0 |
| N/A   38C    P0    42W / 300W |   4179MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-SXM2...  On   | 0000000A:01:00.0 Off |                    0 |
| N/A   63C    P0   243W / 300W |   3351MiB / 16280MiB |     73%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-SXM2...  On   | 0000000B:01:00.0 Off |                    0 |
| N/A   35C    P0    31W / 300W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     15735      C   /opt/miniconda2/bin/python                   958MiB |
|    0     16225      C   python                                       885MiB |
|    1     39541      C   python                                      2253MiB |
|    1     86043      C   /opt/miniconda2/bin/python                   958MiB |
|    1     86299      C   /opt/miniconda2/bin/python                   958MiB |
|    2    103835      C   /opt/miniconda2/bin/python                  3341MiB |
+-----------------------------------------------------------------------------+
- A custom model based on TensorFlow will take all remaining memory on a GPU. However, you can deploy it to a GPU that has at least 2GB memory.
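Reading per-model memory from nvidia-smi can also be scripted. The sketch below is illustrative only: it assumes the "Processes" table layout of the driver version in the example (GPU, PID, Type, Process name, Usage), which newer drivers change, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# One row of the "Processes" table: GPU index, PID, type, process name, usage.
# The pattern is tied to the table layout in the example above (an assumption).
ROW = re.compile(r"^\|\s+(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\d+)MiB\s+\|$")


def parse_processes(text):
    """Return a list of (gpu, pid, process_name, mib) tuples."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            gpu, pid, _type, name, mib = match.groups()
            rows.append((int(gpu), int(pid), name, int(mib)))
    return rows


def usage_by_gpu(rows):
    """Sum the memory used by the listed processes on each GPU, in MiB."""
    totals = defaultdict(int)
    for gpu, _pid, _name, mib in rows:
        totals[gpu] += mib
    return dict(totals)


# Process rows copied from the example output.
SAMPLE = """\
|    0     15735      C   /opt/miniconda2/bin/python                   958MiB |
|    0     16225      C   python                                       885MiB |
|    1     39541      C   python                                      2253MiB |
|    1     86043      C   /opt/miniconda2/bin/python                   958MiB |
|    1     86299      C   /opt/miniconda2/bin/python                   958MiB |
|    2    103835      C   /opt/miniconda2/bin/python                  3341MiB |
"""

for gpu, pid, name, mib in parse_processes(SAMPLE):
    print(f"GPU {gpu} PID {pid} ({name}): {mib} MiB")
print(usage_by_gpu(parse_processes(SAMPLE)))
```

To find one model's footprint, match the PID of its container process against the parsed rows.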
Platform requirements
- The Inference Server can be deployed on x86 and IBM® Power Systems™ platforms.
- Detectron and SSD models require NVIDIA GPUs. Other models can be deployed in CPU-only environments.
Software requirements
- Linux
- Red Hat Enterprise Linux (RHEL) 7.6 (little endian).
- Ubuntu 18.04 or later.
- NVIDIA CUDA
- x86 - 10.1 or later drivers. For information, see the NVIDIA CUDA Toolkit website.
- ppc64le - 10.1 Update 1 or later drivers. For information, see the NVIDIA CUDA Toolkit website.
- Docker
- Docker must be installed. The recommended version is 1.13.1 or later. Version 1.13.1 is installed with RHEL 7.6.
- Ubuntu - Docker CE or EE 18.06.01
- When running Docker, nvidia-docker 2 is supported. For RHEL 7.6, see Using nvidia-docker 2.0 with RHEL 7.
- Unzip
- The unzip package is required on the system to deploy the zipped models.
Installing from IBM Passport Advantage
- Download the product tar file from the IBM Passport Advantage website.
- Optionally verify the downloaded product tar file by following the appropriate steps:
- Download these files:
powerai-vision-inference-1.1.5.0.sig
PowerAI_Vision_1.1.5.0_public_key.pub
PowerAI_Vision_ocsp_1.1.5.0_publ_key.pub
PowrAI_Vis_ocspchain_1.1.5.0_pub_key.pub
- If you want to verify the tar file by using the CISO code signing service, run the following command and ensure that the output is Verified OK:
openssl dgst -sha256 -verify PowerAI_Vision_1.1.5.0_public_key.pub \
  -signature powerai-vision-inference-1.1.5.0.sig powerai-vision-inference-1.1.5.0.tar.gz
- To validate the tar file with the signing certificate authority directly, run the following command and ensure that the output includes Response verify OK:
openssl ocsp -no_nonce -issuer PowrAI_Vis_ocspchain_1.1.5.0_pub_key.pub \
  -cert PowerAI_Vision_ocsp_1.1.5.0_publ_key.pub -VAfile PowrAI_Vis_ocspchain_1.1.5.0_pub_key.pub \
  -text -url http://ocsp.digicert.com -respout ocsptest
- Decompress the product tar file, and run the installation command for the platform you are installing on:
- RHEL
- sudo yum install ./<file_name>.rpm
- Ubuntu
- sudo dpkg -i ./<file_name>.deb
- Load the product Docker images with the appropriate container's tar file. The file name has this format: powerai-vision-inference-<arch>-containers-<release>.tar, where <arch> is x86 or ppc64le, and <release> is the product version being installed.
/opt/powerai-vision/dnn-deploy-service/bin/load_images.sh -f <tar_file>
PowerAI Vision Inference Server will be installed at /opt/powerai-vision/dnn-deploy-service.
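Choosing the container tar file for the current platform can be sketched as follows. This is an illustration under assumptions: the mapping from Python's platform.machine() values to the <arch> token is a guess (the text above only says <arch> is x86 or ppc64le), and the helper names are hypothetical.

```python
import platform

LOAD_IMAGES = "/opt/powerai-vision/dnn-deploy-service/bin/load_images.sh"

# Assumed mapping from machine type to the <arch> token in the file name.
ARCH_TOKENS = {"x86_64": "x86", "amd64": "x86", "ppc64le": "ppc64le"}


def container_tar_name(release, machine=None):
    """Build powerai-vision-inference-<arch>-containers-<release>.tar."""
    machine = (machine or platform.machine()).lower()
    try:
        arch = ARCH_TOKENS[machine]
    except KeyError:
        raise ValueError(f"unsupported machine type: {machine}") from None
    return f"powerai-vision-inference-{arch}-containers-{release}.tar"


def load_images_command(tar_file):
    """Return the argument list for the load_images.sh call shown above."""
    return [LOAD_IMAGES, "-f", tar_file]


print(load_images_command(container_tar_name("1.1.5.0", "ppc64le")))
```

The returned list can be passed to subprocess.run once the package is installed.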
Install from AAS
- Download the product tar.gz file from Advanced Administration System (AAS). This system is also called Entitled Software Support (ESS).
- Unzip and untar the tar.gz file by running this command:
gunzip -c file_name.tar.gz | tar -xvf -
This will extract the following files:
powerai-vision-inference-aas-1.1.5.0.sig
powerai-vision-inference-aas-1.1.5.0.tar.gz
vision-1.1.5.0-key.pub
vision-ocsp-1.1.5.0-key.pub
vision-ocspchain-1.1.5.0-key.pub
- (Optional) Verify the downloaded tar file:
- To verify the tar file by using the CISO code signing service, run the following command and ensure that the output is Verified OK:
openssl dgst -sha256 -verify vision-1.1.5.0-key.pub \
  -signature powerai-vision-inference-aas-1.1.5.0.sig powerai-vision-inference-aas-1.1.5.0.tar.gz
- To validate the tar file with the signing certificate authority directly, run the following command and ensure that the output includes Response verify OK:
openssl ocsp -no_nonce -issuer vision-ocspchain-1.1.5.0-key.pub \
  -cert vision-ocsp-1.1.5.0-key.pub -VAfile vision-ocspchain-1.1.5.0-key.pub \
  -text -url http://ocsp.digicert.com -respout ocsptest
- Unzip and untar the powerai-vision-inference-aas-1.1.5.0.tar.gz file by running this command:
gunzip -c file_name.tar.gz | tar -xvf -
The install files are extracted to powerai-vision-inference-aas-1.1.5.0/.
- Decompress the product tar file, and run the installation command for the platform you are installing on:
- RHEL
- sudo yum install ./<file_name>.rpm
- Ubuntu
- sudo dpkg -i ./<file_name>.deb
- Load the product Docker images with the appropriate container's tar file. The file name has this format: powerai-vision-inference-<arch>-containers-<release>.tar, where <arch> is x86 or ppc64le, and <release> is the product version being installed.
/opt/powerai-vision/dnn-deploy-service/bin/load_images.sh -f <tar_file>
PowerAI Vision Inference Server will be installed at /opt/powerai-vision/dnn-deploy-service.
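If the optional signature check is run from an install script, it might be wrapped as in this sketch. The function names are hypothetical; the openssl arguments mirror the dgst command in the verification steps, and the check relies on openssl printing Verified OK on success.

```python
import subprocess


def dgst_verify_argv(public_key, signature, tarball):
    """Build the `openssl dgst` verification command from the steps above."""
    return ["openssl", "dgst", "-sha256", "-verify", public_key,
            "-signature", signature, tarball]


def is_verified(stdout):
    """True if openssl reported a matching signature."""
    return stdout.strip() == "Verified OK"


def verify_tarball(public_key, signature, tarball):
    """Run openssl and report whether the tar file signature is valid."""
    result = subprocess.run(dgst_verify_argv(public_key, signature, tarball),
                            capture_output=True, text=True)
    return result.returncode == 0 and is_verified(result.stdout)
```

For the AAS download, this would be called as verify_tarball("vision-1.1.5.0-key.pub", "powerai-vision-inference-aas-1.1.5.0.sig", "powerai-vision-inference-aas-1.1.5.0.tar.gz") from the directory that holds the extracted files.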
Deploying a trained model
The Inference Server can deploy the following types of models:
- Object detection using Faster R-CNN (default), Tiny Yolo v2, Detectron, Single Shot Detector (SSD) (POWER only; x86 deployment not supported), custom TensorFlow models, and Keras models.
- Image classification using GoogLeNet (default) and custom TensorFlow models.
The model to be deployed must have been trained and exported via PowerAI Vision. See "Exporting a model" in Importing, exporting, and downloading PowerAI Vision information. To deploy a model, run this command:
/opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh
./deploy_zip_model.sh -m <model-name> -p <port> -g <gpu> -t <time-limit> zipped_model_file
- model-name
- The docker container name for the deployed model.
- port
- The port to deploy the model to.
- gpu
- The GPU to deploy the model to. If specified as -1, the model will be deployed to a CPU.
Note: Detectron and SSD models cannot be deployed to a CPU.
- time-limit
- (Optional) Specify the timeout limit for model deployment, in seconds. The default value is 180 seconds.
- zipped_model_file
- The full path and file name of the trained model that was exported from PowerAI Vision. It can be an image classification model or an object detection model, but must be in zip format.
Examples:
/opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh --model dog --port 6001 --gpu 1 ./dog_classification.zip
/opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh -m car -p 6002 -g -1 /home/user/mydata/car.zip
/opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh -m coco -p 6001 -g 1 /home/user/model/new_models/cdb-coco-30k_model.zip
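When deployments are scripted, the command line can be assembled and sanity-checked first. The following is a sketch under assumptions: the function name and the validation rules (file extension, port range) are illustrative, and only the -m, -p, -g, and -t flags from the usage line above are used.

```python
DEPLOY_SCRIPT = "/opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh"
CPU = -1  # per the parameter list above, -g -1 deploys the model to a CPU


def deploy_command(model_name, port, gpu, zipped_model_file, time_limit=None):
    """Return the argument list for deploy_zip_model.sh."""
    if not zipped_model_file.lower().endswith(".zip"):
        raise ValueError("the exported model must be a .zip file")
    if not 1 <= port <= 65535:
        raise ValueError("port must be in the range 1-65535")
    if gpu < CPU:
        raise ValueError("gpu must be a GPU index, or -1 for CPU")
    argv = [DEPLOY_SCRIPT, "-m", model_name, "-p", str(port), "-g", str(gpu)]
    if time_limit is not None:
        argv += ["-t", str(time_limit)]  # optional timeout in seconds
    argv.append(zipped_model_file)
    return argv


print(" ".join(deploy_command("car", 6002, CPU, "/home/user/mydata/car.zip")))
```

The list form can be handed to subprocess.run without shell quoting concerns.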
Deployment output
There are several different results you might see when you deploy a model. For example:
- Success
- If a model is deployed successfully, it reports back with the message "Successfully deployed model."
/opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh -m coco -p 6001 -g 1 /home/user/model/new_models/cdb-coco-30k_model.zip
Successfully deployed model.
Deployed in 22 seconds
- Failure
- If the deployment fails, it reports back with log information from the docker container, including error messages regarding the failure. Some possible error examples follow. See Troubleshooting known issues - PowerAI Vision Inference Server for details about dealing with errors.
- Ran out of GPU memory
[root@hostname ~]# /opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh -m user_detectron_cars8 -p 7018 -g 1 /root/inference-only-testing/cars_detectron_model.zip
Deployment failed. Here are logs before the failure:
  File "/opt/detectron/detectron/core/test_engine.py", line 331, in initialize_model_from_cfg
    model, weights_file, gpu_id=gpu_id,
  File "/opt/detectron/detectron/utils/net.py", line 112, in initialize_gpu_from_weights_file
    src_blobs[src_name].astype(np.float32, copy=False))
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/workspace.py", line 321, in FeedBlob
    return C.feed_blob(name, arr, StringifyProto(device_option))
RuntimeError: [enforce fail at context_gpu.cu:359] error == cudaSuccess. 2 vs 0. Error at: /tmp/pytorch/caffe2/core/context_gpu.cu:359: out of memory
root : INFO Callback message: {'msgId': '6ef7e371-1209-47b3-94c3-940640324ac8', 'msgReturnCode': 'ErrModelLoading', 'msgDesc': 'Traceback (most recent call last):\n File "/opt/DNN/dnn/deploy_process.py", line 165, in modelLoading\n self.caller.onModelLoading()\n File "/opt/DNN/dnn_impl/cod_detectron/deploy_service.py", line 64, in onModelLoading\n self.model = infer_engine.initialize_model_from_cfg(self.deploy)\n File "/opt/detectron/detectron/core/test_engine.py", line 331, in initialize_model_from_cfg\n model, weights_file, gpu_id=gpu_id,\n File "/opt/detectron/detectron/utils/net.py", line 112, in initialize_gpu_from_weights_file\n src_blobs[src_name].astype(np.float32, copy=False))\n File "/usr/local/lib/python2.7/dist-packages/caffe2/python/workspace.py", line 321, in FeedBlob\n return C.feed_blob(name, arr, StringifyProto(device_option))\nRuntimeError: [enforce fail at context_gpu.cu:359] error == cudaSuccess. 2 vs 0. Error at: /tmp/pytorch/caffe2/core/context_gpu.cu:359: out of memory \n', 'msgState': 'aborted', 'msgTime': 1551801403956}
root : INFO Wait 5s for messaging completed...
[root@hostname ~]#
- Invalid GPU ID specified
[root@hostname ~]# /opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh -m user_detectron_cars8 -p 7018 -g 5 /root/inference-only-testing/cars_detectron_model.zip
Deployment failed. Here are logs before the failure:
Failed building wheel for nvidia-ml-py
Running setup.py clean for nvidia-ml-py
Failed to build nvidia-ml-py
Installing collected packages: nvidia-ml-py
Running setup.py install for nvidia-ml-py: started
Running setup.py install for nvidia-ml-py: finished with status 'done'
Successfully installed nvidia-ml-py-375.53.1
You are using pip version 8.1.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Cannot find gpu 5.
[root@hostname ~]#
- Processing was interrupted:
/usr/bin/docker-current: Error response from daemon: Conflict. The container name "/decrypt" is already in use by container ec0932898a65b82ed47504c8baa2507046d7bb0fcf460405d6201d3088bc9731. You have to remove (or rename) that container to be able to reuse that name.
To fix the problem, run these commands:
docker stop decrypt
docker rm decrypt
- Tried to deploy a Detectron model on a CPU:
[root@hostname ~]# /opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh -m user_detectron_cars8 -p 7018 -g -1 /root/inference-only-testing/cars_detectron_model.zip
Deployment failed. Here are logs before the failure:
Failed building wheel for nvidia-ml-py
Running setup.py clean for nvidia-ml-py
Failed to build nvidia-ml-py
Installing collected packages: nvidia-ml-py
Running setup.py install for nvidia-ml-py: started
Running setup.py install for nvidia-ml-py: finished with status 'done'
Successfully installed nvidia-ml-py-375.53.1
You are using pip version 8.1.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
We currently do not support CPU mode for Detectron models.
[root@hostname ~]#
- Deployment times out:
[root@hostname ~]# /opt/powerai-vision/dnn-deploy-service/bin/deploy_zip_model.sh -t 15 -m user_custom_cars3 -p 7008 -g -1 /root/inference-only-testing/cars_keras-frcnn_custom_model.zip
Deployment timed out at 15 seconds
If the deployment times out, increase the time limit by using the -t option.
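A wrapper script could map the deployment output to one of the outcomes listed in this section. This is a rough sketch: the outcome names are hypothetical, and the matching relies only on substrings that appear in the sample outputs, which might differ between releases.

```python
# Substrings taken from the sample outputs in this section, checked in order.
KNOWN_OUTCOMES = [
    ("Successfully deployed model.", "success"),
    ("Deployment timed out", "timeout"),
    ("out of memory", "gpu_out_of_memory"),
    ("Cannot find gpu", "invalid_gpu_id"),
    ("do not support CPU mode", "cpu_not_supported"),
    ("is already in use by container", "container_name_conflict"),
]


def classify_deployment_output(output):
    """Return a short outcome label for the text printed by the deploy script."""
    for needle, outcome in KNOWN_OUTCOMES:
        if needle in output:
            return outcome
    return "unknown"


print(classify_deployment_output("Deployment timed out at 15 seconds"))
```

An "unknown" result means the full container log should be read, as described above.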
Inference
Inference can be done by using the deployed model with a local image file or a URL to an uploaded image file.
Optional Parameters:
- confthre
- Confidence threshold. Specify a value in the range [0.0, 1.0], treated as a percentage. Only results with a confidence greater than the specified threshold are returned. The smaller the threshold, the more results are returned. If you specify 0, a very large number of results can be returned because nothing is filtered out by confidence. The default value is 0.5.
- containRle
- This option is only available for Detectron models. If this is true, the inference output will include RLEs of the segments. The default value is false.
- containPolygon
- This option is only available for Detectron models. If it is set to true, the polygon for the segments is included in the output. The default value is true.
GET method:
- imageurl
- The URL address of the image. The URL must start with http:// or https://.
Example:
curl -G -d "imageurl=https://ibm.box.com/shared/static/i98xa4dfpff6jwv0lxmcu4lybr8b5kxj.jpg&confthre=0.7&containPolygon=false&containRle=true" http://localhost:5000/inference
POST method:
- imagefile
- The name of the image file to be used for inference.
Example:
curl -F "imagefile=@$DIR/data/bird.jpg" \
-F "confthre=0.7" \
-F "containPolygon=false" \
-F "containRle=true" \
http://localhost:5000/inference
Example 1 - Classification:
curl -F "imagefile=@/home/testdata/cocker-spaniel-dogs-puppies-1.jpg" http://localhost:6001/inference
Example 2 - Object detection:
curl -G -d "imageurl=https://assets.imgix.net/examples/couple.jpg" http://localhost:6002/inference
Example 3 - Inference with a confidence threshold:
curl -F "imagefile=@/home/testdata/Chihuahua.jpeg" -F "confthre=0.8" http://localhost:6001/inference
Example 4 - Detectron model with polygon output (the default):
curl -F "imagefile=@/home/user/model/new_models/pics/cars.jpg" -F "confthre=0.98" http://localhost:6001/inference
Example 5 - Detectron model with RLE output instead of polygons:
curl -F "imagefile=@/home/user/model/new_models/pics/cars.jpg" -F "confthre=0.98" -F "containRle=true" -F "containPolygon=false" http://localhost:6001/inference
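The GET form of the request can also be issued from a program. The sketch below uses only the Python standard library; the function names are hypothetical, the parameter names (imageurl, confthre, containRle, containPolygon) come from the lists above, and it assumes the server accepts a percent-encoded query string as HTTP servers normally do.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def inference_url(base_url, image_url, confthre=None,
                  contain_rle=None, contain_polygon=None):
    """Build the GET URL for the /inference endpoint of a deployed model."""
    if not image_url.startswith(("http://", "https://")):
        raise ValueError("imageurl must start with http:// or https://")
    params = {"imageurl": image_url}
    if confthre is not None:
        if not 0.0 <= confthre <= 1.0:
            raise ValueError("confthre must be in the range [0.0, 1.0]")
        params["confthre"] = confthre
    if contain_rle is not None:
        params["containRle"] = "true" if contain_rle else "false"
    if contain_polygon is not None:
        params["containPolygon"] = "true" if contain_polygon else "false"
    return f"{base_url.rstrip('/')}/inference?{urlencode(params)}"


def infer(base_url, image_url, **options):
    """Send the request and return the decoded JSON response."""
    with urlopen(inference_url(base_url, image_url, **options)) as response:
        return json.load(response)


print(inference_url("http://localhost:6002",
                    "https://assets.imgix.net/examples/couple.jpg",
                    confthre=0.7))
```

Calling infer requires a model deployed on the given port; inference_url can be used on its own to inspect the request.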
Inference output
The PowerAI Vision Inference Server can deploy image classification and object detection models.
- Image classification model
- A successful classification will report something similar to the following:
Example 1 output - success
The image has been classified as a Cocker Spaniel with a confidence of 0.93.
{"classified": {"Cocker Spaniel": 0.93}, "result": "success"}
Example 1 output - fail
The image could not be classified. This might happen if the image could not be loaded, for example.
{"result": "fail"}
- Object detection model
- A successful detection will report something similar to the following:
Example 2 output - success
The cars in the image are located at the specified coordinates. The confidence of each label is given.
{"classified": [{"confidence": 0.94, "ymax": 335, "label": "car", "xmax": 576, "xmin": 424, "ymin": 160, "attr": []}], "result": "success"}
Example 2 output - success (nothing detected)
Object detection was carried out successfully, but nothing was found with a confidence above the threshold.
{"classified": [], "result": "success"}
Example 2 output - fail
Objects could not be detected. This might happen if the image could not be loaded, for example.
{"result": "fail"}
Example 4 output - success
The output includes a rectangle and a polygon.
{"classified": [{"confidence": 0.9874554872512817, "ymax": 244, "label": "car", "xmax": 391, "xmin": 291, "ymin": 166, "polygons": [[[325, 170], [322, 172], [318, 172], [311, 178], [311, 181], [300, 189], [297, 189], [289, 195], [289, 232], [297, 238], [297, 240], [304, 246], [307, 246], [315, 240], [322, 240], [325, 238], [369, 238], [372, 240], [387, 240], [394, 235], [394, 198], [387, 192], [387, 189], [383, 187], [383, 184], [376, 178], [376, 175], [372, 172], [369, 172], [365, 170]]]}], "result": "success"}
Example 5 output - success
The output includes a rectangle and an RLE.
{"classified": [{"confidence": 0.9874554872512817, "ymax": 244, "rle": "RXb3h0e;e0^O2nDcNl:b1O1O0O2O00000O100O1O1N2O1N2O1N2O1O001O10O01O1000O010000O100000000O1000000000O10000000000000000000000000000000000000000000000000000000000000001O00010O001O001O1O1O1O100O1O1O1O2N1O2N2N100N2O2M2N4Kmm2", "label": "car", "xmax": 391, "xmin": 291, "ymin": 166}], "result": "success"}
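Because classification returns "classified" as an object and object detection returns it as a list, client code usually has to normalize the two shapes. The following sketch shows one way to do that; the function name and the output structure are hypothetical, and the field names are taken from the sample outputs above.

```python
import json


def parse_inference(body):
    """Normalize an inference response into a list of result dicts.

    Returns None when the server reports "fail".
    """
    data = json.loads(body)
    if data.get("result") != "success":
        return None
    classified = data.get("classified", [])
    if isinstance(classified, dict):
        # Image classification: {"<label>": <confidence>}
        return [{"label": label, "confidence": confidence}
                for label, confidence in classified.items()]
    # Object detection: one entry per detected object, with box corners.
    return [
        {
            "label": item["label"],
            "confidence": item["confidence"],
            "box": (item["xmin"], item["ymin"], item["xmax"], item["ymax"]),
        }
        for item in classified
    ]


print(parse_inference('{"classified": {"Cocker Spaniel": 0.93}, "result": "success"}'))
```

An empty list means the request succeeded but nothing exceeded the confidence threshold, which is distinct from the None returned for a failed request.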
Stopping a deployed model
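To stop a deployed model, stop and remove its Docker container, where <model-name> is the container name that was given at deployment:
docker stop <model-name>
docker rm <model-name>
Examples:
docker stop dog
docker rm dog
docker stop car
docker rm car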
Decrypting a trained model
Models trained and exported by version 1.1.4 and earlier versions of PowerAI Vision are encrypted and are intended for deployment in PowerAI Vision Training and Inference or Inference Server products. Starting with version 1.1.5, trained and exported models are not encrypted.
You can decrypt a model that was trained with PowerAI Vision 1.1.4 or earlier by running decrypt_zip_model. This will allow data scientists to understand the weights and networks configured by PowerAI Vision and possibly use that information to further train the model. The decrypted model can also be used to port these models to edge devices not supported by PowerAI Vision.
Usage: /opt/powerai-vision/dnn-deploy-service/bin/decrypt_zip_model.sh [-h|--help] | [ [-o string ] model_file.zip]
- output
- Specifies the file name for the output decrypted model.
- model_file
- A trained model exported from PowerAI Vision.
Example:
/opt/powerai-vision/dnn-deploy-service/bin/decrypt_zip_model.sh -o car_frcnn_decrypted.zip car_frcnn.zip
This will generate a new zip file car_frcnn_decrypted.zip, which is not password protected.
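To tell whether an exported model still needs decrypting, one option is to inspect the ZIP encryption flag of its members. This sketch assumes that encrypted exports use standard ZIP password protection, which the text above implies but does not state; the function name is hypothetical.

```python
import io
import zipfile


def is_password_protected(zip_path_or_file):
    """True if any member of the archive has the ZIP 'encrypted' flag set."""
    with zipfile.ZipFile(zip_path_or_file) as archive:
        return any(info.flag_bits & 0x1 for info in archive.infolist())


# Self-check with an in-memory, unencrypted archive (hypothetical contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("model/weights.bin", b"\x00" * 16)
buffer.seek(0)
print(is_password_protected(buffer))
```

For a real model, pass the path of the exported zip file, for example car_frcnn.zip before decryption and car_frcnn_decrypted.zip after it.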