Running the containers
IMPORTANT: Before running a container, review the Prerequisites topic.
There are several options for deploying the Watson NLP Runtime image. Each deployment mode has advantages and disadvantages, depending on your organization's needs.
Deployment modes
Standalone deployment
The standalone deployment option allows you to build a Docker volume out of pretrained model images and mount it into a running Watson NLP Runtime container. When the container runs, it exposes REST and gRPC endpoints that client programs can use to make inference requests. This is the quickstart, local deployment option and is well suited to demonstrating the capabilities of Watson NLP.
See Run with Docker run to deploy this way.
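The standalone flow can be sketched with two `docker run` commands. This is an illustrative outline only: the registry, image names, volume name, environment variables, and ports below are placeholders, not confirmed values; substitute the runtime and model images you are entitled to, and check the Run with Docker run topic for the exact invocation.

```shell
# 1. Run a pretrained model image so it copies its model data into a
#    shared Docker volume (volume name "models" is illustrative).
docker run --rm -it \
  -v models:/app/models \
  example-registry/watson-nlp_syntax_model:latest

# 2. Start the runtime container with the same volume mounted.
#    The env var and port numbers are assumptions for this sketch;
#    the runtime exposes REST and gRPC endpoints once it is up.
docker run -d \
  -v models:/app/models \
  -e LOCAL_MODELS_DIR=/app/models \
  -p 8080:8080 \
  -p 8085:8085 \
  example-registry/watson-nlp-runtime:latest
```

Because the model data lives in a volume rather than the runtime image, you can swap models by repopulating the volume without rebuilding the runtime container.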
Single image serverless deployment
Single image serverless deployments provide a custom image built with a minimal set of library dependencies and pretrained models that can be deployed to a runtime offering such as IBM Cloud Code Engine, AWS Fargate, or Microsoft Azure Serverless.
Advantages of this type of deployment include:
- No infrastructure
- Minimal deliverable assets
- Automated horizontal scaling
Disadvantages to this type of deployment include:
- Per-model scaling requires separate deployment
- No in-place model upgrades
- No dynamic custom models
- A single large image with models "baked in"
See Run with a serverless container runtime offering to deploy this way.
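A single serverless image is typically produced with a multi-stage build that copies model data into the runtime image at build time. The Dockerfile below is a hedged sketch of that pattern: the image names, the `/app/models` path, the `unpack_model.sh` helper, and the environment variable are illustrative assumptions, not confirmed details of the product images.

```dockerfile
# Stage 1: extract a pretrained model from its image
# (image name and unpack step are placeholders for this sketch).
FROM example-registry/watson-nlp_syntax_model:latest AS model
RUN ./unpack_model.sh   # hypothetical helper that writes model data to /app/models

# Stage 2: layer the extracted model onto the runtime image,
# producing a single self-contained image with the model "baked in".
FROM example-registry/watson-nlp-runtime:latest
COPY --from=model /app/models /app/models
ENV LOCAL_MODELS_DIR=/app/models
```

The resulting image can be pushed to a registry and pointed at by the serverless offering; the trade-off, as noted above, is that changing the model set means rebuilding and redeploying the whole image.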
Static model multi-pod deployment
Static model multi-pod deployments on Kubernetes are similar to serverless deployments, serving a static set of pretrained models with no dependencies beyond a vanilla Kubernetes cluster.
Advantages of this type of deployment include:
- Kubernetes as the only requirement
- Minimal deliverable assets
- Easy integration with other Kubernetes deployments
Disadvantages to this type of deployment include:
- No per-model scaling
- No in-place model upgrades
- No dynamic custom models
- Manual Kubernetes management
See Run with Kubernetes to deploy this way.
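In this mode the runtime image (with its static model set) is managed by an ordinary Kubernetes Deployment and Service. The manifest below is a minimal sketch under assumed names, image references, and port numbers; it is not the product's documented manifest, so treat every value as a placeholder and see the Run with Kubernetes topic for specifics.

```yaml
# Illustrative Deployment for a runtime image with models baked in.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: watson-nlp-runtime
spec:
  replicas: 2                 # scales the whole runtime; no per-model scaling
  selector:
    matchLabels:
      app: watson-nlp-runtime
  template:
    metadata:
      labels:
        app: watson-nlp-runtime
    spec:
      containers:
        - name: runtime
          image: example-registry/watson-nlp-runtime-with-models:latest
          ports:
            - containerPort: 8080   # REST (assumed port)
            - containerPort: 8085   # gRPC (assumed port)
---
# Service exposing the runtime to other workloads in the cluster.
apiVersion: v1
kind: Service
metadata:
  name: watson-nlp-runtime
spec:
  selector:
    app: watson-nlp-runtime
  ports:
    - name: rest
      port: 8080
    - name: grpc
      port: 8085
```

Because the model set is fixed at image build time, upgrading a model means rolling out a new image, which is the "no in-place model upgrades" limitation listed above.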
Dynamic model multi-pod deployment
Deployments using KServe ModelMesh Serving provide a cloud-native API for dynamic model management, and can be configured to handle internet-scale load. Advanced load-balancing and scaling techniques manage replicas of individual models across a serving cluster and automatically adjust to changing demand. This is IBM's recommended deployment method for production use of Watson NLP models.
You may also choose to use Knative Serving, an open-source, enterprise-grade solution for building serverless and event-driven applications in a Kubernetes or OpenShift cluster. Knative Serving supports horizontal autoscaling based on the requests that come into a service, allowing the service to scale down to zero replicas.
Advantages of this type of deployment include:
- Dynamic model management
- Per-model replication within a cluster
- In-place model upgrades
- Cloud-native resource management
- Support for multiple heterogeneous runtimes behind a single Service
- Especially suited to use cases with:
- High volumes of models
- Unbalanced load across models
- Frequently updating models
Disadvantages to this type of deployment include:
- Additional infrastructure dependencies (for example, etcd, S3 Compatible Storage)
- Requires OpenShift and/or Kubernetes expertise
- May require installing Kubernetes Custom Resources on your cluster
- No REST API support
See Run with Kubernetes and KServe ModelMesh Serving to deploy with KServe ModelMesh Serving.
See Run with Kubernetes and Knative Serving to deploy with Knative Serving.
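With ModelMesh, individual models are registered as InferenceService resources that point at model data in S3-compatible storage, and the serving cluster loads and replicates them on demand. The resource below is a hedged sketch of that shape: the model name, model format, storage key, and bucket path are illustrative assumptions, so consult the Run with Kubernetes and KServe ModelMesh Serving topic for the values your cluster expects.

```yaml
# Illustrative InferenceService registering one model with ModelMesh.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-syntax-model          # placeholder name
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: watson-nlp              # assumed format name for this sketch
      storage:
        key: example-storage-secret   # key into the cluster's storage config
        path: models/example_syntax_model   # placeholder bucket path
```

Adding or upgrading a model is then a matter of creating or updating an InferenceService, rather than rebuilding or redeploying the runtime image, which is what makes dynamic model management and in-place upgrades possible in this mode.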