Running the containers
IMPORTANT: Before running a container, review the Prerequisites topic.
There are several options for deploying the Watson NLP Runtime image. Each deployment mode has advantages and disadvantages, depending on your organization's needs.
Deployment modes
Standalone deployment
The standalone deployment option allows you to build a Docker volume out of pretrained model images and mount it into a running Watson NLP Runtime container. When the container runs, it exposes REST and gRPC endpoints that client programs can use to make inference requests. This is the quickstart, local deployment option and is well suited to demonstrating the capabilities of Watson NLP.
See Run with Docker run to deploy this way.
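The standalone flow can be sketched with two `docker run` commands. This is an illustrative outline only: the registry, image names, volume name, environment variables, and ports below are placeholders, not confirmed values; substitute the runtime and model images you are entitled to, and check the Run with Docker run topic for the exact invocation.

```shell
# 1. Run a pretrained model image so it copies its model data into a
#    shared Docker volume (volume name "models" is illustrative).
docker run --rm -it \
  -v models:/app/models \
  example-registry/watson-nlp_syntax_model:latest

# 2. Start the runtime container with the same volume mounted.
#    The env var and port numbers are assumptions for this sketch;
#    the runtime exposes REST and gRPC endpoints once it is up.
docker run -d \
  -v models:/app/models \
  -e LOCAL_MODELS_DIR=/app/models \
  -p 8080:8080 \
  -p 8085:8085 \
  example-registry/watson-nlp-runtime:latest
```

Because the model data lives in a volume rather than the runtime image, you can swap models by repopulating the volume without rebuilding the runtime container.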
Single image serverless deployment
Single image serverless deployments provide a custom image built with a minimal set of library dependencies and pretrained models that can be deployed to a runtime offering such as IBM Cloud Code Engine, AWS Fargate, or Microsoft Azure Serverless.
Advantages of this type of deployment include:
- No infrastructure
- Minimal deliverable assets
- Automated horizontal scaling
Disadvantages to this type of deployment include:
- Per-model scaling requires separate deployment
- No in-place model upgrades
- No dynamic custom models
- A single large image with models "baked in"
See Run with a serverless container runtime offering to deploy this way.
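A single serverless image is typically produced with a multi-stage build that copies model data into the runtime image at build time. The Dockerfile below is a hedged sketch of that pattern: the image names, the `/app/models` path, the `unpack_model.sh` helper, and the environment variable are illustrative assumptions, not confirmed details of the product images.

```dockerfile
# Stage 1: extract a pretrained model from its image
# (image name and unpack step are placeholders for this sketch).
FROM example-registry/watson-nlp_syntax_model:latest AS model
RUN ./unpack_model.sh   # hypothetical helper that writes model data to /app/models

# Stage 2: layer the extracted model onto the runtime image,
# producing a single self-contained image with the model "baked in".
FROM example-registry/watson-nlp-runtime:latest
COPY --from=model /app/models /app/models
ENV LOCAL_MODELS_DIR=/app/models
```

The resulting image can be pushed to a registry and pointed at by the serverless offering; the trade-off, as noted above, is that changing the model set means rebuilding and redeploying the whole image.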
Static model multi-pod deployment
Static model multi-pod deployments on Kubernetes are similar to serverless deployments, serving a static set of pretrained models with no dependencies beyond a vanilla Kubernetes cluster.
Advantages of this type of deployment include:
- Kubernetes as the only requirement
- Minimal deliverable assets
- Easy integration with other Kubernetes deployments
Disadvantages to this type of deployment include:
- No per-model scaling
- No in-place model upgrades
- No dynamic custom models
- Manual Kubernetes management
See Run with Kubernetes to deploy this way.
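In this mode the runtime image (with its static model set) is managed by an ordinary Kubernetes Deployment and Service. The manifest below is a minimal sketch under assumed names, image references, and port numbers; it is not the product's documented manifest, so treat every value as a placeholder and see the Run with Kubernetes topic for specifics.

```yaml
# Illustrative Deployment for a runtime image with models baked in.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: watson-nlp-runtime
spec:
  replicas: 2                 # scales the whole runtime; no per-model scaling
  selector:
    matchLabels:
      app: watson-nlp-runtime
  template:
    metadata:
      labels:
        app: watson-nlp-runtime
    spec:
      containers:
        - name: runtime
          image: example-registry/watson-nlp-runtime-with-models:latest
          ports:
            - containerPort: 8080   # REST (assumed port)
            - containerPort: 8085   # gRPC (assumed port)
---
# Service exposing the runtime to other workloads in the cluster.
apiVersion: v1
kind: Service
metadata:
  name: watson-nlp-runtime
spec:
  selector:
    app: watson-nlp-runtime
  ports:
    - name: rest
      port: 8080
    - name: grpc
      port: 8085
```

Because the model set is fixed at image build time, upgrading a model means rolling out a new image, which is the "no in-place model upgrades" limitation listed above.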
Dynamic model multi-pod deployment
Deployments using KServe ModelMesh Serving provide a cloud-native API for dynamic model management, and can be configured to handle internet-scale load. Advanced load-balancing and scaling techniques manage replicas of individual models across a serving cluster and automatically adjust to changing demand. This is IBM's recommended deployment method for production use of Watson NLP models.
You may also choose to use Knative Serving, an open-source, enterprise-grade solution for building serverless and event-driven applications in a Kubernetes or OpenShift cluster. Knative Serving supports horizontal autoscaling based on the requests that come into a service, allowing the service to scale down to zero replicas.
Advantages of this type of deployment include:
- Dynamic model management
- Per-model replication within a cluster
- In-place model upgrades
- Cloud-native resource management
- Support for multiple heterogeneous runtimes behind a single Service
- Especially suited to use cases with:
- High volumes of models
- Unbalanced load across models
- Frequently updating models
Disadvantages to this type of deployment include:
- Additional infrastructure dependencies (for example, etcd, S3 Compatible Storage)
- Requires OpenShift and/or Kubernetes expertise
- May require installing Kubernetes Custom Resources on your cluster
- No REST API support
See Run with Kubernetes and KServe ModelMesh Serving to deploy with KServe ModelMesh Serving.
See Run with Kubernetes and Knative Serving to deploy with Knative Serving.
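With ModelMesh, individual models are registered as InferenceService resources that point at model data in S3-compatible storage, and the serving cluster loads and replicates them on demand. The resource below is a hedged sketch of that shape: the model name, model format, storage key, and bucket path are illustrative assumptions, so consult the Run with Kubernetes and KServe ModelMesh Serving topic for the values your cluster expects.

```yaml
# Illustrative InferenceService registering one model with ModelMesh.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-syntax-model          # placeholder name
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: watson-nlp              # assumed format name for this sketch
      storage:
        key: example-storage-secret   # key into the cluster's storage config
        path: models/example_syntax_model   # placeholder bucket path
```

Adding or upgrading a model is then a matter of creating or updating an InferenceService, rather than rebuilding or redeploying the runtime image, which is what makes dynamic model management and in-place upgrades possible in this mode.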